Languages and the Machine
Chapter 5
CS221
Topics
• The Compilation Process
• The Assembly Process
• Linking and Loading
• Macros
• We will skip
  – Case Study: Extensions to the Instruction Set – The Intel MMX™ and Motorola AltiVec™ SIMD Instructions
Compilation Process
• Translating assembly to machine code is fairly straightforward, but compilation is not
• Translate a program written in a high-level language into a functionally equivalent program in assembly language
• Consider a simple high-level language assignment statement:
  – Foo = Bar + Zot + 15;
• Steps involved in compiling this statement into assembly code:
  – Lexical analysis: separate into tokens: Foo, =, Bar, +, etc.
  – Syntactic analysis / parsing: determine that we are performing an assignment, VAR = EXPRESSION
  – Semantic analysis: determine that Foo, Bar, Zot are names and 15 is an integer
  – Code generation: determine the proper assembly code to perform the action
    • ld [Bar], %r0, %r1
    • ld [Zot], %r0, %r2
    • addcc %r1, %r2, %r1
    • addcc %r1, 15, %r2
    • st %r2, %r0, [Foo]
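The lexical analysis step can be sketched in C. This is a minimal illustration, not how any particular compiler does it: the token kinds, the `Token` struct, and the `lex` function are hypothetical names, and the sketch only recognizes names, integers, and single-character symbols.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not taken from a real compiler. */
typedef enum { TOK_NAME, TOK_INT, TOK_OP, TOK_END } TokKind;

typedef struct {
    TokKind kind;
    char text[32];
} Token;

/* Split src into tokens; returns the number of tokens written to out. */
size_t lex(const char *src, Token *out, size_t max)
{
    size_t n = 0;
    while (*src != '\0' && n < max) {
        if (isspace((unsigned char)*src)) { src++; continue; }

        Token t;
        size_t len = 0;
        if (isalpha((unsigned char)*src)) {          /* a name such as Foo */
            t.kind = TOK_NAME;
            while (isalnum((unsigned char)*src) && len < sizeof t.text - 1)
                t.text[len++] = *src++;
        } else if (isdigit((unsigned char)*src)) {   /* an integer such as 15 */
            t.kind = TOK_INT;
            while (isdigit((unsigned char)*src) && len < sizeof t.text - 1)
                t.text[len++] = *src++;
        } else {                                     /* any other single character */
            t.kind = (*src == ';') ? TOK_END : TOK_OP;
            t.text[len++] = *src++;
        }
        t.text[len] = '\0';
        out[n++] = t;
    }
    return n;
}
```

Calling `lex("Foo = Bar + Zot + 15;", toks, 16)` yields the tokens Foo, =, Bar, +, Zot, +, 15, and ; in that order. The parser would then check that sequence against the VAR = EXPRESSION form.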
Compiler Issues
• Each compiler is specific to a particular ISA
  – E.g., an int on one machine may be 32 bits, on another may be 64 bits
    • Cause of error in networking library ported to Alpha
    • Int issue not a problem in Java; JVM specifies 32 bits
  – E.g., in previous example, if the ISA allowed operands of addcc to be memory addresses, we could have done
    • addcc [Bar], [Zot], %r1
    • addcc %r1, 15, [Foo]
  – Hopefully the compiler generates efficient code, but optimization is a tough issue!
• Cross compiler: one that generates code for an ISA different from the one it runs on (example, CodeWarrior)
Mapping Variables to Memory
• Global variables
  – Accessible from anywhere in the program, given a fixed address
  – E.g., global variable X at memory address 400
• Local variables
  – Also called automatic variables
  – Defined inside a function or method, e.g.
      void foo()
      {
          int a, b;
          …
      }
  – These variables are created when foo is invoked, destroyed when foo exits
  – These variables are created by pushing them on the stack when the function is invoked, and are popped off when the function exits
Local Variables and the Stack
• Recall that the stack typically grows downward in memory
• Here we start with 1234 stored on the top of the stack

Before push (SP = 8)      After Push FFFF (SP = 4)
Addr  Mem                 Addr  Mem
0                         0
4                         4     FFFF
8     1234                8     1234
…                         …
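The push shown above can be simulated in C. This is a sketch of a toy machine, not a real ISA: the `Machine` struct and the `push` and `pop` functions are hypothetical, memory is only four words, and there is no bounds checking.

```c
#include <stdint.h>

#define MEM_WORDS 4          /* byte addresses 0, 4, 8, 12 in this toy memory */

/* Hypothetical machine state for illustration only. */
typedef struct {
    uint32_t mem[MEM_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t sp;             /* byte address of the current top of stack */
} Machine;

/* Push: move SP down one word, then store the value there. */
void push(Machine *m, uint32_t value)
{
    m->sp -= 4;
    m->mem[m->sp / 4] = value;
}

/* Pop: read the top word, then move SP back up one word. */
uint32_t pop(Machine *m)
{
    uint32_t value = m->mem[m->sp / 4];
    m->sp += 4;
    return value;
}
```

Starting with SP = 8 and 1234 at address 8, `push(&m, 0xFFFF)` leaves SP = 4 with FFFF at address 4, matching the picture. The stack grows toward lower addresses because push decrements SP.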
Local Variables and the Stack
• In our case, local variables are “pushed” on the stack upon entering the function
  – void foo() { int a; }
• Copy SP into Frame Pointer FP (also called the Base Pointer, or BP)

Mem before Foo (SP = 8)      Mem in Foo (SP = 4, FP = 8)
Addr  Mem                    Addr  Mem
0                            0
4                            4     Var a
8     1234                   8     1234
…                            …
Accessing Stack Variables
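• These variables are referenced as offsets from the frame pointer, called based addressing
• To access a: [%fp - 4]

Mem in Foo (SP = 4, FP = 8)
Addr  Mem
0
4     Var a
8     1234
…

• Why not use [%sp]? Consider pushing lots of stuff on the stack… Or data structures
  – SP moves with every push and pop, so an offset from SP keeps changing; an offset from FP stays fixed for the life of the function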
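A short C sketch makes the [%sp] question concrete. The `Cpu` struct and the three helper functions are hypothetical, and the toy memory is eight words so there is room to push after entering the function.

```c
#include <stdint.h>

/* Hypothetical toy machine: 8 words of memory at byte addresses 0..28. */
typedef struct {
    uint32_t mem[8];
    uint32_t sp;   /* stack pointer */
    uint32_t fp;   /* frame pointer */
} Cpu;

/* Function entry: copy SP into FP, then reserve one word for local a. */
void enter_foo(Cpu *c)
{
    c->fp = c->sp;
    c->sp -= 4;
}

/* Push one word; SP moves, FP does not. */
void push_word(Cpu *c, uint32_t value)
{
    c->sp -= 4;
    c->mem[c->sp / 4] = value;
}

/* Byte address of local a: always [%fp - 4]. */
uint32_t addr_of_a(const Cpu *c)
{
    return c->fp - 4;
}
```

With SP = 16 on entry, a lives at address 12, which is both FP - 4 and SP + 0. After two more pushes SP is 4, so a is now at SP + 8, while FP - 4 still gives 12.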
C to ASM Example on x86
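C source:

    #include <stdio.h>
    int c;
    int main()
    {
        int a, b;
        a = 3;
        b = 4;
        c = a * b;
    }

Generated x86 assembly (AT&T syntax):

    pushl %ebp                # save the caller's frame pointer
    movl %esp, %ebp           # FP = SP
    subl $8, %esp             # reserve 8 bytes for a and b
    movl $3, -4(%ebp)         # a = 3
    movl $4, -8(%ebp)         # b = 4
    movl -4(%ebp), %eax       # eax = a
    imull -8(%ebp), %eax      # eax = a * b
    movl %eax, c              # c = eax
    …
    .comm c,4,4               # global c: 4 bytes, 4-byte aligned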
Arrays in Memory
• Arrays may be allocated on the stack or allocated off the heap, a pool of memory where portions may be dynamically allocated. Accessing elements of an array is a bit different than accessing regular variables.
• int A[10]; Array of 10 integers

Mem allocated for A (A (Base) = 4)
Addr  Mem
0
4     A[0]
8     A[1]
…     …
40    A[9]

ElementAddr = A + (Index*Size)
e.g. A[2] is at 4 + (2*4) = 12
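The address formula can be checked in C. In this sketch `element_addr` and `formula_matches` are hypothetical helper names, and the comparison through `uintptr_t` assumes a typical flat-memory platform.

```c
#include <stddef.h>
#include <stdint.h>

/* ElementAddr = Base + (Index * Size), computed on plain integers. */
uintptr_t element_addr(uintptr_t base, size_t index, size_t size)
{
    return base + index * size;
}

/* Compare the formula with the address the C compiler uses for a[index]. */
int formula_matches(const int *a, size_t index)
{
    return element_addr((uintptr_t)a, index, sizeof a[0])
           == (uintptr_t)&a[index];
}
```

With base 4 and 4-byte integers, `element_addr(4, 2, 4)` gives 12 and `element_addr(4, 9, 4)` gives 40, the same addresses as in the picture.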
If-Statements
• Conditional statements map to a comparison and a branch instruction
• C
  – if (x==y) statement1; else statement2;
• Assembly (assume X in r1, Y in r2)
      subcc %r1, %r2, %r0      ! Zero flag set if res=0; result discarded in %r0
      bne Statement2           ! Branch if zero flag is not set
      ! Statement1 code
      ba StatementNext         ! Branch always
  Statement2:                  ! Statement2 code
  StatementNext:
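The same control flow can be written in C with explicit branches, which shows how the structured if/else maps onto a compare and two jumps. The function names and the placeholder statements (assigning 1 or 2) are hypothetical, chosen only to make the two paths observable.

```c
/* Structured form. */
int pick_structured(int x, int y)
{
    int result;
    if (x == y) result = 1;   /* statement1 */
    else        result = 2;   /* statement2 */
    return result;
}

/* Same logic written as compare-and-branch, mirroring the assembly. */
int pick_branches(int x, int y)
{
    int result;
    if (x != y) goto statement2;   /* subcc, then bne Statement2 */
    result = 1;                    /* statement1 code */
    goto statement_next;           /* ba StatementNext */
statement2:
    result = 2;                    /* statement2 code */
statement_next:
    return result;
}
```

Both functions return the same value for every pair of inputs. The second is roughly what a compiler produces before translating each goto into a branch instruction.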
Loops
• While, Do-While, and For loops are implemented using the same conditional check and branch as the if-then statement
  – The branch returns back to previous code instead of jumping forward over code
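A loop written with explicit branches in C shows the backward jump. This is an illustrative sketch: `sum_while` and `sum_branches` are hypothetical names, and summing 1..n is just a convenient loop body.

```c
/* Sum 1..n with an ordinary while loop. */
int sum_while(int n)
{
    int total = 0, i = 1;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}

/* The same loop as a conditional check plus a backward branch. */
int sum_branches(int n)
{
    int total = 0, i = 1;
loop_top:
    if (i > n) goto loop_done;   /* conditional check: leave when the test fails */
    total += i;
    i++;
    goto loop_top;               /* backward branch to earlier code */
loop_done:
    return total;
}
```

The only structural difference from the if-statement is the direction of the final jump: it goes back to `loop_top` rather than forward past the else part.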
Production Level Assemblers
• Allow programmer to specify location of data and code
• Provide mnemonics for all instructions and addressing modes
• Permit the use of symbolic labels to represent addresses and constants
• Provide a means to specify the starting address of the program
• Include a way to share variables between different assembled programs
• Support macros
Assembly Example
Assembled Code
Two Pass Assemblers
• Most assemblers are “two-pass”
  – First pass
    • Determine addresses of all data and instructions
    • Perform any assembly-time arithmetic
    • Put definitions and constants into the symbol table
  – Second pass
    • Generate machine code
    • Insert actual addresses and values of symbols which are known from the symbol table
  – Two passes are useful for forward references, i.e. referencing a symbol that is defined later on in the program
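The two passes can be sketched in C on a heavily simplified program representation. Everything here is an assumption made for illustration: the `Line`, `Symbol`, and `SymTab` types, the function names, and the rule that every line occupies exactly one 4-byte word. A real assembler also parses text and encodes instructions.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical source line: an optional label and an optional symbol operand. */
typedef struct {
    const char *label;    /* NULL if the line defines no label */
    const char *operand;  /* NULL if the line references no symbol */
} Line;

typedef struct { const char *name; int addr; } Symbol;

typedef struct { Symbol syms[MAX_SYMS]; size_t count; } SymTab;

/* Return the address recorded for name, or -1 if it is undefined. */
static int lookup(const SymTab *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return t->syms[i].addr;
    return -1;
}

/* Pass 1: assign an address to every line and record each label. */
void pass1(const Line *prog, size_t n, int origin, SymTab *t)
{
    t->count = 0;
    for (size_t i = 0; i < n; i++)
        if (prog[i].label != NULL && t->count < MAX_SYMS) {
            t->syms[t->count].name = prog[i].label;
            t->syms[t->count].addr = origin + (int)i * 4;  /* one word per line */
            t->count++;
        }
}

/* Pass 2: replace each symbolic operand with its address (-1 if none). */
void pass2(const Line *prog, size_t n, const SymTab *t, int *resolved)
{
    for (size_t i = 0; i < n; i++)
        resolved[i] = prog[i].operand ? lookup(t, prog[i].operand) : -1;
}
```

If the first line refers to a label `done` that is only defined on the third line, a single pass could not fill in the address when it reaches the first line. After pass 1 with origin 2048 the table already holds done = 2056, so pass 2 resolves the forward reference.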
Forward Reference
Symbol Table
• Generated during the first pass
• Maps identifiers to values; the table is filled in as values are encountered and the program is parsed from top to bottom
• .org 2048 ; Says assemble code starting at 2048
• const .equ value ; Defines const equal to value
Assembled Program
Final Tasks of the Assembler
• Linking and Loading
• We need the following additional info
  – Module name and size
  – Address of the start symbol
  – Information about global and external symbols
  – Information about any library routines
  – Values of constants
  – Relocation information
Location of Programs in Memory
• We have been using .org to specify a fixed start location
• Typically we will want programs capable of running in arbitrary locations
  – If we are concatenating together different modules, the addresses for identifiers in the different modules must be relocated
• Linker: software that combines separately assembled modules
• Loader: software that loads another program into memory and may modify addresses if the program is loaded in a location different from the origin
  – Must also set appropriate registers, e.g. %sp
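The address adjustment a loader performs can be sketched in C. The representation is an assumption for illustration: the module is an array of words with a parallel array of flags saying which words hold relocatable addresses, and `relocate` is a hypothetical name. Real object formats store relocation information differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the load offset to every word flagged as a relocatable address. */
void relocate(uint32_t *words, const unsigned char *is_reloc, size_t n,
              uint32_t assembled_origin, uint32_t load_address)
{
    /* Unsigned arithmetic wraps, so this also works when loading lower. */
    uint32_t delta = load_address - assembled_origin;
    for (size_t i = 0; i < n; i++)
        if (is_reloc[i])
            words[i] += delta;
}
```

For a module assembled at 2048 and loaded at 4096, an address word holding 2056 becomes 4104, while a constant such as 15 is not flagged and stays unchanged.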
Linking: .global and .extern
• A .global is used in the module where a symbol is defined, and .extern is used in every other module that refers to it
Linking and Loading
• Symbol tables for previous example
• Symbols whose address might change are marked relocatable (not all addresses! Some may be fixed)
DLL’s
• Windows uses Dynamic Link Libraries, or DLL’s
• Linking a common routine in many programs results in duplicate code from that common routine in each program
• In a DLL, commonly used routines (e.g. memory management, graphics) are present in only one place, the DLL
  – Smaller program sizes, each program does not need to have its own copy
  – All programs share the exact same code while executing
  – Programs don’t need recompiling or relinking when the DLL is updated
• Disadvantages
  – Deletion of a shared DLL by mistake can cause problems
  – Versions must be the same
  – DLL code file can live in many places in Windows
  – “DLL Hell”
Macros
• An assembly macro looks kind of like defining a subroutine
• For example, say that there is no PUSH instruction to push data on the stack. We can make a macro for push:
Macro Expansion
• Given the previous macro, we could now write the following code:
      push %r15      ! Push r15 on the stack
      push %r20      ! Push r20 on the stack
• Upon assembly, these macros are expanded to generate the following actual code:
      addcc %r14, -4, %r14
      st %r15, %r14
      addcc %r14, -4, %r14
      st %r20, %r14
Macros vs. Subroutines
• Later we will see how to write actual subroutines we can call
  – Only one copy of the shared code in a subroutine
• Tradeoffs
  – Subroutines
    • Take up less memory since there is only one copy of the code
    • But slower than macros; subroutines have overhead of invoking and returning
  – Macros
    • Take up more space than a subroutine call due to macro expansion for each occurrence of the macro
    • Faster than subroutines; no overhead to invoke/return
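The C preprocessor offers a rough analogy to this tradeoff. It is an analogy only, not the assembler macro facility itself: `SQUARE_MACRO`, `square_func`, and `sum_of_squares` are hypothetical names, and an optimizing compiler may inline the function and blur the difference.

```c
/* Macro: the preprocessor pastes this text at every use,
   much like assembler macro expansion. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Subroutine: one copy of the code, reached by a call and a return. */
int square_func(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    /* Expands to ((a) * (a)) + ((b) * (b)): two inline copies of the body. */
    int via_macro = SQUARE_MACRO(a) + SQUARE_MACRO(b);

    /* Two calls into the single shared copy of square_func. */
    int via_func = square_func(a) + square_func(b);

    /* Both forms compute the same value; -1 would signal a mismatch. */
    return via_macro == via_func ? via_macro : -1;
}
```

Each macro use adds another copy of the multiplication to the program, while each function use adds only a call. That is the space versus call-overhead tradeoff listed above.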
Skipping for now
• Discussion on Pentium MMX
• We may return to this later if time permits