ECE 154A Introduction to Computer Architecture
Fall 2013
Dmitri Strukov
Software Interface
Agenda
• Procedures and stack
• Memory mapping
• Arrays vs. linked lists
• Memory management
• Program compilation, linking, loading and execution
Big Idea
• Architecture should be convenient for programmers
  – HW support for programming language constructions
  – Debugging, security, etc.
Why Are Subroutines (Procedures) Important?
• Better structure
  – Fewer bugs, i.e., faster and cheaper development
• More compact code
  – Fewer bugs
  – Very important when memory is limited, e.g., in the early days
  – Even for today's computers, it typically leads to better performance…
    • Fewer misses (memory hierarchy)
  – …but it could also have negative effects if the overhead (i.e., control instructions) is significant
Implementing Subroutines
• Can implement with existing instructions:
          j    proc
    cont: xxx
          …
    proc: xxx
          …
          j    cont
  – What if the procedure is written by somebody else and already compiled (e.g., a library)?
  – Still doable by patching binaries
• Procedures are very frequent, so let's have special instructions to support them: JAL and JR
Instructions for Accessing Procedures
• MIPS procedure call instruction:
    jal ProcedureAddress   # jump and link
• Saves PC+4 in register $ra to have a link to the next instruction for the procedure return
• Machine format (J format): opcode 0x03 | 26-bit address
• Then the procedure can return with a
    jr $ra                 # return
• Instruction format (R format): opcode 0 | rs = $ra | … | funct 0x08
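As a rough illustration of the two encodings above, the C sketch below packs the fields into 32-bit words. It assumes the standard MIPS32 layouts (6-bit opcode plus a 26-bit word-address field for J format; opcode 0, rs in bits 25-21 and funct 0x08 for jr); encode_jal and encode_jr are hypothetical helper names. Because the field holds a word address, the byte address is shifted right by 2; at run time the upper 4 bits of the target come from PC+4.

```c
#include <stdint.h>

/* jal target: opcode 0x03 in bits 31..26, then bits 27..2 of the byte address */
uint32_t encode_jal(uint32_t target_byte_addr) {
    return (0x03u << 26) | ((target_byte_addr >> 2) & 0x03FFFFFFu);
}

/* jr rs: opcode 0, rs in bits 25..21, rt = rd = shamt = 0, funct 0x08 */
uint32_t encode_jr(uint32_t rs) {
    return (0u << 26) | ((rs & 0x1Fu) << 21) | 0x08u;
}
```

For example, encode_jr(31) gives 0x03E00008, the familiar encoding of jr $ra.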
Illustrating a Procedure Call
[Figure: Relationship between the main program and a procedure. main: prepare to call → jal proc → prepare to continue; proc: save, etc. → … → restore → jr $ra. The PC moves from main to proc and back.]
More Issues with Procedures
• Q1: How do we pass data to a procedure and return data from it?
• We would like to use as many registers as possible inside the procedure (callee) to better exploit temporal locality, but some registers may be in use by the caller.
Solution: Spill registers (move register file content to main memory and restore it later). What is the exact mechanism for that? In particular:
• Q2: Which registers to spill?
• Q3: Who is responsible for saving them (callee vs. caller)?
• Q4: Where to spill?
Solution: Certain rules enforced in software make such an implementation possible.
Typical Use of Registers (Answer to Q1)
Register   Name       Typical use
$0         $zero      Constant 0
$1         $at        Reserved for assembler use
$2-$3      $v0-$v1    Procedure results
$4-$7      $a0-$a3    Procedure arguments
$8-$15     $t0-$t7    Temporary values
$16-$23    $s0-$s7    Operands, saved across procedure calls
$24-$25    $t8-$t9    More temporaries
$26-$27    $k0-$k1    Reserved for OS (kernel)
$28        $gp        Global pointer (saved)
$29        $sp        Stack pointer (saved)
$30        $fp        Frame pointer (saved)
$31        $ra        Return address (saved)
In principle, one can use the registers as one likes without sticking to these guidelines (one exception: in MIPS, the kernel registers might be rewritten by hardware on special occasions (exceptions), so it is better not to use them). However, if the program is supposed to run together with others (e.g., under a certain OS and/or if it uses subroutines written by other people), then it is a good idea to stick to these rules.
[Figure: data sizes. A 4-byte word sits in consecutive memory addresses according to the big-endian order (the most significant byte has the lowest address; byte numbering 0 1 2 3). When loading a byte into a register, it goes in the low end. A doubleword sits in consecutive registers or memory locations according to the big-endian order (the most significant word comes first).]
A Simple MIPS Procedure
Procedure to find the absolute value of an integer: $v0 ← |($a0)|
Solution
The absolute value of x is -x if x < 0 and x otherwise.
abs:  sub  $v0,$zero,$a0   # put -($a0) in $v0, in case ($a0) < 0
      bltz $a0,done        # if ($a0) < 0 then done
      add  $v0,$a0,$zero   # else put ($a0) in $v0
done: jr   $ra             # return to calling program
In practice, we seldom use such short procedures because of the overhead they entail. In this example, we have 3-4 instructions of overhead for 3 instructions of useful computation.
No register spilling here; see the next example.
Typical Use of Registers (Answer to Q2)
The register conventions are the same as in the table above. The registers marked as saved ($s0-$s7, $gp, $sp, $fp, $ra) must be preserved across a procedure call, so a procedure that uses them spills them first; the temporaries ($t0-$t9) are not preserved.
Six Steps in Execution of a Procedure (Answer to Q3)
1. Main routine (caller) places parameters in a place where the procedure (callee) can access them
   – $a0-$a3: four argument registers
2. Caller transfers control to the callee
3. Callee acquires the storage resources needed
4. Callee performs the desired task
5. Callee places the result value in a place where the caller can access it
   – $v0-$v1: two value registers for result values
6. Callee returns control to the caller
   – $ra: one return address register to return to the point of origin
Spilling Registers (Answer to Q4)
• What if the callee needs to use more registers than those allocated to argument and return values?
  – The callee uses a stack: a last-in-first-out queue
• One of the general registers, $sp ($29), is used to address the stack (which "grows" from high address to low address)
  – Add data onto the stack (push): $sp = $sp - 4, then store the data on the stack at the new $sp
  – Remove data from the stack (pop): read the data from the stack at $sp, then $sp = $sp + 4
[Figure: the stack between high addr and low addr, with $sp pointing to the top of stack]
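A minimal C model of the push/pop sequences above, assuming a tiny 64-byte memory and no overflow or underflow checks; mem, sp, push, pop and stack_pointer are illustrative names, not part of any MIPS toolchain. Addresses are byte addresses, as on MIPS, and the empty stack starts at the high end.

```c
#include <stdint.h>

#define MEM_BYTES 64u

static uint32_t mem[MEM_BYTES / 4];  /* word-sized backing store */
static uint32_t sp = MEM_BYTES;      /* empty stack: $sp at the high end */

static void store_word(uint32_t addr, uint32_t v) { mem[addr / 4] = v; }
static uint32_t load_word(uint32_t addr) { return mem[addr / 4]; }

/* push: $sp = $sp - 4, then store the data at the new $sp */
void push(uint32_t v) {
    sp -= 4;
    store_word(sp, v);
}

/* pop: read the data at $sp, then $sp = $sp + 4 */
uint32_t pop(void) {
    uint32_t v = load_word(sp);
    sp += 4;
    return v;
}

uint32_t stack_pointer(void) { return sp; }
```

Because the stack grows downward, two pushes leave $sp 8 bytes below where it started, and the last value pushed is the first one popped.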
Allocating Space on the Stack
• The segment of the stack containing a procedure's saved registers and local variables is its procedure frame (aka activation record)
  – The frame pointer ($fp) points to the first word of the frame of a procedure, providing a stable "base" register for the procedure
• $fp is initialized using $sp on a call, and $sp is restored using $fp on a return
[Figure: a procedure frame from high addr to low addr: saved argument regs (if any), saved return addr, saved local regs (if any), local arrays & structures (if any); $fp points to the top of the frame and $sp to the bottom.]
Example: Parameters and Results
[Figure: Use of the stack by a procedure, before calling and after calling. After the call, the caller's frame (parameters a, b, c, …) becomes the frame for the previous procedure, and the new frame for the current procedure holds the saved registers, the old ($fp), and local variables (y, z, …). $fp and $sp are shown for both cases, with high addr at the bottom and low addr at the top.]
More on Procedures
Prolog
- spill to the stack all registers used by the procedure, except for $t0-$t9 and the one used for returning values
- advance the stack pointer ($sp) first, then write to the stack
Body
- code of the procedure
Epilog
- restore all used registers
- adjust the stack pointer ($sp) at the end
Example of Using the Stack
Saving $fp, $ra, and $s0 onto the stack and restoring them at the end of the procedure
proc: sw   $fp,-4($sp)     # save the old frame pointer
      addi $fp,$sp,0       # save ($sp) into $fp
      addi $sp,$sp,-12     # create 3 spaces on top of stack
      sw   $ra,-8($fp)     # save ($ra) in 2nd stack element
      sw   $s0,-12($fp)    # save ($s0) in top stack element
      ...
      lw   $s0,-12($fp)    # put top stack element in $s0
      lw   $ra,-8($fp)     # put 2nd stack element in $ra
      addi $sp,$fp,0       # restore $sp to original state
      lw   $fp,-4($sp)     # restore $fp to original state
      jr   $ra             # return from procedure
[Figure: the stack after the prolog: old ($fp) at -4($fp), ($ra) at -8($fp), ($s0) at -12($fp), where $sp now points.]
It could be a good idea to modify the stack pointer first in the prolog (before writing to the stack) and last in the epilog. Why?
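To check that the prolog and epilog above really restore the caller's state, the sketch below replays the same instruction sequence in C on a simulated 256-byte memory. The Regs struct, run_proc, and the lw/sw helpers are hypothetical stand-ins; each C statement is annotated with the instruction it mirrors, and the body simply clobbers $s0 and $ra the way real code might.

```c
#include <stdint.h>

static uint32_t mem[64];                              /* 256 bytes of "memory" */
static uint32_t lw(uint32_t addr) { return mem[addr / 4]; }
static void sw(uint32_t value, uint32_t addr) { mem[addr / 4] = value; }

typedef struct { uint32_t sp, fp, ra, s0; } Regs;

void run_proc(Regs *r) {
    /* prolog */
    sw(r->fp, r->sp - 4);      /* sw   $fp,-4($sp)  */
    r->fp = r->sp;             /* addi $fp,$sp,0    */
    r->sp = r->sp - 12;        /* addi $sp,$sp,-12  */
    sw(r->ra, r->fp - 8);      /* sw   $ra,-8($fp)  */
    sw(r->s0, r->fp - 12);     /* sw   $s0,-12($fp) */

    /* body: free to overwrite $s0 and $ra */
    r->s0 = 0xDEAD;
    r->ra = 0xBEEF;

    /* epilog */
    r->s0 = lw(r->fp - 12);    /* lw   $s0,-12($fp) */
    r->ra = lw(r->fp - 8);     /* lw   $ra,-8($fp)  */
    r->sp = r->fp;             /* addi $sp,$fp,0    */
    r->fp = lw(r->sp - 4);     /* lw   $fp,-4($sp)  */
}
```

With $sp = 200 and $fp = 240 on entry, the frame occupies bytes 188-199, and all four registers come back unchanged.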
Nested Procedure Calls
[Figure: Example of nested procedure calls. main: prepare to call → jal abc → prepare to continue; procedure abc: save → … → jal xyz → … → restore → jr $ra; procedure xyz: … → jr $ra.]
Fibonacci numbers (similar problem in HW4)
F(n) = F(n-1) + F(n-2)
F(1) = 1
F(2) = 1
n    = 1 2 3 4 5 6 …
F(n) = 1 1 2 3 5 8 …
/* Recursive function in C */
int fib(int n) {
    if (n == 1 || n == 2) return 1;
    return fib(n-1) + fib(n-2);
}
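The nested calls in fib can be made visible by replacing recursion with an explicit stack of frames. The sketch below is a C model under stated assumptions, not a MIPS solution to HW4: Frame and fib_explicit_stack are hypothetical names, top plays the role of $sp, ret plays the role of $v0, and the saved result of fib(n-1) lives in the frame the way a spilled $s register would. It only accepts 1 <= n <= 46, since fib(47) no longer fits in a 32-bit int.

```c
#define MAX_FRAMES 64

typedef struct {
    int n;      /* argument, as passed in $a0                           */
    int stage;  /* 0: call fib(n-1) next, 1: call fib(n-2) next, 2: add */
    int first;  /* fib(n-1), kept in the frame like a saved $s register */
} Frame;

int fib_explicit_stack(int n) {
    Frame stack[MAX_FRAMES];
    int top = 0;  /* number of live frames (stands in for $sp)   */
    int ret = 0;  /* most recent return value (stands in for $v0) */

    if (n < 1 || n > 46) return -1;      /* outside what this sketch handles */
    stack[top++] = (Frame){ n, 0, 0 };   /* main calls fib(n) */

    while (top > 0) {
        Frame *f = &stack[top - 1];
        if (f->n == 1 || f->n == 2) {    /* base case: return 1 */
            ret = 1;
            top--;                       /* pop the frame (epilog) */
        } else if (f->stage == 0) {      /* first nested call */
            f->stage = 1;
            stack[top++] = (Frame){ f->n - 1, 0, 0 };
        } else if (f->stage == 1) {      /* save fib(n-1), make second call */
            f->first = ret;
            f->stage = 2;
            stack[top++] = (Frame){ f->n - 2, 0, 0 };
        } else {                         /* combine results and return */
            ret = f->first + ret;
            top--;
        }
    }
    return ret;
}
```

At most n-1 frames are live at once, which is why a fixed array is enough here; the running time is still exponential, exactly as for the recursive version.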
Big Picture
• The picture is more complicated for modern processors; many details are missing
• Complication #1: IM and DM are caches: fast but small memories
• Complication #2: Programs are mapped to a virtual address space: the mapping for the program and data in question should be aware of other programs and data (i.e., the O/S); each program (process) is mapped to its own virtual address space
• Additional mechanisms (implemented in SW and HW) take care of that (to be discussed later)
[Figure: single-cycle datapath (PC, Add 4, instruction memory, register file, sign extend 16→32, ALU, data memory), with the instruction and data memories backed by main memory (HW + SW) and virtual memory (HW + SW).]
Big Picture
• Assume that there is only one program mapped to physical memory
• Questions to answer
  – Where to store code and where to store data?
  – Would a stack structure be enough to keep all the data?
• What kinds of data are typically present?
• A related question: how to pass more than one parameter to a procedure?
Address space (language and OS specific)
• A program's address space contains 4 regions:
  – stack: local variables; grows downward
  – dynamic data (heap): space requested for pointers via malloc(); resizes dynamically, grows upward
  – static data: variables declared outside main; does not grow or shrink
  – code: loaded when the program starts; does not change
[Figure: stack at ~FFFF FFFFhex, then a gap, then dynamic data, static data, and code at ~0hex.]
• Why does the stack grow from top to bottom?
For now, the OS somehow prevents accesses between the stack and the heap (gray hash lines in the figure). Wait for virtual memory.
Memory Map in MIPS Program
Hex address   Contents
00000000      Reserved (1 M words)
00400000      Text segment (63 M words)
10000000      Data segment: static data, then dynamic data
10008000      Initial $gp ($28); static data from 10000000 to 1000ffff is addressable with a 16-bit signed offset from $gp
              (Dynamic data and the stack share 448 M words)
7ffffffc      Stack segment: initial $sp ($29) and $fp ($30)
80000000      Second half of address space reserved for memory-mapped I/O
Overview of the memory address space in MiniMIPS.
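The numbers in the memory map can be checked with a little arithmetic. In the sketch below (hypothetical helper names), $gp starts at 0x10008000, so a signed 16-bit offset reaches from 0x10000000 to 0x1000FFFF, and dividing each region's byte span by 4 gives the word counts quoted in the figure (1 M = 2^20).

```c
#include <stdint.h>

#define GP_INIT 0x10008000u   /* initial $gp, from the memory map */

/* Lowest and highest byte reachable as imm16($gp), imm16 in [-32768, 32767] */
uint32_t gp_lowest(void)  { return GP_INIT - 0x8000u; }
uint32_t gp_highest(void) { return GP_INIT + 0x7FFFu; }

/* Size of the address range [start, end_exclusive) in 4-byte words */
uint32_t region_words(uint32_t start, uint32_t end_exclusive) {
    return (end_exclusive - start) / 4u;
}
```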
Pointers (1/4)
• Sometimes you want a procedure to increment a variable.
• What gets printed?
void main() {
    int y = 5;
    …
    AddOne(y);
    printf("y = %d\n", y);
}
void AddOne(int x)
{ x = x + 1; }
MIPS:
        lw   $a0, -12($fp)
        jal  AddOne
        …
AddOne: addi $t0, $a0, 1
        jr   $ra
[Figure: main's stack frame, with y stored at -12($fp); $fp is the frame pointer for main.]
Answer: y = 5
Pointers (2/4)
• Solved by passing a pointer to our subroutine.
• Now what gets printed?
void main() {
    int y = 5;
    …
    AddOne(&y);
    printf("y = %d\n", y);
}
void AddOne(int *p)
{ *p = *p + 1; }
MIPS:
        addi $a0, $fp, -12
        jal  AddOne
        …
AddOne: lw   $t0, 0($a0)
        addi $t0, $t0, 1
        sw   $t0, 0($a0)
        jr   $ra
Answer: y = 6
Pointers (2.5/4): another way of correcting it
• Sometimes you want a procedure to increment a variable.
• What gets printed?
void main() {
    int y = 5;
    …
    y = AddOne(y);
    printf("y = %d\n", y);
}
int AddOne(int x)
{ x = x + 1; return x; }
MIPS:
        lw   $a0, -12($fp)
        jal  AddOne
        sw   $v0, -12($fp)
        …
AddOne: addi $v0, $a0, 1
        jr   $ra
Answer: y = 6
Pointers (3/4)
• But what if what you want changed is a pointer?
• What gets printed?
void main() {
    int A[3] = {50, 60, 70};
    int *q = A;
    …
    IncrementPtr(q);
    printf("*q = %d\n", *q);
}
void IncrementPtr(int *p)
{ p = p + 1; }
MIPS:
        lw   $a0, -20($fp)
        jal  IncPtr
        …
IncPtr: addi $t0, $a0, 1
        jr   $ra
[Figure: main's stack frame holding A[0..2] = 50 60 70 and q at -20($fp), with q pointing to A[0].]
Answer: *q = 50
Pointers (4/4)
• Solution! Pass a pointer to a pointer, declared as **h
• Now what gets printed?
void main() {
    int A[3] = {50, 60, 70};
    int *q = A;
    IncrementPtr(&q);
    printf("*q = %d\n", *q);
}
void IncrementPtr(int **h)
{ *h = *h + 1; }
MIPS:
        addi $a0, $fp, -20
        jal  IncPtr
        …
IncPtr: lw   $t0, 0($a0)
        addi $t0, $t0, 4     # Note +4!
        sw   $t0, 0($a0)
        jr   $ra
Answer: *q = 60
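The five variants above can be compiled side by side. In the C sketch below, the demo_* functions and ptr_step_bytes are hypothetical drivers that return the value main() would print instead of calling printf; AddOneByValue, AddOneByPtr, AddOneRet and IncrementPtrByValue are renamed versions of the slides' functions so that all of them fit in one file.

```c
void AddOneByValue(int x)        { x = x + 1; (void)x; }  /* 1/4: edits a copy    */
void AddOneByPtr(int *p)         { *p = *p + 1; }         /* 2/4: via a pointer   */
int  AddOneRet(int x)            { x = x + 1; return x; } /* 2.5/4: return it     */
void IncrementPtrByValue(int *p) { p = p + 1; (void)p; }  /* 3/4: edits a copy    */
void IncrementPtr(int **h)       { *h = *h + 1; }         /* 4/4: pointer to ptr  */

int demo_by_value(void)   { int y = 5; AddOneByValue(y); return y; }
int demo_by_ptr(void)     { int y = 5; AddOneByPtr(&y); return y; }
int demo_return(void)     { int y = 5; y = AddOneRet(y); return y; }
int demo_ptr_copy(void)   { int A[3] = {50, 60, 70}; int *q = A; IncrementPtrByValue(q); return *q; }
int demo_ptr_to_ptr(void) { int A[3] = {50, 60, 70}; int *q = A; IncrementPtr(&q); return *q; }

/* How many bytes "+ 1" moved q: sizeof(int), i.e., the "+4" on MIPS */
long ptr_step_bytes(void) {
    int A[3] = {50, 60, 70};
    int *q = A;
    IncrementPtr(&q);
    return (long)((char *)q - (char *)A);
}
```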
Arrays example
#include <stdio.h>
#include <stdlib.h>
void foo() {
    int *p, *q, x;
    int a[4];
    p = (int *) malloc(sizeof(int));
    q = &x;
    *p = 1;   // p[0] would also work here
    printf("*p:%u, p:%u, &p:%u\n", *p, p, &p);
    *q = 2;   // q[0] would also work here
    printf("*q:%u, q:%u, &q:%u\n", *q, q, &q);
    *a = 3;   // a[0] would also work here
    printf("*a:%u, a:%u, &a:%u\n", *a, a, &a);
}
[Figure: memory at addresses 0, 4, 8, …, 60: p at 12 holds 40, q at 16 holds 20, x at 20 holds 2, a starts at 24 with a[0] = 3, and the unnamed malloc space at 40 holds 1.]
Output:
*p:1, p:40, &p:12
*q:2, q:20, &q:16
*a:3, a:24, &a:24
"An array name is not a variable": a and &a give the same address.
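Real addresses will not match the figure's 12, 16, 24 and 40, so instead of printing them, the sketch below (hypothetical function names) checks the two relationships the example illustrates: an array name and its address compare equal, while a pointer variable has its own address, distinct from the address it holds.

```c
#include <stdlib.h>

/* a decays to &a[0]; &a has a different type (int (*)[4]) but the same address */
int array_name_matches_address(void) {
    int a[4];
    return (void *)a == (void *)&a;
}

/* p is a variable: &p (where p lives) differs from p (what it points to) */
int pointer_is_a_variable(void) {
    int *p = malloc(sizeof(int));
    int ok;
    if (p == NULL) return 0;
    *p = 1;
    ok = (void *)&p != (void *)p && p[0] == 1;
    free(p);
    return ok;
}
```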
Example of array
C code:
    int a[100];
    void main() {
        int b[10];
        int size;
        int *p;
        ****
        p = (int *) malloc(sizeof(int) * size);
        ****
        free(p);
        ****
    }
MIPS pseudocode:
        .data
    a:  .space 400            # int a[100]: 100 words
        .text
    main:
        addi $sp, $sp, -(10*4 + 8 + #reg to spill * 4)
        addi $fp, $sp, 10*4 + 8 + #reg to spill * 4
        addi $t0, $fp, -10*4  # address of base of array b
        ****
        add  $a0, $0, $t1     # $t1 has the value of size*4
        jal  malloc           # malloc returns the memory address in $v0
        sw   $v0, -44($fp)    # store the returned address into p
        ****
        add  $a0, $v0, $0
        jal  free
        ****
        addi $sp, $sp, 10*4 + 8 + #reg to spill * 4
        jr   $ra
malloc and free are OS procedures.
C structures
• A struct is a data structure composed from simpler data types.
  – Like a class in Java/C++ but without methods or inheritance.
struct point {   /* type definition */
    int x;
    int y;
};
void PrintPoint(struct point p)
{
    printf("(%d,%d)", p.x, p.y);
}
As always in C, the argument is passed by "value": a copy is made.
struct point p1 = {0, 10};   /* x=0, y=10 */
PrintPoint(p1);
C structures: Pointers to them
• Usually, it is more efficient to pass a pointer to the struct.
• The C arrow operator (->) dereferences and extracts a structure field with a single operator.
• The following are equivalent:
struct point *p;
/* code to assign to pointer */
printf("x is %d\n", (*p).x);
printf("x is %d\n", p->x);
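A small C sketch of the difference between the two calling styles, assuming the struct point above; MoveByValue, MoveByPointer and the two driver functions are hypothetical. The by-value callee edits its own copy, so the caller's x is unchanged; the pointer version writes through p->x.

```c
struct point {
    int x;
    int y;
};

void MoveByValue(struct point p)    { p.x = p.x + 1; (void)p; }  /* edits the copy only */
void MoveByPointer(struct point *p) { p->x = p->x + 1; }         /* same as (*p).x = (*p).x + 1 */

int x_after_value_call(void)   { struct point p1 = {0, 10}; MoveByValue(p1);    return p1.x; }
int x_after_pointer_call(void) { struct point p1 = {0, 10}; MoveByPointer(&p1); return p1.x; }
```

Passing the pointer also avoids copying the whole struct into the callee, which is why it is usually the more efficient choice for large structs.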
How big are structs?
• Recall the C operator sizeof(), which gives the size in bytes (of a type or variable)
• How big is sizeof(p)?
struct p {
    char x;
    int y;
};
  – 5 bytes? 8 bytes?
  – The compiler may word-align the integer y
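The compiler can answer the question directly with sizeof and offsetof from <stddef.h>. The exact values are implementation-defined; on common ABIs with a 4-byte, 4-byte-aligned int, y sits at offset 4 and sizeof(struct p) is 8, i.e., 3 bytes of padding after x. The helper names are hypothetical.

```c
#include <stddef.h>

struct p {
    char x;
    int y;
};

size_t struct_p_size(void)     { return sizeof(struct p); }
size_t struct_p_y_offset(void) { return offsetof(struct p, y); }

/* Bytes the compiler inserted beyond the fields themselves */
size_t padding_bytes(void) { return sizeof(struct p) - sizeof(char) - sizeof(int); }
```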
Array vs. Linked list
Array:
- Slowly changing size, order
- Could be allocated dynamically or statically
- Contiguous location in memory
- Fast traversal / no memory overhead, but fixed structure
Linked list:
- Quickly changing size, order
- More often allocated dynamically (rarely statically)
- Could be contiguous (when static), but most often not
- Slower traversal / additional memory for storing pointers, but flexible structure
Example of linked list
C code:
    struct mylist {
        int value;
        struct mylist *next;
        struct mylist *prev;
    };
In principle, one can do this (can be allocated in any type of memory, e.g., statically):
    struct mylist *list[100];
Most typically (dynamically):
    void main() {
        struct mylist *p, *cur;
        *****
        p = malloc(sizeof(struct mylist) * 1);
        add(cur, p);
        *****
        delete(cur);
        *****
    }
MIPS pseudocode:
    main:
        addi $a0, $0, 12      # sizeof(struct mylist): 3 words
        jal  malloc           # malloc returns the memory address in $v0
        add  $a0, $0, $t1     # $t1 has the address cur
        add  $a1, $0, $v0
        jal  addelement
        ****
        add  $a0, $0, $t1
        jal  delete
        ****
        jal  free
        ****
        jr   $ra
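The slide leaves add(cur, p) and delete(cur) undefined. A minimal sketch of what they could look like for this doubly linked node, using the hypothetical names add_after, delete_node and new_node: add_after links p directly after cur, and delete_node unlinks a node from its neighbors and frees it. Both assume their arguments are valid nodes.

```c
#include <stdlib.h>

struct mylist {
    int value;
    struct mylist *next;
    struct mylist *prev;
};

/* Allocate and initialize one node (NULL if malloc fails) */
struct mylist *new_node(int value) {
    struct mylist *p = malloc(sizeof(struct mylist));
    if (p != NULL) { p->value = value; p->next = NULL; p->prev = NULL; }
    return p;
}

/* Link node p right after cur */
void add_after(struct mylist *cur, struct mylist *p) {
    p->prev = cur;
    p->next = cur->next;
    if (cur->next != NULL) cur->next->prev = p;
    cur->next = p;
}

/* Unlink cur from its neighbors and release its memory */
void delete_node(struct mylist *cur) {
    if (cur->prev != NULL) cur->prev->next = cur->next;
    if (cur->next != NULL) cur->next->prev = cur->prev;
    free(cur);
}
```

The prev pointer is what makes deletion O(1): the node can reach both neighbors without walking the list from the front.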
Memory Management
• How do we manage memory?
• Code and static storage are easy: they never grow or shrink
• Stack space is also easy: stack frames are created and destroyed in last-in, first-out (LIFO) order
• Managing the heap is tricky: memory can be allocated / deallocated at any time
Heap Management Requirements
• Want malloc() and free() to run quickly.
• Want minimal memory overhead
• Want to avoid fragmentation*, when most of our free memory is in many small chunks
  – In this case, we might have many free bytes but not be able to satisfy a large request since the free bytes are not contiguous in memory.
* This is technically called external fragmentation
Heap Management
• An example
  – Request R1 for 100 bytes
  – Request R2 for 1 byte
  – Memory from R1 is freed
  – Request R3 for 50 bytes
[Figure: R1 (100 bytes) followed by R2 (1 byte); after R1 is freed, where should R3 go?]
Example (K&R) Malloc/Free Implementation
• Each block of memory is preceded by a header that has two fields: the size of the block and a pointer to the next block
• All free blocks are kept in a circular linked list; the pointer field is unused in an allocated block
Example Implementation
• malloc() searches the free list for a block that is big enough. If none is found, more memory is requested from the operating system. If what it gets can't satisfy the request, it fails.
• free() checks if the blocks adjacent to the freed block are also free
  – If so, adjacent free blocks are merged (coalesced) into a single, larger free block
  – Otherwise, the freed block is just added to the free list
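A compact sketch of the scheme just described, written in the K&R style but with simplifications that are assumptions of this example, not K&R's exact code: a fixed static arena instead of asking the OS for more memory, a NULL-terminated free list kept in address order instead of a circular one, first-fit search, and no alignment union. kr_malloc, kr_free and the two kr_free_* inspection helpers are hypothetical names.

```c
#include <stddef.h>

typedef struct Header {
    size_t size;            /* block size in Header-sized units, header included */
    struct Header *next;    /* next free block (unused while allocated)          */
} Header;

#define ARENA_UNITS 128
static Header arena[ARENA_UNITS];
static Header *freep = NULL;    /* head of the address-ordered free list */
static int initialized = 0;

void *kr_malloc(size_t nbytes) {
    size_t nunits = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
    Header *prev = NULL, *p;
    if (!initialized) {                   /* whole arena starts as one free block */
        arena[0].size = ARENA_UNITS;
        arena[0].next = NULL;
        freep = &arena[0];
        initialized = 1;
    }
    for (p = freep; p != NULL; prev = p, p = p->next) {
        if (p->size < nunits) continue;   /* too small: keep searching */
        if (p->size == nunits) {          /* exact fit: unlink the block */
            if (prev != NULL) prev->next = p->next; else freep = p->next;
        } else {                          /* split: hand out the tail end */
            p->size -= nunits;
            p += p->size;
            p->size = nunits;
        }
        return (void *)(p + 1);           /* user memory starts after the header */
    }
    return NULL;  /* nothing fits (real K&R would ask the OS for more here) */
}

void kr_free(void *ap) {
    Header *bp = (Header *)ap - 1, *prev = NULL, *p = freep;
    while (p != NULL && p < bp) { prev = p; p = p->next; }  /* find the insert spot */
    if (p != NULL && bp + bp->size == p) {    /* coalesce with the following block */
        bp->size += p->size;
        bp->next = p->next;
    } else {
        bp->next = p;
    }
    if (prev != NULL && prev + prev->size == bp) {  /* coalesce with the preceding block */
        prev->size += bp->size;
        prev->next = bp->next;
    } else if (prev != NULL) {
        prev->next = bp;
    } else {
        freep = bp;
    }
}

size_t kr_free_blocks(void) { size_t n = 0; for (Header *p = freep; p; p = p->next) n++; return n; }
size_t kr_free_units(void)  { size_t n = 0; for (Header *p = freep; p; p = p->next) n += p->size; return n; }
```

Replaying the R1/R2 example: after freeing the 100-byte block, it cannot merge with the big free block because the 1-byte block sits between them, so the free list has two entries; freeing the 1-byte block then coalesces everything back into one block.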
Choosing a block in malloc()
• If there are multiple free blocks of memory that are big enough for some request, how do we choose which one to use?
  – best-fit: choose the smallest block that is big enough for the request
  – first-fit: choose the first block we see that is big enough
  – next-fit: like first-fit, but remember where we finished searching and resume searching from there
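The three policies can be compared on a toy model. In the sketch below, the free list is reduced to an array of block sizes in list order, and each function returns the index of the chosen block or -1 if none fits. The array model and the function names are assumptions; rover is the remembered position for next-fit.

```c
#include <stddef.h>

int first_fit(const size_t *sizes, int n, size_t req) {
    for (int i = 0; i < n; i++)
        if (sizes[i] >= req) return i;      /* stop at the first block that fits */
    return -1;
}

int best_fit(const size_t *sizes, int n, size_t req) {
    int best = -1;
    for (int i = 0; i < n; i++)             /* must examine every free block */
        if (sizes[i] >= req && (best < 0 || sizes[i] < sizes[best]))
            best = i;
    return best;
}

/* next-fit: like first-fit, but start from where the last search stopped */
int next_fit(const size_t *sizes, int n, size_t req, int *rover) {
    for (int k = 0; k < n; k++) {
        int i = (*rover + k) % n;
        if (sizes[i] >= req) { *rover = i; return i; }
    }
    return -1;
}
```

With free blocks of 60, 20, 100 and 30 bytes and a 25-byte request, first-fit takes the 60-byte block, best-fit the 30-byte one, and next-fit whichever fitting block comes first after the rover.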
Tradeoffs of allocation policies
• Best-fit: Tries to limit fragmentation, but at the cost of time (must examine all free blocks for each malloc). Leaves lots of small blocks (why?)
• First-fit: Quicker than best-fit (why?) but potentially more fragmentation. Tends to concentrate small blocks at the beginning of the free list (why?)
• Next-fit: Does not concentrate small blocks at the front like first-fit; should be faster as a result.
The C Code Translation Hierarchy
C program
  ↓ compiler
assembly code
  ↓ assembler
object code (+ library routines)
  ↓ linker
executable
  ↓ loader
machine code in memory
Compiler Benefits
• Comparing performance for bubble (exchange) sort
  – To sort 100,000 words with the array initialized to random values on a Pentium 4 with a 3.06 GHz clock rate, a 533 MHz system bus, and 2 GB of DDR SDRAM, using Linux version 2.4.20
gcc opt         Relative performance   Clock cycles (M)   Instr count (M)   CPI
None            1.00                   158,615            114,938           1.38
O1 (medium)     2.37                   66,990             37,470            1.79
O2 (full)       2.38                   66,521             39,993            1.66
O3 (proc mig)   2.41                   65,747             44,993            1.46
The unoptimized code has the best CPI, the O1 version has the lowest instruction count, but the O3 version is the fastest. Why?
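The derived columns of the table can be recomputed from the raw counts. The sketch below (hypothetical helper names) uses CPI = clock cycles / instruction count and, since all runs share the same clock rate, relative performance = baseline clock cycles / clock cycles, with the unoptimized build as the baseline.

```c
double cpi(double cycles_m, double instr_m) { return cycles_m / instr_m; }

double relative_performance(double baseline_cycles_m, double cycles_m) {
    return baseline_cycles_m / cycles_m;
}
```

For example, cpi(158615, 114938) is about 1.38 and relative_performance(158615, 65747) is about 2.41, matching the None and O3 rows.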
Assembler
• Input: Assembly language code (e.g., foo.s for MIPS)
• Output: Object code, information tables (e.g., foo.o for MIPS)
• Reads and uses directives
• Replaces pseudoinstructions
• Produces machine language
• Creates object file
Assembler Directives
• Give directions to the assembler, but do not produce machine instructions
  .text: Subsequent items are put in the user text segment (machine code)
  .data: Subsequent items are put in the user data segment (binary representation of the data in the source file)
  .globl sym: declares sym global so that it can be referenced from other files
  .asciiz str: Store the string str in memory and null-terminate it
  .word w1…wn: Store the n 32-bit quantities in successive memory words
Producing Machine Language
• What about jumps (j and jal)?
  – Jumps require an absolute address.
  – So, forward or not, we still can't generate the machine instruction without knowing the position of instructions in memory.
• What about references to data?
  – la gets broken up into lui and ori
  – These will require the full 32-bit address of the data.
• These can't be determined yet, so we create two tables…
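Once the final address of a label is known, splitting it for la is simple bit manipulation. The sketch below assumes one common expansion, `la $t0, label` → `lui $at, hi` followed by `ori $t0, $at, lo`; because ori zero-extends its immediate, hi is just the upper 16 bits and lo the lower 16. The helper names are hypothetical.

```c
#include <stdint.h>

uint32_t la_hi(uint32_t addr) { return addr >> 16; }      /* immediate for lui */
uint32_t la_lo(uint32_t addr) { return addr & 0xFFFFu; }  /* immediate for ori */

/* What the lui/ori pair computes at run time */
uint32_t lui_ori(uint32_t hi, uint32_t lo) { return (hi << 16) | lo; }
```

This is exactly the information the assembler lacks until link time, which is why such la instructions end up in the relocation table.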
Symbol Table
• List of "items" in this file that may be used by other files.
• What are they?
  – Labels: function calling
  – Data: anything in the .data section; variables which may be accessed across files
Relocation Table
• List of "items" whose addresses this file needs later.
• What are they?
  – Any label jumped to: j or jal
    • internal
    • external (including lib files)
  – Any piece of data
    • such as the la instruction
Object File Format
• object file header: size and position of the other pieces of the object file
• text segment: the machine code
• data segment: binary representation of the data in the source file
• relocation information: identifies lines of code that need to be "handled"
• symbol table: list of this file's labels and data that can be referenced
• debugging information
Linker (1/3)
• Input: Object code files, information tables (e.g., foo.o, libc.o for MIPS)
• Output: Executable code (e.g., a.out for MIPS)
• Combines several object (.o) files into a single executable ("linking")
• Enables separate compilation of files
  – Changes to one file do not require recompilation of the whole program
    • Windows NT source was > 40 M lines of code!
  – Old name "Link Editor" from editing the "links" in jump and link instructions
Linker (2/3)
[Figure: .o file 1 (text 1, data 1, info 1) and .o file 2 (text 2, data 2, info 2) enter the linker, which produces a.out containing relocated text 1, relocated text 2, relocated data 1, and relocated data 2.]
Linker (3/3)
• Step 1: Take the text segment from each .o file and put them together.
• Step 2: Take the data segment from each .o file, put them together, and concatenate this onto the end of the text segments.
• Step 3: Resolve references
  – Go through the relocation table; handle each entry
  – That is, fill in all absolute addresses
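Step 3 can be sketched for the j/jal case alone. In the C model below, the Reloc layout and patch_jumps() are hypothetical (real object formats carry more information per entry): each entry names an instruction by its byte offset in the merged text segment and the final absolute address of its target label, and the linker fills the instruction's 26-bit field while keeping its 6-bit opcode.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    size_t   text_offset;   /* byte offset of the j/jal in the merged text segment */
    uint32_t target_addr;   /* resolved absolute byte address of the label        */
} Reloc;

void patch_jumps(uint32_t *text, const Reloc *relocs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t *insn = &text[relocs[i].text_offset / 4];
        uint32_t field = (relocs[i].target_addr >> 2) & 0x03FFFFFFu;
        *insn = (*insn & 0xFC000000u) | field;   /* keep opcode, fill in address */
    }
}
```

For example, a jal (opcode 0x03) whose target resolves to 0x00400020 becomes 0x0C100008 after patching.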
Acknowledgments
Some of the slides contain material developed and copyrighted by M.J. Irwin (Penn State), B. Parhami (UCSB), D. Garcia (UCB), and instructor material for the textbook.
Linked List Example
• Let's look at an example of using structures, pointers, malloc(), and free() to implement a linked list of strings.
/* node structure for linked list */
struct Node {
    char *value;
    struct Node *next;   /* Recursive definition! */
};
typedef simplifies the code:
    /* "typedef" means define a new type */
    typedef struct Node NodeStruct;
    … OR …
    typedef struct Node {
        char *value;
        struct Node *next;
    } NodeStruct;
    … THEN
    typedef NodeStruct *List;
    typedef char *String;
With typedef char *String, the node can also be written as:
    struct Node {
        String value;
        struct Node *next;
    };
/* Note similarity! */
/* To define 2 nodes */
struct Node {
    char *value;
    struct Node *next;
} node1, node2;
Linked List Example
/* Add a string to an existing list */
List cons(String s, List list)
{
    List node = (List) malloc(sizeof(NodeStruct));
    node->value = (String) malloc(strlen(s) + 1);
    strcpy(node->value, s);
    node->next = list;
    return node;
}
{
    String s1 = "abc", s2 = "cde";
    List theList = NULL;
    theList = cons(s2, theList);
    theList = cons(s1, theList);
    /* or, just like (cons s1 (cons s2 nil)) */
    theList = cons(s1, cons(s2, NULL));
}
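A self-contained version of cons, with two additions that are assumptions of this sketch rather than part of the slides: NULL checks after each malloc (cons returns NULL on failure and leaves the old list untouched), and a hypothetical free_list helper, since every malloc() in cons needs a matching free().

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char *value;
    struct Node *next;
} NodeStruct;
typedef NodeStruct *List;
typedef char *String;

/* Add a copy of s in front of list; returns the new head, or NULL on failure */
List cons(String s, List list) {
    List node = (List) malloc(sizeof(NodeStruct));
    if (node == NULL) return NULL;
    node->value = (String) malloc(strlen(s) + 1);   /* +1 for the '\0' */
    if (node->value == NULL) { free(node); return NULL; }
    strcpy(node->value, s);
    node->next = list;
    return node;
}

/* Release every node and its string copy */
void free_list(List list) {
    while (list != NULL) {
        List next = list->next;
        free(list->value);   /* the string copy */
        free(list);          /* the node itself */
        list = next;
    }
}
```

Note that free_list must read list->next before freeing the node; after free(list), that field may no longer be accessed.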
Linked List Example: the 2nd call, step by step
/* Add a string to an existing list, 2nd call */
List cons(String s, List list)
{
    List node = (List) malloc(sizeof(NodeStruct));
    node->value = (String) malloc(strlen(s) + 1);
    strcpy(node->value, s);
    node->next = list;
    return node;
}
[Figures, one per step; list points to the existing list, which ends in NULL, and s points to "abc":
1. On entry, node is uninitialized (?).
2. After the first malloc, node points to a new node whose value and next fields are uninitialized (?).
3. After the second malloc, node->value points to uninitialized space for the string ("????").
4. After strcpy, node->value points to a copy of "abc".
5. After node->next = list, the new node is linked in front of the existing list.
6. After return node, the caller's list begins with the new node holding "abc".]
Important points to remember
• Remember:
  – Structure declaration does not allocate memory
  – Variable declaration does allocate memory
• So far we have talked about several different ways to allocate memory for data:
  1. Declaration of a local variable
     int i; struct Node list; char *string; int ar[n];
  2. "Dynamic" allocation at runtime by calling an allocation function (alloc).
     ptr = (struct Node *) malloc(sizeof(struct Node)*n);
• One more possibility exists…
  3. Data declared outside of any procedure (i.e., before main).
     – Similar to #1 above, but has "global" scope.
     int myGlobal;
     main() { }
Slab Allocator
• A different approach to memory management (used in GNU libc)
• Divide blocks into "large" and "small" by picking an arbitrary threshold size. Blocks larger than this threshold are managed with a free list (as before).
• For small blocks, allocate blocks in sizes that are powers of 2
  – e.g., if the program wants to allocate 20 bytes, actually give it 32 bytes
Slab Allocator
• Bookkeeping for small blocks is relatively easy: just use a bitmap for each range of blocks of the same size
• Allocating is easy and fast: compute the size of the block to allocate and find a free bit in the corresponding bitmap.
• Freeing is also easy and fast: figure out which slab the address belongs to and clear the corresponding bit.
Slab Allocator
[Figure: separate slabs of 16 byte blocks, 32 byte blocks, and 64 byte blocks, each tracked by its own bitmap.]
16 byte block bitmap: 11011000
32 byte block bitmap: 0111
64 byte block bitmap: 00
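The small-block path can be sketched in a few lines of C. Assumptions of this sketch (not from the slides): three size classes of 16, 32 and 64 bytes with 8 blocks each, one byte of bitmap per class where bit i set means block i is in use, and requests over 64 bytes treated as "large" and returned as -1 here instead of going to a free list. size_class, slab_alloc and slab_free are hypothetical names, and blocks are identified by (class, index) rather than by address.

```c
#include <stdint.h>
#include <stddef.h>

#define NCLASSES 3
#define BLOCKS_PER_CLASS 8
static uint8_t bitmap[NCLASSES];   /* bit i set = block i of that class in use */

/* Round up to a power-of-2 class: 0..16 -> 0, 17..32 -> 1, 33..64 -> 2 */
int size_class(size_t n) {
    size_t sz = 16;
    for (int c = 0; c < NCLASSES; c++, sz *= 2)
        if (n <= sz) return c;
    return -1;   /* "large" block: would be managed by a free list */
}

/* Allocate: find a clear bit in the class's bitmap and set it */
int slab_alloc(size_t n, int *cls) {
    int c = size_class(n);
    if (c < 0) return -1;
    for (int i = 0; i < BLOCKS_PER_CLASS; i++) {
        if (!(bitmap[c] & (1u << i))) {
            bitmap[c] |= (uint8_t)(1u << i);
            *cls = c;
            return i;
        }
    }
    return -1;   /* this class is full */
}

/* Free: clear the corresponding bit */
void slab_free(int cls, int index) {
    bitmap[cls] &= (uint8_t)~(1u << index);
}
```

A 20-byte request lands in the 32-byte class, so 12 bytes of that block go unused: the internal fragmentation discussed below.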
Slab Allocator Tradeoffs
• Extremely fast for small blocks.
• Slower for large blocks
  – But presumably the program will take more time to do something with a large block, so the overhead is not as critical.
• Minimal space overhead
• No fragmentation (as we defined it before) for small blocks, but still have wasted space!
Internal vs. External Fragmentation
• With the slab allocator, the difference between the requested size and the next power of 2 is wasted
  – e.g., if the program wants to allocate 20 bytes and we give it a 32 byte block, 12 bytes are unused.
• We also refer to this as fragmentation, but call it internal fragmentation since the wasted space is actually within an allocated block.
• External fragmentation: wasted space between allocated blocks.
Buddy System
• Yet another memory management technique (used in the Linux kernel)
• Like GNU's "slab allocator", but only allocate blocks in sizes that are powers of 2 (internal fragmentation is possible)
• Keep separate free lists for each size
  – e.g., separate free lists for 16 byte, 32 byte, 64 byte blocks, etc.
Buddy System
• If no free block of size n is available, find a block of size 2n and split it into two blocks of size n
• When a block of size n is freed, if its neighbor of size n is also free, combine the blocks into a single block of size 2n
  – The buddy is the block in the other half of the larger block
[Figure: examples of buddies and of neighboring blocks that are NOT buddies.]
• Same speed advantages as the slab allocator
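Finding a block's buddy needs no search. Assuming block offsets are measured from the start of the managed region and a block of size n (a power of 2) always starts at a multiple of n, its buddy starts at offset XOR n, i.e., in the other half of the enclosing 2n block. The sketch below uses hypothetical helper names; two equal-size neighbors that straddle a 2n boundary are not buddies.

```c
#include <stddef.h>

size_t buddy_of(size_t off, size_t n) { return off ^ n; }

int are_buddies(size_t a, size_t b, size_t n) { return buddy_of(a, n) == b; }

/* Start of the merged 2n block once both halves are free */
size_t merged_offset(size_t off, size_t n) { return off & ~n; }
```

In units of K, as in the allocation example: the 64K block at 128K and the one at 192K are buddies and merge into a 128K block at 128K, whereas the 64K blocks at 64K and 128K are neighbors but not buddies.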
Buddy memory allocation (1024K in total, in units of 64K)
t = 0   1024K
t = 1   A-64K | 64K | 128K | 256K | 512K
t = 2   A-64K | 64K | B-128K | 256K | 512K
t = 3   A-64K | C-64K | B-128K | 256K | 512K
t = 4   A-64K | C-64K | B-128K | D-128K | 128K | 512K
t = 5   A-64K | 64K | B-128K | D-128K | 128K | 512K
t = 6   128K | B-128K | D-128K | 128K | 512K
t = 7   256K | D-128K | 128K | 512K
t = 8   1024K
1. Program A requests memory 34K..64K in size
2. Program B requests memory 66K..128K in size
3. Program C requests memory 35K..64K in size
4. Program D requests memory 67K..128K in size
5. Program C releases its memory
6. Program A releases its memory
7. Program B releases its memory
8. Program D releases its memory
Allocation Schemes
• So which memory management scheme (K&R, slab, buddy) is best?
  – There is no single best approach for every application.
  – Different applications have different allocation / deallocation patterns.
  – A scheme that works well for one application may work poorly for another application.