Dmitri Strukov, Software Interface (strukov/ece154aFall2013/viewgraphs/...)

ECE 154A Introduction to Computer Architecture, Fall 2013

Dmitri Strukov

Software Interface

Agenda

• Procedures and stack

• Memory mapping

• Arrays vs. linked lists

• Memory management

• Program compilation, linking, loading and execution

Big Idea

• Architecture should be convenient for programmers
  – HW support for programming language constructions
  – Debugging, security, etc.

Why Are Subroutines (Procedures) Important?

• Better structure
  – Fewer bugs, i.e. faster and cheaper development

• More compact code
  – Fewer bugs
  – Very important when memory is limited, e.g. in the early days
  – Even for today's computers, typically leads to better performance …

• Fewer misses (memory hierarchy)
  – … but could also have negative effects if the overhead (i.e. control instructions) is significant

Implementing Subroutines

• Can implement with existing instructions:

            j    proc
    cont:   xxx
            ...
    proc:   xxx
            j    cont

  – What if the procedure is written by somebody else and already compiled (e.g. a library)?
  – Still doable by patching binaries

• Procedures are very frequent, so let's have special instructions to support them: JAL and JR

Instructions for Accessing Procedures

• MIPS procedure call instruction:

    jal  ProcedureAddress    # jump and link

• Saves PC+4 in register $ra to have a link to the next instruction for the procedure return

• Machine format (J format):

    | opcode 0x03 | 26-bit address |

• Then can do procedure return with a

    jr   $ra                 # return

• Instruction format (R format):

    | opcode 0 | rs = $ra | ... | funct 0x08 |

Illustrating a Procedure Call

[Figure: main prepares to call and executes jal proc; proc saves registers, etc., does its work, restores them and executes jr $ra; the PC returns to main, which prepares to continue.]

Relationship between the main program and a procedure.

More Issues with Procedures

• Q1: How to pass data to a procedure and return data from it?

• We would like to use as many registers as possible inside the procedure (callee) to better exploit temporal locality, but some registers may be in use by the caller

Solution: Spill registers (move register file contents to main memory and later restore them). What is the exact mechanism for that? In particular:

• Q2: Which registers to spill?
• Q3: Who is responsible for saving them (callee vs. caller)?
• Q4: Where to spill?

Solution: There are certain rules, enforced in software, which help with such an implementation

Typical Use of Registers (Answer to Q1)

    Register   Name        Typical use
    $0         $zero       Constant 0
    $1         $at         Reserved for assembler use
    $2-$3      $v0-$v1     Procedure results
    $4-$7      $a0-$a3     Procedure arguments
    $8-$15     $t0-$t7     Temporary values
    $16-$23    $s0-$s7     Operands; saved across procedure calls
    $24-$25    $t8-$t9     More temporaries
    $26-$27    $k0-$k1     Reserved for OS (kernel)
    $28        $gp         Global pointer (saved)
    $29        $sp         Stack pointer (saved)
    $30        $fp         Frame pointer (saved)
    $31        $ra         Return address (saved)

In principle, one can use registers as he/she likes without sticking to these guidelines (one exception: in MIPS, kernel registers might be rewritten by hardware on special occasions (exceptions), so it is better not to use them). However, if the program is supposed to be run together with others (e.g. under a certain OS and/or if it uses subroutines written by other people), then it is a good idea to stick to these rules.

Byte ordering:
• A 4-byte word sits in consecutive memory addresses according to the big-endian order (most significant byte has the lowest address). Byte numbering: 0 1 2 3
• When loading a byte into a register, it goes in the low end
• A doubleword sits in consecutive registers or memory locations according to the big-endian order (most significant word comes first)

A Simple MIPS Procedure

Procedure to find the absolute value of an integer:

    $v0 ← |($a0)|

Solution

The absolute value of x is –x if x < 0 and x otherwise.

    abs:  sub  $v0, $zero, $a0   # put -($a0) in $v0, in case ($a0) < 0
          bltz $a0, done         # if ($a0) < 0 then done
          add  $v0, $a0, $zero   # else put ($a0) in $v0
    done: jr   $ra               # return to calling program

In practice, we seldom use such short procedures because of the overhead they entail. In this example, we have 3-4 instructions of overhead for 3 instructions of useful computation.

No register spilling here; see the next example.




Answer to Q2

Same register conventions: the registers marked as saved ($s0-$s7, $gp, $sp, $fp, $ra) must be preserved across procedure calls, so a procedure that uses them must spill and restore them; the temporaries ($t0-$t9) carry no such guarantee.

Six Steps in Execution of a Procedure (Answer to Q3)

1. Main routine (caller) places parameters in a place where the procedure (callee) can access them
   – $a0-$a3: four argument registers

2. Caller transfers control to the callee

3. Callee acquires the storage resources needed

4. Callee performs the desired task

5. Callee places the result value in a place where the caller can access it
   – $v0-$v1: two value registers for result values

6. Callee returns control to the caller
   – $ra: one return address register to return to the point of origin


Spilling Registers (Answer to Q4)

• What if the callee needs to use more registers than those allocated to arguments and return values?
  – The callee uses a stack: a last-in-first-out queue

• One of the general registers, $sp ($29), is used to address the stack, which "grows" from high addresses to low addresses; $sp points to the top of the stack

• Add data onto the stack (push):

    $sp = $sp – 4
    data on stack at new $sp

• Remove data from the stack (pop):

    data from stack at $sp
    $sp = $sp + 4

Allocating Space on the Stack

• The segment of the stack containing a procedure's saved registers and local variables is its procedure frame (aka activation record)
  – The frame pointer ($fp) points to the first word of the frame of a procedure, providing a stable "base" register for the procedure

• $fp is initialized using $sp on a call, and $sp is restored using $fp on a return

[Figure: a procedure frame, from high to low addresses: saved argument registers (if any), saved return address, saved local registers (if any), local arrays and structures (if any); $fp marks the high end of the frame and $sp the low end.]

Example: Parameters and Results

[Figure: stack contents before and after calling a procedure. Before the call, $fp and $sp bound the frame of the current procedure (saved registers, local variables) above the frame of the previous procedure. After the call, the new frame holds the old ($fp), saved registers and local variables, and $fp and $sp move down to bound it.]

Use of the stack by a procedure.

More on Procedures

Prolog
– spill to the stack all registers used by the procedure, except for $t0-$t9 and the one used for returning values
– advance the stack pointer ($sp) first, then write to the stack

Body
– code of the procedure

Epilog
– restore all used registers
– adjust the stack pointer ($sp) at the end

Example of Using the Stack

Saving $fp, $ra, and $s0 onto the stack and restoring them at the end of the procedure:

    proc: sw   $fp, -4($sp)     # save the old frame pointer
          addi $fp, $sp, 0      # save ($sp) into $fp
          addi $sp, $sp, -12    # create 3 spaces on top of stack
          sw   $ra, -8($fp)     # save ($ra) in 2nd stack element
          sw   $s0, -12($fp)    # save ($s0) in top stack element
          ...
          lw   $s0, -12($fp)    # put top stack element in $s0
          lw   $ra, -8($fp)     # put 2nd stack element in $ra
          addi $sp, $fp, 0      # restore $sp to original state
          lw   $fp, -4($sp)     # restore $fp to original state
          jr   $ra              # return from procedure

[Figure: stack before and after the prolog; afterwards ($fp), ($ra) and ($s0) occupy the three words between the new $fp and $sp.]

It could be a good idea to modify the stack pointer first in the prolog (before writing to the stack) and last in the epilog. Why?





Nested Procedure Calls

[Figure: main prepares to call and executes jal abc; procedure abc saves, prepares to call and executes jal xyz; procedure xyz runs and returns with jr $ra; abc restores and returns with jr $ra; main prepares to continue.]

Example of nested procedure calls.

Fibonacci numbers (similar problem in HW4)

F(n) = F(n-1) + F(n-2)
F(1) = 1
F(2) = 1

n    = 1 2 3 4 5 6 …
F(n) = 1 1 2 3 5 8 …

    /* Recursive function in C */
    int fib(int n) {
        if (n == 1 || n == 2) return 1;
        return fib(n-1) + fib(n-2);
    }

Memory mapping

Big Picture

• A more complicated picture for modern processors; many details are missing

• Complication #1: IM and DM are caches: fast but small memory

• Complication #2: Programs are mapped to a virtual address space: the mapping for the program and data in question should be aware of other programs and data (i.e. the OS); each program (process) is mapped to its own virtual address space

• Additional mechanisms (implemented in SW and HW) take care of that (will be discussed later)

[Figure: single-cycle datapath (PC, +4 adder, instruction memory, register file, 16-to-32 sign extend, ALU, data memory), with instruction and data memory backed by main memory (HW + SW) and virtual memory (HW + SW).]

Big Picture

• Assume that there is only one program mapped to physical memory

• Questions to answer:
  – Where to store code and where to store data?
  – Would a stack structure be enough to keep all the data?

• What kind of data are typically present?

• A related question: How to pass more than one parameter to a procedure?

Address space (language and OS specific)

• A program's address space contains 4 regions:
  – stack: local variables, grows downward
  – dynamic data (heap): space requested for pointers via malloc(); resizes dynamically, grows upward
  – static data: variables declared outside main, does not grow or shrink
  – code: loaded when program starts, does not change

[Figure: address space from ~FFFF FFFFhex (stack) down to ~0hex (code), with dynamic data and static data in between.]

• For now, the OS somehow prevents accesses between stack and heap (gray hash lines in the figure). Wait for virtual memory.

• Why does the stack grow from top to bottom?

Memory Map in MIPS

    Hex address   Contents
    00000000      Reserved (1 M words)
    00400000      Text segment (63 M words)
    10000000      Data segment: static data, then dynamic data growing upward
    10008000      $gp ($28); static data from 10000000 to 1000ffff is addressable with a 16-bit signed offset from $gp
                  (data and stack segments together span 448 M words)
    7ffffffc      Stack segment: top of the stack, addressed by $sp ($29) and $fp ($30); grows downward
    80000000      Second half of address space reserved for memory-mapped I/O

Overview of the memory address space in MiniMIPS.

Linked Lists vs. Arrays

Pointers (1/4)

• Sometimes you want to have a procedure increment a variable.

• What gets printed?

    void main() {
        int y = 5;
        ...
        AddOne(y);
        printf("y = %d\n", y);
    }

    void AddOne(int x) { x = x + 1; }

            lw   $a0, -12($fp)    # y lives at -12($fp) in main's frame
            jal  AddOne
            ...
    AddOne: addi $t0, $a0, 1
            jr   $ra

Prints: y = 5

Pointers (2/4)

• Solved by passing in a pointer to our subroutine.

• Now what gets printed?

    void main() {
        int y = 5;
        ...
        AddOne(&y);
        printf("y = %d\n", y);
    }

    void AddOne(int *p) { *p = *p + 1; }

            addi $a0, $fp, -12    # $a0 = &y
            jal  AddOne
            ...
    AddOne: lw   $t0, 0($a0)
            addi $t0, $t0, 1
            sw   $t0, 0($a0)
            jr   $ra

Prints: y = 6

Pointers (2.5/4): another way of correcting it

• Sometimes you want to have a procedure increment a variable.

• What gets printed?

    void main() {
        int y = 5;
        ...
        y = AddOne(y);
        printf("y = %d\n", y);
    }

    int AddOne(int x) { x = x + 1; return x; }

            lw   $a0, -12($fp)
            jal  AddOne
            sw   $v0, -12($fp)
            ...
    AddOne: addi $v0, $a0, 1
            jr   $ra

Prints: y = 6

Pointers (3/4)

• But what if what you want changed is a pointer?

• What gets printed?

    void main() {
        int A[3] = {50, 60, 70};
        int *q = A;
        ...
        IncrementPtr(q);
        printf("*q = %d\n", *q);
    }

    void IncrementPtr(int *p) { p = p + 1; }

            lw   $a0, -20($fp)    # $a0 = q (q lives at -20($fp))
            jal  IncPtr
            ...
    IncPtr: addi $t0, $a0, 4      # p + 1 advances one int = 4 bytes
            jr   $ra

Prints: *q = 50 (A holds 50 60 70, and q still points to A[0])

Pointers (4/4)

• Solution! Pass a pointer to a pointer, declared as **h

• Now what gets printed?

    void main() {
        int A[3] = {50, 60, 70};
        int *q = A;
        ...
        IncrementPtr(&q);
        printf("*q = %d\n", *q);
    }

    void IncrementPtr(int **h) { *h = *h + 1; }

            addi $a0, $fp, -20    # $a0 = &q
            jal  IncPtr
            ...
    IncPtr: lw   $t0, 0($a0)
            addi $t0, $t0, 4      # Note +4!
            sw   $t0, 0($a0)
            jr   $ra

Prints: *q = 60 (q now points to A[1])

Arrays example

    void foo() {
        int *p, *q, x;
        int a[4];
        p = (int *) malloc(sizeof(int));
        q = &x;

        *p = 1;   // p[0] would also work here
        printf("*p:%u, p:%u, &p:%u\n", *p, p, &p);
        *q = 2;   // q[0] would also work here
        printf("*q:%u, q:%u, &q:%u\n", *q, q, &q);
        *a = 3;   // a[0] would also work here
        printf("*a:%u, a:%u, &a:%u\n", *a, a, &a);
    }

Memory (toy byte addresses; real code should print pointers with %p):

    address:   ... 12   16   20   24    28    32    36    40
    contents:  ... 40   20   2    3     ?     ?     ?     1
    variable:      p    q    x    a[0]  a[1]  a[2]  a[3]  unnamed malloc space

Output:

    *p:1, p:40, &p:12
    *q:2, q:20, &q:16
    *a:3, a:24, &a:24

"An array name is not a variable": a and &a give the same address.

Example of array

C code:

    int a[100];
    void main() {
        int b[10];
        int size;
        int *p;
        ****
        p = (int *) malloc(sizeof(int)*size);
        ****
        free(p);
        ****
    }

MIPS pseudocode:

            .data
    a:      .space 400                      # 100 words for a[100]
            .text
    main:   addi $sp, $sp, -(10*4 + 8 + #reg to spill * 4)
            addi $fp, $sp, 10*4 + 8 + #reg to spill * 4
            addi $t0, $fp, -10*4            # address of base of array b
            ****
            add  $a0, $0, $t1               # $t1 has value of size*4
            jal  malloc                     # malloc returns memory address in $v0
            sw   $v0, -44($fp)              # store the returned address in p
            ****
            add  $a0, $v0, $0
            jal  free
            ****
            addi $sp, $sp, 10*4 + 8 + #reg to spill * 4
            jr   $ra

malloc and free are library procedures, which obtain memory from the OS when needed.

C structures

• A struct is a data structure composed from simpler data types.
  – Like a class in Java/C++ but without methods or inheritance.

    struct point {   /* type definition */
        int x;
        int y;
    };

    void PrintPoint(struct point p) {
        printf("(%d,%d)", p.x, p.y);
    }

    struct point p1 = {0,10};   /* x=0, y=10 */
    PrintPoint(p1);

As always in C, the argument is passed by "value": a copy is made.

C structures: Pointers to them

• Usually, it is more efficient to pass a pointer to the struct.

• The C arrow operator (->) dereferences and extracts a structure field with a single operator.

• The following are equivalent:

    struct point *p;
    /* code to assign to pointer */
    printf("x is %d\n", (*p).x);
    printf("x is %d\n", p->x);

How big are structs?

• Recall the C operator sizeof(), which gives the size in bytes (of a type or variable)

• How big is sizeof(struct p)?

    struct p {
        char x;
        int y;
    };

  – 5 bytes? 8 bytes?
  – The compiler may word-align the integer y

Array vs. Linked list

Array
– Slowly changing size, order
– Could be allocated dynamically or statically
– Contiguous location in memory
– Fast traversal / no memory overhead, but fixed structure

Linked list
– Quickly changing size, order
– More often allocated dynamically (rarely statically)
– Could be contiguous (when static), but most often not
– Slower traversal / additional memory for storing pointers, but flexible structure

Example of linked list

[Figure: linked list example, allocated statically vs. dynamically]

Example of linked list C code:

    struct mylist {
        int value;
        struct mylist *next;
        struct mylist *prev;
    };

In principle, can do this (can be allocated in any type of memory):

    struct mylist *list[100];

Most typically:

    void main() {
        struct mylist *p, *cur;
        *****
        p = malloc(sizeof(struct mylist)*1);
        add(cur, p);
        *****
        delete(cur);
        *****
    }

MIPS pseudocode:

    main:   ****
            addi $a0, $0, 12        # sizeof(struct mylist): 4-byte int + two 4-byte pointers
            jal  malloc             # malloc returns memory address in $v0
            add  $a0, $0, $t1       # $t1 has address cur
            add  $a1, $0, $v0
            jal  addelement
            ****
            add  $a0, $0, $t1
            jal  delete
            jal  free
            ****
            jr   $ra

Deleting from doubly linked list example – I

Deleting from doubly linked list example – II

Memory Management

Memory Management

• How do we manage memory?

• Code and static storage are easy: they never grow or shrink

• Stack space is also easy: stack frames are created and destroyed in last-in, first-out (LIFO) order

• Managing the heap is tricky: memory can be allocated / deallocated at any time

Heap Management Requirements

• Want malloc() and free() to run quickly.

• Want minimal memory overhead

• Want to avoid fragmentation*, i.e. when most of our free memory is in many small chunks
  – In this case, we might have many free bytes but not be able to satisfy a large request, since the free bytes are not contiguous in memory.

* This is technically called external fragmentation

Heap Management

• An example
  – Request R1 for 100 bytes
  – Request R2 for 1 byte
  – Memory from R1 is freed
  – Request R3 for 50 bytes

[Figure: R1 (100 bytes) followed by R2 (1 byte); after R1 is freed, its 100 bytes are free space in front of R2.]

Heap Management (continued)

[Figure: the same heap, with two candidate places for R3 marked "R3?".]

Example (K&R) Malloc/Free Implementation

• Each block of memory is preceded by a header that has two fields: size of the block and a pointer to the next block

• All free blocks are kept in a circular linked list; the pointer field is unused in an allocated block

Example Implementation

• malloc() searches the free list for a block that is big enough. If none is found, more memory is requested from the operating system. If what it gets can't satisfy the request, it fails.

• free() checks if the blocks adjacent to the freed block are also free
  – If so, adjacent free blocks are merged (coalesced) into a single, larger free block
  – Otherwise, the freed block is just added to the free list

Choosing a block in malloc()

• If there are multiple free blocks of memory that are big enough for some request, how do we choose which one to use?
  – best-fit: choose the smallest block that is big enough for the request
  – first-fit: choose the first block we see that is big enough
  – next-fit: like first-fit, but remember where we finished searching and resume searching from there

Tradeoffs of allocation policies

• Best-fit: Tries to limit fragmentation, but at the cost of time (must examine all free blocks for each malloc). Leaves lots of small blocks (why?)

• First-fit: Quicker than best-fit (why?), but potentially more fragmentation. Tends to concentrate small blocks at the beginning of the free list (why?)

• Next-fit: Does not concentrate small blocks at the front like first-fit, and should be faster as a result.

Compiling, Linking, and Loading Programs

The C Code Translation Hierarchy

    C program
      → compiler → assembly code
      → assembler → object code
      → linker (together with library routines) → executable
      → loader → machine code in memory

Compiler Benefits

• Comparing performance for bubble (exchange) sort
  – To sort 100,000 words with the array initialized to random values on a Pentium 4 with a 3.06 GHz clock rate, a 533 MHz system bus, with 2 GB of DDR SDRAM, using Linux version 2.4.20

    gcc opt         Relative performance   Clock cycles (M)   Instr count (M)   CPI
    None            1.00                   158,615            114,938           1.38
    O1 (medium)     2.37                    66,990             37,470           1.79
    O2 (full)       2.38                    66,521             39,993           1.66
    O3 (proc mig)   2.41                    65,747             44,993           1.46

The unoptimized code has the best CPI, the O1 version has the lowest instruction count, but the O3 version is the fastest. Why?

Assembler

• Input: Assembly Language Code (e.g., foo.s for MIPS)

• Output: Object Code, information tables (e.g., foo.o for MIPS)

• Reads and Uses Directives

• Replaces Pseudoinstructions

• Produces Machine Language

• Creates Object File

Assembler Directives

• Give directions to the assembler, but do not produce machine instructions

    .text         Subsequent items put in user text segment (machine code)
    .data         Subsequent items put in user data segment (binary rep of data in source file)
    .globl sym    Declares sym global; it can be referenced from other files
    .asciiz str   Store the string str in memory and null-terminate it
    .word w1…wn   Store the n 32-bit quantities in successive memory words

Producing Machine Language

• What about jumps (j and jal)?
  – Jumps require an absolute address.
  – So, forward or not, we still can't generate the machine instruction without knowing the position of instructions in memory.

• What about references to data?
  – la gets broken up into lui and ori
  – These will require the full 32-bit address of the data.

• These can't be determined yet, so we create two tables…

Symbol Table

• List of "items" in this file that may be used by other files.

• What are they?
  – Labels: function calling
  – Data: anything in the .data section; variables which may be accessed across files

Relocation Table

• List of "items" whose addresses this file needs later.

• What are they?
  – Any label jumped to: j or jal
    • internal
    • external (including lib files)
  – Any piece of data
    • such as the la instruction

Object File Format

• object file header: size and position of the other pieces of the object file

• text segment: the machine code

• data segment: binary representation of the data in the source file

• relocation information: identifies lines of code that need to be "handled"

• symbol table: list of this file's labels and data that can be referenced

• debugging information

Linker (1/3)

• Input: Object Code files, information tables (e.g., foo.o, libc.o for MIPS)

• Output: Executable Code (e.g., a.out for MIPS)

• Combines several object (.o) files into a single executable ("linking")

• Enables Separate Compilation of files
  – Changes to one file do not require recompilation of the whole program
    • Windows NT source was > 40 M lines of code!
  – Old name "Link Editor" from editing the "links" in jump and link instructions

Linker (2/3)

[Figure: .o file 1 (text 1, data 1, info 1) and .o file 2 (text 2, data 2, info 2) enter the linker, which produces a.out containing relocated text 1, relocated text 2, relocated data 1 and relocated data 2.]

Linker (3/3)

• Step 1: Take the text segment from each .o file and put them together.

• Step 2: Take the data segment from each .o file, put them together, and concatenate this onto the end of the text segments.

• Step 3: Resolve References
  – Go through the Relocation Table; handle each entry
  – That is, fill in all absolute addresses

Acknowledgments

Some of the slides contain material developed and copyrighted by M.J. Irwin (Penn State), B. Parhami (UCSB), D. Garcia (UCB) and instructor material for the textbook

Extra Material

More on Linked Lists (D. Garcia, UCB)

Linked List Example

• Let's look at an example of using structures, pointers, malloc(), and free() to implement a linked list of strings.

    /* node structure for linked list */
    struct Node {
        char *value;
        struct Node *next;   /* Recursive definition! */
    };

typedef simplifies the code

    struct Node {
        char *value;
        struct Node *next;
    };

    /* "typedef" means define a new type */
    typedef struct Node NodeStruct;

… OR …

    typedef struct Node {
        char *value;
        struct Node *next;
    } NodeStruct;

… THEN

    typedef NodeStruct *List;
    typedef char *String;

    /* Note similarity! */
    /* To define 2 nodes */
    struct Node {
        char *value;
        struct Node *next;
    } node1, node2;

Linked List Example

    /* Add a string to an existing list */
    List cons(String s, List list) {
        List node = (List) malloc(sizeof(NodeStruct));
        node->value = (String) malloc(strlen(s) + 1);
        strcpy(node->value, s);
        node->next = list;
        return node;
    }

    {
        String s1 = "abc", s2 = "cde";
        List theList = NULL;
        theList = cons(s2, theList);
        theList = cons(s1, theList);
        /* or, just like (cons s1 (cons s2 nil)) */
        theList = cons(s1, cons(s2, NULL));
    }

Linked List Example

    /* Add a string to an existing list, 2nd call */
    List cons(String s, List list) {
        List node = (List) malloc(sizeof(NodeStruct));
        node->value = (String) malloc(strlen(s) + 1);
        strcpy(node->value, s);
        node->next = list;
        return node;
    }

[Figure sequence, with s pointing to "abc" and list pointing to the existing list (ending in NULL): (1) node points to a new, uninitialized NodeStruct (?, ?); (2) node->value points to a new uninitialized buffer ("????"); (3) strcpy fills it with "abc"; (4) node->next is set to list; (5) node is returned and becomes the new head of the list.]

Important points to remember

• Remember:
  – Structure declaration does not allocate memory
  – Variable declaration does allocate memory

• So far we have talked about several different ways to allocate memory for data:
  1. Declaration of a local variable

       int i; struct Node list; char *string; int ar[n];

  2. "Dynamic" allocation at runtime by calling an allocation function (alloc)

       ptr = (struct Node *) malloc(sizeof(struct Node)*n);

• One more possibility exists…
  3. Data declared outside of any procedure (i.e., before main)

       int myGlobal;
       main() { }

     – Similar to #1 above, but has "global" scope.

More on Heap Management Schemes

Slab Allocator

• A different approach to memory management (used in GNU libc)

• Divide blocks into "large" and "small" by picking an arbitrary threshold size. Blocks larger than this threshold are managed with a freelist (as before).

• For small blocks, allocate blocks in sizes that are powers of 2
  – e.g., if the program wants to allocate 20 bytes, actually give it 32 bytes

Slab Allocator

• Bookkeeping for small blocks is relatively easy: just use a bitmap for each range of blocks of the same size

• Allocating is easy and fast: compute the size of the block to allocate and find a free bit in the corresponding bitmap.

• Freeing is also easy and fast: figure out which slab the address belongs to and clear the corresponding bit.

Slab Allocator

[Figure: slabs of 16-byte, 32-byte and 64-byte blocks, with allocated blocks shaded.]

    16 byte block bitmap: 11011000
    32 byte block bitmap: 0111
    64 byte block bitmap: 00

Slab Allocator Tradeoffs

• Extremely fast for small blocks.

• Slower for large blocks
  – But presumably the program will take more time to do something with a large block, so the overhead is not as critical.

• Minimal space overhead

• No fragmentation (as we defined it before) for small blocks, but still have wasted space!

Internal vs. External Fragmentation

• With the slab allocator, the difference between the requested size and the next power of 2 is wasted
  – e.g., if the program wants to allocate 20 bytes and we give it a 32 byte block, 12 bytes are unused.

• We also refer to this as fragmentation, but call it internal fragmentation, since the wasted space is actually within an allocated block.

• External fragmentation: wasted space between allocated blocks.

Buddy System

• Yet another memory management technique (used in Linux kernel)

• Like GNU's "slab allocator", but only allocate blocks in sizes that are powers of 2 (internal fragmentation is possible)

• Keep separate free lists for each size
  – e.g., separate free lists for 16 byte, 32 byte, 64 byte blocks, etc.

Buddy System

• If no free block of size n is available, find a block of size 2n and split it into two blocks of size n

• When a block of size n is freed, if its neighbor of size n is also free, combine the blocks into a single block of size 2n
  – Buddy is the block in the other half of the larger block

[Figure: examples of blocks that are buddies and blocks that are NOT buddies.]

• Same speed advantages as slab allocator

Buddy memory allocation (1024K total, drawn in units of 64K)

    t = 0   1024K
    t = 1   A-64K | 64K | 128K | 256K | 512K
    t = 2   A-64K | 64K | B-128K | 256K | 512K
    t = 3   A-64K | C-64K | B-128K | 256K | 512K
    t = 4   A-64K | C-64K | B-128K | D-128K | 128K | 512K
    t = 5   A-64K | 64K | B-128K | D-128K | 128K | 512K
    t = 6   128K | B-128K | D-128K | 128K | 512K
    t = 7   256K | D-128K | 128K | 512K
    t = 8   1024K

1. Program A requests memory 34K..64K in size
2. Program B requests memory 66K..128K in size
3. Program C requests memory 35K..64K in size
4. Program D requests memory 67K..128K in size
5. Program C releases its memory
6. Program A releases its memory
7. Program B releases its memory
8. Program D releases its memory

Allocation Schemes

• So which memory management scheme (K&R, slab, buddy) is best?
  – There is no single best approach for every application.
  – Different applications have different allocation / deallocation patterns.
  – A scheme that works well for one application may work poorly for another application.
