Chapter 8: Main Memory
Start of Lecture: March 12, 2014
Reminders
• Exercise 4 is due on March 18
• Paul Lu will guest lecture this Friday on Virtual Memory
• Changed readings: read to Chapter 10, then we’ll jump to Chapter 15 on Security
• security is a really important topic today, and you already know lots about filesystems (which was Chapter 11 and Chapter 12)
Some Thought Questions
• Why isn't turnaround time a more popular basis for scheduling than response time?
• turnaround time = total time between submission of a task (i.e. tasks enters system) until task completes
• response time = total time between submission of a task and the task is first run (i.e. until first response produced)
• e.g. Matlab starts up, you run an experiment and then move to other tasks; the experiment may take long running in the background (long turnaround), but initial human interaction should be fast (fast response)
• How is the priority of a process determined, and where is this property stored?
• on Linux, stored in task_struct (i.e. PCB) as variable int prio
• priority determined externally and heuristically, e.g. from total runtime used, number of open files, or the ratio of average I/O burst to average CPU burst
Theorem for optimal page size (according to one objective)
• Let c1 = cost of losing a word to table fragmentation and c2 = cost of losing a word to internal fragmentation
• Assume that each program begins on a page boundary
• If the average program size s0 is much larger than the page size z, then the optimal page size is approximately
z ≈ √(2(c1/c2)s0)
Exercise: How would you prove this?
Proof of optimal page size
internal fragmentation cost = c2·z/2 (on average, half of a program's last page is wasted)
table fragmentation cost = c1·s0/z (the page table needs one entry for each of the s0/z pages)
L(z) = E[cost | z] = c1·s0/z + c2·z/2
dL/dz = −c1·s0/z² + c2/2
Setting dL/dz = 0: c1·s0/z² = c2/2, so c1·s0 = c2·z²/2
z = √(2(c1/c2)s0)
where c1 = cost of losing a word to table fragmentation, c2 = cost of losing a word to internal fragmentation, and s0 = average program size >> page size z
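To sanity-check the derivation, here is a small numeric sketch. The cost constants and program size are made-up illustrations, not values from the lecture: it evaluates L(z) and confirms the closed form sits at a minimum.

```python
import math

def expected_cost(z, c1, c2, s0):
    """L(z) = table-fragmentation cost + internal-fragmentation cost."""
    return c1 * s0 / z + c2 * z / 2.0

def optimal_page_size(c1, c2, s0):
    """Closed form from setting dL/dz = 0: z = sqrt(2*(c1/c2)*s0)."""
    return math.sqrt(2.0 * (c1 / c2) * s0)

# Illustrative numbers: equal per-word costs, a 1 MiB average program.
c1, c2, s0 = 1.0, 1.0, 2**20
z_star = optimal_page_size(c1, c2, s0)
# The closed form should beat nearby page sizes:
assert expected_cost(z_star, c1, c2, s0) <= expected_cost(z_star * 0.9, c1, c2, s0)
assert expected_cost(z_star, c1, c2, s0) <= expected_cost(z_star * 1.1, c1, c2, s0)
```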
Paging and Frames
[Figure: (a) Before allocation, the free-frame list contains frames 14, 13, 18, 20, and 15, and a new process with pages 0–3 is waiting to be loaded. (b) After allocation, pages 0–3 occupy frames 14, 13, 18, and 20, the new process's page table records those mappings, and the free-frame list holds only frame 15.]
What does physical memory look like?
• cat /proc/PID/smaps
• cat /proc/PID/status
• for file in /proc/*/status ; do awk '/VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | sort -k 2 -n -r
• cat /proc/meminfo
What does Linux code look like for pages?
• struct page in linux/mm_types.h for a physical page
• virt_to_page() in asm-generic/page.h
Implementation of Page Table
• Page table is kept in main memory
• every process has its own page table
• Page-table base register (PTBR) points to page table
• Page-table length register (PTLR) indicates size of the page table
• Every data/instruction access requires two memory accesses: one for page table, one for data/instruction
• Can we reduce it to one access? Yes, with fast-lookup hardware cache called translation look-aside buffers
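The two-access pattern above can be sketched as a toy model. The function name, dictionary page table, and 12-bit offset here are my assumptions for illustration:

```python
# Toy model: 12-bit offset (4 KB pages), page table held in "memory".
PAGE_SIZE = 4096
OFFSET_BITS = 12

def translate(page_table, logical_addr):
    """Split the logical address into (page, offset); the page-table
    lookup is the extra memory access that real hardware must pay."""
    p = logical_addr >> OFFSET_BITS        # page number
    d = logical_addr & (PAGE_SIZE - 1)     # offset within the page
    frame = page_table[p]                  # memory access #1: the page table
    return (frame << OFFSET_BITS) | d      # the data access is access #2

page_table = {0: 5, 1: 2, 2: 7}
assert translate(page_table, 0x1ABC) == (2 << 12) | 0xABC
```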
Hardware support: Translation Look-Aside Buffer (TLB)
• TLB is a cache usually made from content-addressable memory: user provides data-word instead of address and CAM searches for data-word in memory
• like an associative array with (key, value) pairs, where for a CAM key = (address space id, page number) and value = frame number
• address space id like process id (unique identifier), but a different number
• Contrast with RAM, where user provides address and receives data-word
• TLBs store unique identifier for each process so that entire TLB does not need to be flushed with context-switch
• TLB is small (64–1,024 entries), but relies on the notion of locality
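The address-space-id idea can be sketched as a cache keyed by (ASID, page number). The class shape and the crude eviction policy are illustrative assumptions, not real TLB hardware behavior:

```python
# Sketch of a TLB keyed by (address-space id, page number).
class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}                  # (asid, page) -> frame

    def lookup(self, asid, page):
        return self.entries.get((asid, page))   # None means TLB miss

    def insert(self, asid, page, frame):
        if len(self.entries) >= self.capacity:
            # Crude FIFO-ish eviction; real TLBs use hardware replacement.
            self.entries.pop(next(iter(self.entries)))
        self.entries[(asid, page)] = frame

tlb = TLB()
tlb.insert(asid=1, page=7, frame=42)
assert tlb.lookup(1, 7) == 42       # hit for the process with ASID 1
assert tlb.lookup(2, 7) is None     # same page, different ASID: miss, no flush needed
```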
Lookups with TLB
[Figure: TLB lookup. The CPU issues a logical address (p, d). The page number p is presented to the TLB: on a TLB hit, the frame number f is returned immediately; on a TLB miss, f is fetched from the page table. The physical address (f, d) then accesses physical memory.]
How often is there a TLB miss?
• If TLB misses were common, having a TLB would not be much of an advantage
• now just extra overhead of also looking in TLB
• But since the TLB is small, why is it that TLB misses are not more common?
• Locality — related data is stored in nearby locations, a consequence of program structure and linear data structures
• spatial locality — nearby locations are likely to be accessed soon (e.g. sequential processing in loops)
• temporal locality — the same location is likely to be referenced again in the near future
Effective memory-access time
8.44 Silberschatz, Galvin and Gagne ©2013, Operating System Concepts – 9th Edition
• Associative (TLB) lookup = ε time units; can be < 10% of a memory-access time
• Hit ratio = α: the percentage of times a page number is found in the TLB; related to the number of TLB entries
• Effective Access Time, in memory-access time units: EAT = (1 + ε)α + (2 + ε)(1 − α) = 2 + ε − α
• Neglecting TLB lookup time, with α = 80% and 100ns per memory access: EAT = 0.80 × 100 + 0.20 × 200 = 120ns
• With a more realistic hit ratio of α = 99%: EAT = 0.99 × 100 + 0.01 × 200 = 101ns
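The arithmetic can be checked with a short helper; the function name and default of zero TLB time (as the slide's examples assume) are mine:

```python
def eat(hit_ratio, mem_ns, tlb_ns=0.0):
    """Effective access time: a hit costs one memory access (plus TLB time);
    a miss costs two memory accesses (page table, then data)."""
    hit = tlb_ns + mem_ns
    miss = tlb_ns + 2 * mem_ns
    return hit_ratio * hit + (1 - hit_ratio) * miss

# The slide's numbers, neglecting TLB lookup time:
assert abs(eat(0.80, 100) - 120.0) < 1e-9
assert abs(eat(0.99, 100) - 101.0) < 1e-9
```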
Exercise: some thought questions so far
• Pretend you knew that your page table would be really small (say 3 entries). What might you do to have fast lookups for entries in your page table?
• What about read-write protection? Do we have to go through entire process of going to physical memory to check if can read-write or read before causing a trap?
• A linear page table of the entire 32-bit virtual address space with 4K pages results in a 4MB page table! How can we avoid having such a large page table in memory?
Memory Protection
[Figure: a page table with a valid–invalid bit. Pages 0–5 map to frames 2, 3, 4, 7, 8, and 9 and are marked v (valid); entries 6 and 7 are marked i (invalid). The program ends at address 10,468, so a reference into page 6 or 7 traps to the OS; note that, because of internal fragmentation, addresses from 10,469 up to the end of page 5 (12,287) are still treated as valid.]
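The valid–invalid bit check can be sketched as follows; the entry layout, function names, and exception are my assumptions for illustration:

```python
# Minimal sketch: each page-table entry is (frame, valid_bit).
class ProtectionFault(Exception):
    pass

def translate_checked(page_table, page, offset, page_size=2048):
    frame, valid = page_table[page]
    if not valid:
        raise ProtectionFault("invalid reference to page %d" % page)  # trap to the OS
    return frame * page_size + offset

# Pages 0-5 valid (as in the figure), entries 6 and 7 invalid:
pt = [(2, True), (3, True), (4, True), (7, True), (8, True), (9, True),
      (0, False), (0, False)]
assert translate_checked(pt, 5, 100) == 9 * 2048 + 100
try:
    translate_checked(pt, 6, 0)
    assert False, "expected a trap"
except ProtectionFault:
    pass
```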
Hierarchical Paging: Multiple Levels
[Figure: two-level address translation. The logical address is split into (p1, p2, d): p1 indexes the outer page table, whose entry points to a page of the page table; p2 indexes that page to obtain the frame number; d is the offset.]
Programs generally use only small chunks of their address space, so only a small number of inner page tables need to be allocated (i.e. only a few entries in the outer table are in use)
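The two-level walk can be sketched like this, using the common 10 | 10 | 12 split; the function name and dict-of-dicts representation are my assumptions:

```python
# Two-level lookup sketch: 10-bit p1, 10-bit p2, 12-bit offset.
def translate_two_level(outer_table, addr):
    p1 = (addr >> 22) & 0x3FF      # index into the outer page table
    p2 = (addr >> 12) & 0x3FF      # index into the page of the page table
    d = addr & 0xFFF               # offset
    inner = outer_table[p1]        # inner tables exist only for regions in use
    frame = inner[p2]
    return (frame << 12) | d

outer = {0: {1: 99}}               # one inner table suffices for this process
assert translate_two_level(outer, (0 << 22) | (1 << 12) | 0x34) == (99 << 12) | 0x34
```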
Is hierarchical paging really useful?
• Exercise: how much does adding another layer reduce the size of the innermost page table?
• Split a 32-bit address as 10 | 10 | 12: the inner page table has 2^10 entries × 4 bytes = 4 KB
• Split it as 10 | 5 | 5 | 12 instead: the innermost page table has only 2^5 entries × 4 bytes = 128 bytes
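The sizes can be computed directly, assuming the standard 4-byte page-table entry (the slide's byte counts appear to have been computed in bits); the helper name is mine:

```python
def inner_table_bytes(index_bits, entry_bytes=4):
    """Size of one innermost page table, given the bits used to index it."""
    return (2 ** index_bits) * entry_bytes

# 10 | 10 | 12 split: the inner table is indexed by 10 bits.
assert inner_table_bytes(10) == 4096      # 4 KB
# 10 | 5 | 5 | 12 split: the innermost table is indexed by only 5 bits.
assert inner_table_bytes(5) == 128        # 128 bytes
```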
Chapter 8: Main Memory
• If pages are sprinkled uniformly across the virtual address space, more inner tables are created and nothing is gained
• As more outer tables are added, more memory accesses are needed to get to one physical memory address
• For 64-bit address spaces, this approach is no longer acceptable because too many layers are needed
Hashed Page Tables
���19
hash table
q s
logical addressphysicaladdress
physicalmemory
p d r d
p rhashfunction • • •
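The bucket-and-chain structure can be sketched with plain lists; the modulo hash and function names are my assumptions:

```python
# Hashed page table sketch: each bucket holds a chain of (page, frame) pairs.
NUM_BUCKETS = 16

def hpt_insert(table, page, frame):
    table[page % NUM_BUCKETS].append((page, frame))

def hpt_lookup(table, page):
    for p, r in table[page % NUM_BUCKETS]:   # walk the chain
        if p == page:
            return r
    return None                              # not mapped: page fault

hpt = [[] for _ in range(NUM_BUCKETS)]
hpt_insert(hpt, 5, 30)
hpt_insert(hpt, 21, 31)     # 21 % 16 == 5: collides with page 5, so it chains
assert hpt_lookup(hpt, 5) == 30
assert hpt_lookup(hpt, 21) == 31
assert hpt_lookup(hpt, 37) is None
```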
Video Break: brought to you by another stupendous classmate!
https://www.youtube.com/watch?v=7g0pi4J8auQ&feature=youtu.be
Inverted Page Table
[Figure: inverted page table. The logical address is (pid, p, d). The table is searched for the entry matching (pid, p); the index i of that entry is the frame number, giving physical address (i, d).]
• Page table is much smaller
• Hash on the pid and page number so we do not have to scan the entire table
• Issue: shared memory is awkward, since each frame maps to exactly one (pid, page) pair
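The search can be sketched directly; a real system would hash instead of scanning, and the names here are my assumptions:

```python
# Inverted page table sketch: one entry per physical frame, holding (pid, page).
def ipt_lookup(inverted_table, pid, page):
    for frame, entry in enumerate(inverted_table):
        if entry == (pid, page):
            return frame               # the table index *is* the frame number
    raise KeyError("page fault")       # no frame holds this (pid, page)

ipt = [(7, 0), (3, 2), (7, 1)]         # frame 0 holds pid 7's page 0, etc.
assert ipt_lookup(ipt, 7, 1) == 2
assert ipt_lookup(ipt, 3, 2) == 1
```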
What does a Linux page table look like?
[Figure: a Linux page-table entry]
Summary
• Need to load processes into memory to enable efficient computation, but there are many approaches
• contiguous memory allocation versus non-contiguous
• static versus dynamic addressing
• Segmentation and paging allocate memory to processes in a non-contiguous way
• segmentation has no internal fragmentation, but does have external
• paging has no external fragmentation, but does have internal
• Paging enables sharing of common code (e.g. libraries)
Summary
• Paging with virtual addressing solves many of the historical problems with managing memory, but is not necessarily the “optimal” approach
• Because paging has become ubiquitous, hardware typically provides support for fast paging in the form of Translation Look-Aside Buffers
• Other techniques: hierarchical page tables, hashed page tables, clustered page tables and inverted page tables
Dynamic memory allocation: malloc and free
• Once a page has been allocated, that memory could be free’d, causing further fragmentation
• Dynamic allocation can be handled using either stack allocation (hierarchical, restrictive) or heap allocation (more general, less efficient)
• Fragmentation looks different for the stack and heap
Stack allocation
���26 Copyright © 1996–2002 Eskicioglu and Marsland (and Prentice-Hall and Paul Lu)
Stack organization: memory allocation and freeing operations are partially predictable. Since the organization is hierarchical, freeing operates in reverse (LIFO) order.
[Figure: stack allocation over time. Calling A pushes A's stack frame; calling B pushes B's frame on top of it. Returning from B pops B's frame, and returning from A pops A's frame, restoring the original stack top.]
Heap allocation
• Allocation and release of heap space is totally random; heap space begins to fill with holes (i.e. fragmentation)
• Heap is a microcosm of the external fragmentation issues that arose for dynamic allocation with non-uniformly allocated block sizes
• Statistically, with first-fit, about 1/3 of memory becomes unusable due to fragmentation (the 50-percent rule; page 363)
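A first-fit allocator over a free list can be sketched as follows; the representation of holes as (start, size) pairs and the function name are my assumptions:

```python
# First-fit sketch over a free list of (start, size) holes.
def first_fit(free_list, request):
    """Allocate from the first hole big enough; return the start or None."""
    for i, (start, size) in enumerate(free_list):
        if size >= request:
            if size == request:
                free_list.pop(i)                        # exact fit: hole disappears
            else:
                free_list[i] = (start + request, size - request)
            return start
    return None                                         # external fragmentation bites

holes = [(0, 100), (300, 50), (600, 200)]
assert first_fit(holes, 120) == 600        # first two holes are too small
assert holes == [(0, 100), (300, 50), (720, 80)]
assert first_fit(holes, 150) is None       # 230 bytes free, but no single hole fits
```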
Issues with reclaiming free’d memory
• Memory might be free’d within an allocated page on the heap, but other parts of the page are still in use
• How do we know when the page can be free’d?
• Two problems when reclaiming memory:
• Dangling pointers: occur when the original allocator frees memory while other pointers to it are still in use
• Memory leaks: occur when we forget to free storage, even when it will not or cannot be used again. This is a serious and common problem! This problem is unacceptable in OS code.
Reclaiming memory
• Memory can be reclaimed by keeping track of reference counters: outstanding pointers to each block of memory
• When counter goes to zero, the memory block can be free’d
• To reduce problems with dangling pointers and memory leaks, some systems do garbage collection: periodically search for storage that is no longer referenced and reclaim it
• Can be very expensive (in some instances up to 20% of CPU time)
• But does eliminate a large class of application programmer errors
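The reference-counting mechanism can be made explicit with a toy model (CPython reference-counts its objects natively; the class and helpers here are illustrative only):

```python
# Reference-counting sketch: a block is reclaimed when its count hits zero.
class Block:
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.freed = False

def incref(block):
    block.refcount += 1

def decref(block):
    block.refcount -= 1
    if block.refcount == 0:
        block.freed = True          # safe to reclaim: no outstanding pointers

b = Block("shared buffer")
incref(b); incref(b)                # two pointers reference the block
decref(b)
assert not b.freed                  # one pointer remains: freeing now would dangle
decref(b)
assert b.freed                      # count hit zero: reclaimed, no leak
```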