ecs150 Spring 2006: Operating System #4: Memory Management (chapter 5)

Page 1: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

ecs150 Spring 2006: Operating System #4: Memory Management (chapter 5)

Dr. S. Felix Wu
Computer Science Department
University of California, Davis
http://www.cs.ucdavis.edu/~wu/
[email protected]

Page 2: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 3: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: process address space (kernel, user stack/args/env, BSS, data, text) backed by a file volume with executable programs]

Fetches for clean text or data are typically fill-from-file.
Modified (dirty) pages are pushed to backing store (swap) on eviction.
Paged-out pages are fetched from backing store when needed.
Initial references to user stack and BSS are satisfied by zero-fill on demand.

Page 4: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Logical vs. Physical Address

The concept of a logical address space that is bound to a separate physical address space is central to proper memory management.
– Logical address – generated by the CPU; also referred to as virtual address.
– Physical address – address seen by the memory unit.

Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme.

Page 5: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Memory-Management Unit (MMU)

Hardware device that maps virtual to physical addresses.
In the MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory.
The user program deals with logical addresses; it never sees the real physical addresses.

[Figure: CPU issues a virtual address; the MMU translates it to a physical address sent to memory; data flows back to the CPU]
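A minimal C sketch of relocation-register translation (illustrative only, not code from the slides; the register values and limit check are assumptions):

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical MMU state: one relocation (base) register plus a limit. */
    typedef struct {
        unsigned long reloc;  /* added to every logical address */
        unsigned long limit;  /* size of the process's logical space */
    } mmu_t;

    /* Translate a logical address the way a relocation-register MMU would. */
    unsigned long translate(const mmu_t *mmu, unsigned long logical)
    {
        if (logical >= mmu->limit) {        /* protection check */
            fprintf(stderr, "addressing error: %lu\n", logical);
            exit(1);
        }
        return mmu->reloc + logical;        /* physical = base + logical */
    }

    int main(void)
    {
        mmu_t mmu = { .reloc = 0x14000, .limit = 0x4000 };
        printf("logical 0x346 -> physical 0x%lx\n", translate(&mmu, 0x346));
        return 0;
    }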

Page 6: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Paging: Page and Frame

Logical address space of a process can be noncontiguous; the process is allocated physical memory whenever the latter is available.
Divide physical memory into fixed-sized blocks called frames (size is a power of 2, between 512 bytes and 8192 bytes).
Divide logical memory into blocks of the same size called pages.
Keep track of all free frames.
To run a program of size n pages, need to find n free frames and load the program.
Set up a page table to translate logical to physical addresses.
Internal fragmentation.

Page 7: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: logical pages mapped into physical memory frames]

Page 8: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Address Translation Architecture

Page 9: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Address Translation Scheme

Address generated by the CPU is divided into:
– Page number (p) – used as an index into a page table which contains the base address of each page in physical memory.
– Page offset (d) – combined with the base address to define the physical memory address that is sent to the memory unit.
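A minimal C sketch of this p/d split (illustrative: the 4 KB page size and the toy array-based page table are assumptions, not from the slides):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12                 /* 4 KB pages: offset is the low 12 bits */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)

    /* Toy single-level page table: index = page number, value = frame number. */
    static uint32_t page_table[1 << 8];   /* 256 pages in this toy example */

    uint32_t translate(uint32_t vaddr)
    {
        uint32_t p = vaddr >> PAGE_SHIFT;         /* page number */
        uint32_t d = vaddr & (PAGE_SIZE - 1);     /* page offset */
        uint32_t frame = page_table[p];           /* base address lookup */
        return (frame << PAGE_SHIFT) | d;         /* physical address */
    }

    int main(void)
    {
        page_table[3] = 7;                        /* map page 3 -> frame 7 */
        printf("0x%x\n", translate((3u << PAGE_SHIFT) | 0x123));  /* 0x7123 */
        return 0;
    }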

Page 10: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Virtual Memory

[Figure: virtual pages translated to physical memory — MAPPING in MMU]

Page 11: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: kernel portion of the address space, shared by all user processes]

Page 12: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: kernel region of the virtual address space]

Page 13: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 14: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: an executable file (header, text, data, idata, wdata, symbol table, program sections) is mapped into process segments (text, data, BSS, user stack/args/env, kernel); virtual memory (big) is mapped by virtual-to-physical translations onto physical page frames (small), with page fetch from the executable file and pageout/eviction to backing storage — MAPPING in MMU]

How to represent?

Page 15: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Paging

Advantages? Disadvantages?

Page 16: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Fragmentation

External Fragmentation – total memory space exists to satisfy a request, but it is not contiguous.
Internal Fragmentation – allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used.
Reduce external fragmentation by compaction:
– Shuffle memory contents to place all free memory together in one large block.
– Compaction is possible only if relocation is dynamic, and is done at execution time.
– I/O problem: latch the job in memory while it is involved in I/O, or do I/O only into OS buffers.

Page 17: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page size? Page table size?

Page 18: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

32-bit address bus → 2^32 bytes of virtual address space; 1 page = 4K bytes; 256M bytes of main memory.
1 page = 2^12 bytes → 2^32 / 2^12 = 2^20 pages.
With 4-byte entries, the page table takes 2^20 × 4 = 2^22 bytes = 4 MB.

Page 19: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Table Entry

[Figure: page table entry fields — caching disabled | referenced | modified | protection | present/absent | page frame number]

Page 20: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Free Frames

[Figure: free-frame list before allocation and after allocation]

Page 21: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Faults

Page table access → load the missing page (replacing one) → re-access the page table.

How large is the page table?
– 2^32 address space, 4K (2^12) pages.
– How many entries? 2^20 entries (1 MB).
– If the address space is 2^46, you need to access both a segment table and a page table… (2^26 GB or 2^16 TB).
Cache the page table!!

Page 22: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Faults

Hardware trap:
– /usr/src/sys/i386/i386/trap.c
VM page fault handler vm_fault():
– /usr/src/sys/vm/vm_fault.c

Page 23: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


/usr/src/sys/vm/vm_map.h

How to implement?

Page 24: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Implementation of Page Table

Page table is kept in main memory.
Page-table base register (PTBR) points to the page table.
Page-table length register (PTLR) indicates the size of the page table.
In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction.
The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called associative memory or translation look-aside buffer (TLB).

Page 25: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Two Issues

Virtual address access overhead
The size of the page table

Page 26: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

TLB (Translation Lookaside Buffer)

Associative memory:
– expensive, but fast -- parallel searching
TLB: select a small number of page table entries and store them in the TLB.

virt-page  modified  protection  page frame
   140        1          RW          31
    20        0          RX          38
   130        1          RW          29
   129        1          RW          62

Page 27: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Associative Memory

Associative memory – parallel search.
Address translation (A´, A´´):
– If A´ is in an associative register, get the frame # out.
– Otherwise get the frame # from the page table in memory.

[Figure: associative registers holding Page # → Frame # pairs]
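A minimal C sketch of this lookup path (hardware searches all entries in parallel; the linear loop below merely models that behavior, and the structure names and refill policy are illustrative assumptions):

    #include <stdint.h>

    #define TLB_ENTRIES 64

    typedef struct {
        uint32_t vpn;      /* virtual page number (A') */
        uint32_t pfn;      /* page frame number */
        int      valid;
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];
    static uint32_t page_table[1024];    /* toy in-memory page table: vpn -> pfn */

    uint32_t frame_for(uint32_t vpn)
    {
        /* Hardware probes every entry at once; software models it with a scan. */
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return tlb[i].pfn;                    /* TLB hit */

        uint32_t pfn = page_table[vpn];               /* TLB miss: go to memory */
        tlb[vpn % TLB_ENTRIES] = (tlb_entry_t){ vpn, pfn, 1 };  /* simple refill */
        return pfn;
    }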

Page 28: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 29: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Paging Hardware With TLB

TLB miss versus page fault

Page 30: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Hardware or Software

TLB is part of the MMU (hardware):
– automated page table entry (pte) update
– vs. the OS handling TLB misses
Why software????
– Reduce HW complexity
– Flexibility in paging/TLB content management for different applications

Page 31: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Inverted Page Table

2^64 address space with 4K pages
– page table: 2^52 entries ~ 1 million gigabytes

Page 32: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Inverted Page Table (iPT)

2^64 address space with 4K pages
– page table: 2^52 entries ~ 1 million gigabytes
One entry per page of real memory.
– 128 MB (2^27 bytes) with 4K pages ==> 2^15 entries
Disadvantage:
– For every memory access, we need to search the whole paging hash list.

Page 33: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Table

Page 34: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Inverted Page Table

Page 35: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 36: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Brainstorming

How to design an "inverted page table" such that we can do it "faster"?

Page 37: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Hashed Page Tables

Common in address spaces > 32 bits.
The virtual page number is hashed into a page table. This page table contains a chain of elements hashing to the same location.
Virtual page numbers are compared in this chain searching for a match. If a match is found, the corresponding physical frame is extracted.
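A minimal C sketch of that chained lookup (the bucket count, structure names, and hash function are illustrative assumptions):

    #include <stdint.h>
    #include <stddef.h>

    #define HASH_BUCKETS 1024

    typedef struct hpt_entry {
        uint64_t vpn;              /* virtual page number (chain key) */
        uint64_t pfn;              /* physical frame number */
        struct hpt_entry *next;    /* next element hashing to this bucket */
    } hpt_entry_t;

    static hpt_entry_t *buckets[HASH_BUCKETS];

    static size_t hash_vpn(uint64_t vpn)
    {
        return (vpn * 11400714819323198485ull) % HASH_BUCKETS;  /* Fibonacci hash */
    }

    /* Walk the chain comparing VPNs; return nonzero and set *pfn on a match. */
    int hpt_lookup(uint64_t vpn, uint64_t *pfn)
    {
        for (hpt_entry_t *e = buckets[hash_vpn(vpn)]; e != NULL; e = e->next) {
            if (e->vpn == vpn) {
                *pfn = e->pfn;
                return 1;
            }
        }
        return 0;                  /* miss: the fault handler must fill the mapping */
    }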

Page 38: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: virtual page # → hash function → chain of (virtual page #, physical page #) elements]

Page 39: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 40: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

iPT/Hash Performance Issues

Still do TLB (hw/sw):
– if we can hit the TLB, we do NOT need to access the iPT and hash.
Caching the iPT and/or hash table??
– any benefits under regular on-demand caching schemes?
Hardware support for iPT/hash.

Page 41: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

TLB (Translation Lookaside Buffer)

Associative memory:
– expensive, but fast -- parallel searching
TLB: select a small number of page table entries and store them in the TLB.

virt-page  modified  protection  page frame
   140        1          RW          31
    20        0          RX          38
   130        1          RW          29
   129        1          RW          62

Page 42: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Paging → Virtual Memory

CPU addressability: 32 bits -- 2^32 bytes!!
– 2^32 is 4 gigabytes (un-segmented).
– Pentium II can support up to 2^46 (64 tera) bytes: 32 bits – address, 14 bits – segment #, 2 bits – protection.
Very large addressable space (64 bits), and relatively smaller physical memory available…
– Let the programs/processes enjoy a much larger virtual space!!

Page 43: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

VM with 1 Segment

[Figure: MAPPING in MMU]

Page 44: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Eventually…

[Figure: MAPPING in MMU]

???

Page 45: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

On-Demand Paging

On-demand paging:
– we have to kick someone out…. But which one?
– Triggered by page faults.
Loading in advance (predictive/proactive):
– try to avoid page faults altogether.

Page 46: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Demand Paging

On a page fault the OS:
– Saves user registers and process state.
– Determines that the exception was a page fault.
– Finds a free page frame.
– Issues a read from disk into the free page frame.
– Waits for seek and rotational latency, and transfers the page into memory.
– Restores process state and resumes execution.

Page 47: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Replacement

1. Find the location of the desired page on disk.
2. Find a free frame:
   - If there is a free frame, use it.
   - If there is no free frame, use a page replacement algorithm to select a victim frame.
3. Read the desired page into the (newly) free frame. Update the page and frame tables.
4. Restart the process.

Page 48: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Replacement Algorithms

Minimize the page-fault rate.

Page 49: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Replacement

Optimal
FIFO
Least Recently Used (LRU)
Not Recently Used (NRU)
Second Chance / Clock Paging

Page 50: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Optimal

Estimate the next reference time of each page in the future. Select the page whose next reference is farthest away.

Page 51: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

LRU

An implementation issue:
– We need to keep track of the last modification or access time for each page.
– timestamp: 32 bits
How to implement LRU efficiently?

Page 52: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

LRU Approximation

Reference bit (one-bit timestamp):
– With each page associate a bit, initially = 0.
– When the page is referenced, the bit is set to 1.
– Replace a page whose bit is 0 (if one exists). We do not know the order, however.
Second chance:
– Needs the reference bit.
– Clock replacement.
– If the page to be replaced (in clock order) has reference bit = 1, then: set the reference bit to 0, leave the page in memory, and consider the next page (in clock order), subject to the same rules.
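A minimal C sketch of this second-chance/clock sweep over a fixed set of frames (the frame count and field names are illustrative):

    #define NFRAMES 8

    typedef struct {
        int page;        /* which virtual page occupies this frame */
        int referenced;  /* reference bit, set by "hardware" on access */
    } frame_t;

    static frame_t frames[NFRAMES];
    static int hand = 0;             /* the clock hand */

    /* Pick a victim frame: sweep the clock, giving referenced pages a
     * second chance by clearing their bit and moving on. */
    int clock_victim(void)
    {
        for (;;) {
            if (frames[hand].referenced) {
                frames[hand].referenced = 0;      /* second chance */
                hand = (hand + 1) % NFRAMES;
            } else {
                int victim = hand;                /* bit is 0: evict this one */
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
        }
    }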

Page 53: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

NRU

Not Recently Used: clear the bits every 20 milliseconds.

[Figure: referenced and modified bits in the page table entry]

What is the problem??

Page 54: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Replacement??

An efficient approximation of LRU, with no periodic refreshing.
How to do that?

Page 55: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Second Chance / Clock Paging

Does not need any "periodic" bit clearing.
Has a "current candidate pointer" moving along the "clock".
Chooses the first page with zero flag(s).

Page 56: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Clock Pages

[Figure: clock of resident pages A–F; a fault on G starts the hand sweeping for a victim]

Page 57: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Clock Pages

[Figure: G has replaced A; a fault on H starts the next sweep]

Page 58: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Clock Pages

[Figure: same state; the hand passes referenced pages, clearing their bits]

Page 59: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Clock Pages

[Figure: H has replaced C; a fault on I starts the next sweep]

Page 60: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Clock Pages

[Figure: same state; the hand continues its sweep]

Page 61: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Clock Pages

[Figure: I has replaced B; final resident set G, I, H, D, E, F]

Page 62: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Evaluation

The metric is the page-fault rate. Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string.
In all our examples, the reference string is 2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2.

Page 63: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

FIFO Page Faults, 3 physical pages

Reference string: 2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2

2  → fault: [2]
3  → fault: [2 3]
2  → hit:   [2 3]
1  → fault: [2 3 1]
5  → fault, replaces 2: [5 3 1]
2  → fault, replaces 3: [5 2 1]
…

Page 64: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page Replacement

2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2
With OPT/LRU/FIFO/CLOCK and 3 pages: how many page faults?
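A small C sketch that counts faults for the FIFO case on this reference string (the other policies differ only in victim choice); written for this exercise, not taken from the slides:

    #include <stdio.h>

    #define NFRAMES 3

    int main(void)
    {
        int refs[] = { 2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2 };
        int nrefs = sizeof refs / sizeof refs[0];
        int frames[NFRAMES] = { -1, -1, -1 };   /* -1 = empty frame */
        int next = 0, faults = 0;               /* next = FIFO victim index */

        for (int i = 0; i < nrefs; i++) {
            int hit = 0;
            for (int f = 0; f < NFRAMES; f++)
                if (frames[f] == refs[i])
                    hit = 1;
            if (!hit) {
                frames[next] = refs[i];         /* evict oldest (round-robin) */
                next = (next + 1) % NFRAMES;
                faults++;
            }
        }
        printf("FIFO page faults: %d\n", faults);  /* prints 9 for this string */
        return 0;
    }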

Page 65: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 66: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Thrashing

If a process does not have "enough" pages, the page-fault rate is very high. This leads to:
– low CPU utilization.
– the operating system thinks that it needs to increase the degree of multiprogramming.
– another process is added to the system.
Thrashing ≡ a process is busy swapping pages in and out.

Page 67: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Thrashing

Why does paging work? Locality model:
– A process migrates from one locality to another.
– Localities may overlap.
Why does thrashing occur? Size of locality > total memory size.

Page 68: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

How to Handle Thrashing?

Brainstorming!!

Page 69: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Locality In A Memory-Reference Pattern

Page 70: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

FreeBSD VM

Page 71: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


/usr/src/sys/vm/vm_map.h

How to implement?

Page 72: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: process segments — Text; Initialized Data (copy-on-write); Uninitialized Data (zero-fill, anonymous object); Stack (zero-fill, anonymous object)]

Page 73: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Page-level Allocation

• Kernel maintains a list of free physical pages.
• Two principal clients:
  – the paging system
  – the kernel memory allocator

Page 74: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Memory Allocation

[Figure: the page-level allocator hands physical pages to two clients — the paging system (backing process pages and the buffer cache) and the kernel memory allocator (network buffers, data structures, temp storage)]

Page 75: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: kernel address space — kernel text, initialized/uninitialized data, kernel malloc arena, network buffers, kernel I/O]

Page 76: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Why is the Kernel Memory Allocator special?

The typical request is for less than 1 page.
Originally, the kernel used statically allocated, fixed-size tables, but that is too limited.
The kernel requires a general-purpose allocator for both large and small chunks of memory.
It handles memory requests from kernel modules, not user-level applications:
– pathname translation routines, STREAMS or I/O buffers, zombie structures, table entries (proc structure, etc.)

Page 77: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

KMA Requirements

Utilization factor = requested/required memory:
– a useful metric that factors in fragmentation.
– 50% is considered good.
The KMA must be fast, since it is used extensively.
Simple API similar to malloc and free:
• it is desirable to free portions of allocated space; this differs from the typical user-space malloc/free interface.
Properly aligned allocations: for example, 4-byte alignment.
Support burst-usage patterns.
Interaction with the paging system: able to borrow pages from the paging system if running low.

Page 78: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

KMA Schemes

Resource Map Allocator
Simple Power-of-Two Free Lists
The McKusick-Karels Allocator
– FreeBSD
The Buddy System
– Linux
SVR4 Lazy Buddy Allocator
Mach-OSF/1 Zone Allocator
Solaris Slab Allocator
– FreeBSD, Linux, Solaris

Page 79: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Resource Map Allocator

A resource map is a set of <base, size> pairs that monitor areas of free memory.
Initially, the pool is described by a single map entry = <pool_starting_address, pool_size>.
Allocations cause the pool to fragment, with one map entry for each contiguous free region.
Entries are sorted in order of increasing base address.
Requests are satisfied using one of three policies (a first-fit sketch follows below):
– First fit – allocates from the first free region with sufficient space. Used in UNIX; fastest, but fragmentation is a concern.
– Best fit – allocates from the smallest region that satisfies the request. May leave several regions that are too small to be useful.
– Worst fit – allocates from the largest region unless a perfect fit is found. The goal is to leave behind larger regions after allocation.
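A minimal first-fit rmalloc over such a map (the fixed-size array, names, and in-place entry shrinking are illustrative assumptions; rmfree with coalescing is omitted):

    #include <stddef.h>
    #include <string.h>

    typedef struct { size_t base, size; } map_entry_t;

    #define MAP_MAX 64
    static map_entry_t map[MAP_MAX] = { { 0, 1024 } };  /* initially one free region */
    static int nentries = 1;

    /* First fit: take space from the first region big enough, shrinking
     * (or deleting) that entry. Returns (size_t)-1 on failure. */
    size_t rmalloc(size_t size)
    {
        for (int i = 0; i < nentries; i++) {
            if (map[i].size >= size) {
                size_t addr = map[i].base;
                map[i].base += size;
                map[i].size -= size;
                if (map[i].size == 0) {            /* remove exhausted entry */
                    memmove(&map[i], &map[i + 1],
                            (nentries - i - 1) * sizeof(map_entry_t));
                    nentries--;
                }
                return addr;
            }
        }
        return (size_t)-1;                          /* no region large enough */
    }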

Page 80: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

offset_t rmalloc(size)
void rmfree(base, size)

Initially: <0,1024>
After rmalloc(256), rmalloc(320), rmfree(256,128): <256,128> <576,448>
After rmfree(128,128): <128,256> <576,448>

(A fragmented map might look like: <128,32> <288,64> <544,128> <832,32>)

Page 81: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Resource Map - Good/Bad

Advantages:
– simple, easy to implement.
– not restricted to memory allocation: works for any collection of objects that are sequentially ordered and require allocation and freeing in contiguous chunks.
– can allocate the exact size within any alignment restrictions, thus no internal fragmentation.
– a client may release a portion of allocated memory.
– adjacent free regions are coalesced.

Page 82: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Resource Map - Good/Bad

• Disadvantages:
– the map may become highly fragmented, resulting in low utilization; poor for large requests.
– the resource map size increases with fragmentation: a static table will overflow, and a dynamic table needs its own allocator.
– the map must be sorted for free-region coalescing, and sorting operations are expensive.
– requires a linear search of the map to find a free region that matches an allocation request.
– difficult to return borrowed pages to the paging system.

Page 83: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Simple Power-of-Twos

Has been used to implement malloc() and free() in the user-level C library (libc).
Uses a set of free lists, with each list storing a particular size of buffer. Buffer sizes are powers of two.
Each buffer has a one-word header:
– when free, the header stores a pointer to the next free-list element.
– when allocated, the header stores a pointer to the associated free list (where the buffer is returned when freed). Alternatively, the header may contain the size of the buffer.

Page 84: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Free list

One-word header per buffer (pointer):
– malloc(X): size = roundup(X + sizeof(header))
– roundup(Y) = 2^n, where 2^(n-1) < Y <= 2^n
free(buf) must free the entire buffer.

Page 85: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Extra FOUR bytes for a pointer or size:
– Free buffer: header points to the next free block.
– Used buffer: header records where to return the buffer when freed.

Page 86: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Simple and reasonably fast; eliminates linear searches and fragmentation.
– Bounded time for allocations when buffers are available.
Familiar API.
Simple to share buffers between kernel modules, since freeing a buffer does not require knowing its size.

Page 87: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Rounding requests to a power of 2 results in wasted memory and poor utilization:
– aggravated by requiring buffer headers, since it is not unusual for memory requests to already be a power of two.
No provision for coalescing free buffers, since buffer sizes are generally fixed.
No provision for borrowing pages from the paging system, although some implementations do this.
No provision for returning unused buffers to the page allocator.

Page 88: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Simple Power of Two

    void *malloc(size)
    {
        int ndx = 0;                  /* free list index */
        int bufsize = 1 << MINPOWER;  /* size of smallest buffer */

        size += 4;                    /* add room for the header */
        assert(size <= MAXBUFSIZE);
        while (bufsize < size) {
            ndx++;
            bufsize <<= 1;
        }
        /* ndx is the index in the freelist array from which a buffer
         * will be allocated */
    }

Page 89: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

McKusick-Karels Allocator

/usr/src/sys/kern/kern_malloc.c

An improved power-of-twos implementation:
– All buffers within a page must be of equal size.
– Adds a page usage array, kmemsizes[], to manage pages.
– Managed memory must be contiguous pages.
– Does not require buffer headers to indicate page size.
When freeing memory, free(buf) simply masks off the low-order bits to get the page address (actually the page offset = pg), which is used as an index into the kmemsizes array.
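A minimal sketch of that free-path size lookup (the constant and variable names are illustrative, not the actual kern_malloc.c code):

    #include <stdint.h>

    #define PAGE_SHIFT 12
    static char kmembase[1 << 20];    /* managed contiguous pages (toy: 1 MB) */
    static int  kmemsizes[1 << 8];    /* per-page record: buffer size in that page */

    /* Recover the allocation size from the address alone: mask off the
     * low-order bits to find the page, then index the usage array. */
    int size_of(void *buf)
    {
        uintptr_t pg = (uintptr_t)((char *)buf - kmembase) >> PAGE_SHIFT;
        return kmemsizes[pg];
    }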

Page 90: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 91: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

2^8-byte buffers: 1 page = 2^12 (4K) bytes, separated into 16 blocks of 2^8 bytes.

Page 92: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

2^6-byte buffers: 1 page = 2^12 (4K) bytes, separated into 64 blocks of 2^6 bytes.

Page 93: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


On-Demand Page/kmem allocation

Page 94: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

free(ptr);
How would we know the size of this piece of memory?

Page 95: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


How to point to the next free block?

Page 96: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Used blocks: check the page #.
Free blocks: a pointer links to the next free block.

Page 97: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

McKusick-Karels Allocator

An improved power-of-twos implementation:
– All buffers within a page must be of equal size.
– Adds a page usage array, kmemsizes[], to manage pages.
– Managed memory must be contiguous pages.
– Does not require buffer headers to indicate page size.
When freeing memory, free(buf) simply masks off the low-order bits to get the page address (actually the page offset = pg), which is used as an index into the kmemsizes array.

Page 98: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

McKusick-Karels Allocator

• Disadvantages:
– similar drawbacks to the simple power-of-twos allocator.
– vulnerable to burst-usage patterns, since there is no provision for moving buffers between lists.
• Advantages:
– eliminates space wastage in the common case where the allocation request is a power of two.
– optimizes the round-up computation, and eliminates it if the size is known at compile time.

Page 99: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

How to avoid "internal" fragmentation?

struct X needs 300 bytes.
With power-of-2 allocation it gets a 512-byte block, so one 4096-byte page can only hold 8 entities.
But 4096 bytes could hold 13 entities of 300 bytes.

Page 100: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

"Slab"

One or more pages make up one slab.
One slab is dedicated to ONE TYPE of object (all the same size):
– breaking the power-of-2 rule.
– Example: a 2-page slab can hold 27 entities of 300 bytes (versus 16 entities using 512-byte blocks).

Page 101: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Slab Allocator

Page 102: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 103: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

The Buddy System

Another interesting power-of-2 memory allocation scheme, used in the Linux kernel.

Page 104: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System

Operation sequence: allocate(256), allocate(128), allocate(64), allocate(128), release(C, 128), release(D, 64)

[Figure: bitmap of 32-byte chunks covering addresses 0–1023 (all bits 0 = free) and free lists for sizes 32, 64, 128, 256, 512; initially one free 1024-byte block]

Page 105: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: the empty bitmap and free lists, before any allocation]

Page 106: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: bitmap and free lists with blocks C, D, D', B', F, F', E' labeled]

Page 107: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System

[Figure: allocate(256), step 1 — the 1024-byte block splits into buddies A and A' (512 bytes each)]

Page 108: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System

[Figure: allocate(256), step 2 — A splits into buddies B and B' (256 bytes each); A' stays on the 512 free list]

Page 109: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System

[Figure: B (256 bytes, bits 0–7 set in the bitmap) is allocated; A' and B' remain free]

Page 110: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System

[Figure: allocate(128) — B' splits into C and C'; C (bits 8–11) is allocated]

Page 111: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System

[Figure: allocate(64) — C' splits into D and D'; D (bits 12–13) is allocated]

Page 112: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System

[Figure: allocate(128) — A' splits into E and E', E splits into F and F'; F (bits 16–19) is allocated]

Page 113: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Releasing a Block

[Figure: B, C, D, and F in use; release(C, 128) is about to run]

Why does release need the "SIZE" argument?

Page 114: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Releasing a Block

[Figure: release(C, 128) — bits 8–11 are cleared; C returns to the 128 free list]

Page 115: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Releasing a Block

[Figure: same state; C's buddy C' is not wholly free (D is still in use), so no merge yet]

Page 116: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Releasing a Block

[Figure: release(D, 64) — bits 12–13 are cleared; D returns to the 64 free list]

Page 117: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Merging Free Blocks

[Figure: D and D' merge into C', then C and C' merge into B'; the bitmap shows only B (bits 0–7) and F (bits 16–19) in use]

Page 118: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System

An allocation scheme combining a power-of-two allocator with free-buffer coalescing:
– binary buddy system: the simplest and most popular form. Other variants may be used, splitting buffers into four, eight, or more pieces.
Approach: create small buffers by repeatedly halving a large buffer (buddy pairs) and coalescing adjacent free buffers when possible.
Requests are rounded up to a power of two.

Page 119: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Buddy System, Example

Minimum allocation size = 32 bytes.
Initial free memory size is 1024.
Use a bitmap to monitor 32-byte chunks:
– bit set if the chunk is used.
– bit clear if the chunk is free.
Maintain a free list for each possible buffer size:
– power-of-2 buffer sizes from 32 to 512: sizes = {32, 64, 128, 256, 512}.
Initially one block = the entire buffer. (A sketch of the buddy-address computation follows below.)
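The property that makes finding a buddy cheap is that a block's buddy sits at its own offset with the size bit flipped. A minimal sketch (not from the slides; the sample offsets match the example above):

    #include <stdio.h>

    /* For a block at 'offset' (relative to the pool start) of 'size' bytes,
     * its buddy is at offset XOR size: halving a 2*size block always yields
     * the pair (offset, offset + size). */
    unsigned buddy_of(unsigned offset, unsigned size)
    {
        return offset ^ size;
    }

    int main(void)
    {
        /* C from the example: 128-byte block at offset 256 -> buddy C' at 384 */
        printf("buddy of 256/128 = %u\n", buddy_of(256, 128));
        /* D: 64-byte block at offset 384 -> buddy D' at 448 */
        printf("buddy of 384/64  = %u\n", buddy_of(384, 64));
        return 0;
    }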

Page 120: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Brainstorming

Pros and cons of the Buddy System…

Page 121: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Advantages:
– does a good job of coalescing adjacent free buffers.
– easy exchange of memory with the paging system: can allocate a new page and split as necessary; when coalescing results in a complete page, it may be returned to the paging system.
Disadvantages:
– performance: recursive coalescing is expensive, with poor worst-case performance; back-to-back allocate and release alternate between splitting and coalescing the same memory.
– poor programming interface: release needs both the buffer and its size, and the entire buffer must be released.

Page 122: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 123: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 124: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 125: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: process segments — Text; Initialized Data (copy-on-write); Uninitialized Data (zero-fill, anonymous object); Stack (zero-fill, anonymous object)]

Page 126: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 127: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 128: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


FORK

Page 129: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Private Mapping: Debugging

Page 130: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Paging & Replacement

Section 5.11, Paging
– especially Figure 5.11
Section 5.12, Page-out handling
– Figure 5.14
– vm_pageout_scan()

Page 131: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Final Review

OS Kernel & Architecture
– 2.1~2.5, 2.8, 3.1~3.9
Process/Thread Management
– 4.1~4.7, Priority Ceiling Protocol
Memory Management
– 5.1~5.12
File System
– 6.3~6.7, 8.1~8.9

Page 132: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Final

03/22/2006, 4~6 p.m., 1130 Hart
32%:
– 6 Q/A's, 4% each
– 8 multiple choice, 1% each

Page 133: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Lazy Buddy Algorithm (SVR4)

Addresses the main problem with the buddy system: poor performance due to repetitive coalescing and splitting of buffers.
Under steady-state conditions, the number of in-use buffers of each size remains relatively constant. Under these conditions, coalescing offers no advantage.
Coalescing is necessary only to deal with bursty conditions where there are large fluctuations in memory demand.

Page 134: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Lazy Buddy Algorithm

Coalescing delay – the time taken to either coalesce a single buffer with its buddy or determine that its buddy is not free.
Coalescing is recursive and doesn't stop until a buffer is found which cannot be combined with its buddy. Each release operation results in at least one coalescing delay.
Solution:
– defer coalescing until it is necessary – results in poor worst-case performance.
– lazy coalescing – an intermediate solution.

Page 135: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Lazy Coalescing

The release operation has two steps:
– place the buffer on the free list, making it locally free.
– coalesce with its buddy, making it globally free.
Buffers are divided into classes.
– Assume N buffers in a given class: N = A + L + G, where A = number of active buffers, L = number of locally free buffers, and G = number of globally free buffers.
– Buffer class states are defined by slack = N - 2L - G:
  lazy – buffer use in steady state, coalescing not necessary: slack >= 2.
  reclaiming – borderline consumption, coalescing needed: slack == 1.
  accelerated – non-steady-state consumption, must coalesce faster: slack == 0.
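A tiny C sketch of the slack computation and the resulting class state (names are illustrative):

    typedef enum { LAZY, RECLAIMING, ACCELERATED } buddy_state_t;

    /* slack = N - 2L - G, with N = A + L + G (active, locally free,
     * globally free buffers in the class). */
    buddy_state_t class_state(int active, int local_free, int global_free)
    {
        int n = active + local_free + global_free;
        int slack = n - 2 * local_free - global_free;
        if (slack >= 2) return LAZY;          /* steady state: don't coalesce */
        if (slack == 1) return RECLAIMING;    /* borderline: start coalescing */
        return ACCELERATED;                   /* slack == 0: coalesce faster */
    }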

Page 136: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Lazy Coalescing

An improvement over the basic buddy system:
– in steady state all lists are in the lazy state and no time is wasted splitting and coalescing.
– the worst-case algorithm limits coalescing to no more than two buffers (two coalescing delays).
– shown to have an average latency 10% to 32% better than the simple buddy system.
– greater variance and poorer worst-case performance for the release routine.

Page 137: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Design of Slab Allocator

[Figure: front end — per-type object caches (vnode cache, proc cache, mbuf cache, msgb cache) hand out objects in use by the kernel; back end — the page-level allocator supplies pages to the caches]

cachep = kmem_cache_create(name, size, align, ctor, dtor);
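A hedged usage sketch of this object-cache interface, following the five-argument signature shown on the slide (real allocators such as Solaris kmem or Linux differ in details; the proc_ctor/proc_dtor helpers and the alloc/free calls are assumed companions, not verified against any particular header):

    /* Assumed helpers: one-time object construction and teardown. */
    void proc_ctor(void *buf);
    void proc_dtor(void *buf);

    struct kmem_cache *proc_cache;   /* opaque cache handle */

    void subsystem_init(void)
    {
        /* One cache per object type; the constructor runs once per object
         * when its slab is created, not on every allocation. */
        proc_cache = kmem_cache_create("proc_cache",
                                       sizeof(struct proc),
                                       0,           /* default alignment */
                                       proc_ctor,   /* initialize object */
                                       proc_dtor);  /* tear down object */
    }

    struct proc *proc_alloc(void)
    {
        return kmem_cache_alloc(proc_cache);  /* fast: object pre-constructed */
    }

    void proc_free(struct proc *p)
    {
        kmem_cache_free(proc_cache, p);       /* back to cache, still constructed */
    }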

Page 138: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Slab Organization

[Figure: a cache keeps a linked list of slabs (free, active, active, active, free); each slab carries a kmem_slab struct, a coloring area, free-list pointers, the objects themselves (32-byte example), and unused space]

Coloring area – vary starting offsets to optimize HW cache and bus usage.

Page 139: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Slab Allocator Implementation

Slab = address % slab size.
A cache stores slabs in a partly sorted list: fully active, partially active, then free slabs.
– Why?
For large objects (> page size):
– management structures are kept in a separate memory pool.

Page 140: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 141: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 142: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 143: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 144: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Three Object Types

Named Objects
Anonymous Objects
Shadow Objects

Page 145: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Where Pages Come From??

[Figure: process address space (kernel, user stack/args/env, BSS, data, text) backed by a file volume with executable programs]

Fetches for clean text or data are typically fill-from-file.
Modified (dirty) pages are pushed to backing store (swap) on eviction.
Paged-out pages are fetched from backing store when needed.
Initial references to user stack and BSS are satisfied by zero-fill on demand.

Page 146: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

VM Internals: Mach/BSD

[Figure: an address space (task) is a vm_map of entries (start, len, prot) supporting lookup/enter; one pmap (physical map) per virtual address space drives the page table via pmap_enter()/pmap_remove() and queries such as pmap_page_protect, pmap_clear_modify, pmap_is_modified, pmap_is_referenced, pmap_clear_reference; memory objects back the map via getpage/putpage; a system-wide phys-virtual map and an array of page cells (vm_page_t) indexed by PFN track physical pages]

Page 147: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Memory Objects

object->putpage(page)
object->getpage(offset, page, mode)

[Figure: a memory object is backed by a pager — the swap pager (anonymous VM), the vnode pager (mapped files), or an external pager (DSM, databases, reliable VM, etc.)]

Page 148: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


Page 149: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)


open; read;read;…;write;read;close

Page 150: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Memory-Mapped Files

With appropriate support, virtual memory is a useful basis for accessing file storage (vnodes):
– bind a file to a region of virtual memory with the mmap syscall; e.g., with start address x, virtual address x+n maps to offset n of the file.
– several advantages over stream file access:
  uniform access for files and memory (just use pointers);
  performance: zero-copy reads and writes for low-overhead I/O;
  but: the program has less control over data movement, and the style does not generalize to pipes, sockets, terminal I/O, etc.
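A minimal user-level example of this style of file access (standard POSIX calls; error handling abbreviated, and a non-empty file named on the command line is assumed):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        /* Bind the file to a region of virtual memory: address p+n
         * now corresponds to offset n of the file. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        /* Access the file with plain pointers; pages fault in on demand. */
        long nl = 0;
        for (off_t i = 0; i < st.st_size; i++)
            if (p[i] == '\n')
                nl++;
        printf("%ld lines\n", nl);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }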

Page 151: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Memory Mapped File

[Figure: the loader builds process segments from an executable image and a library (DLL); file sections (header, text, data, idata, wdata, symbol table, relocation records) map to the text and data segments, while BSS, user stack/args/env, and the kernel u-area complete the address space]

Memory-mapped files are used internally for demand-paged text and initialized static data.

BSS and the user stack are "anonymous" segments:
1. no name outside the process
2. not sharable
3. destroyed on process exit

Page 152: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

[Figure: memory addresses backed by a pager]

Page 153: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

The Block/Page I/O

[Figure: a vm_object serves VM faults, mmap/msync, and getpage; a vnode serves file syscalls (read/write, fsync, etc.) and getpage/putpage with vhold/vrele, over UFS and NFS]

The VFS/memory object/pmap framework reduces VM and file access to the central issue: how does the system handle a stream of get/put block/page operations on a collection of vnodes and memory objects?
- executable files
- data files
- anonymous paging files (swap files)
- reads on demand from file syscalls
- reads on demand from VM page faults
- writes on demand

To deliver good performance, we must manage system memory as an I/O cache of pages and blocks.

Page 154: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

VM: 2 Other Issues

Virtual address access overhead
The size of the page table

Page 155: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Translation Lookaside Buffer (TLB)

An on-chip hardware translation buffer (TB or TLB) caches recently used virtual-physical translations (ptes).
A CPU pipeline stage probes the TLB to complete over 99% of address translations in a single cycle.
Like other memory system caches, replacement of TLB entries is simple and controlled by hardware, e.g., Not Last Used.
If a translation misses in the TLB, the entry must be fetched by accessing the page table(s) in memory.
Cost: 10-200 cycles.

Page 156: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

MMU and TLB

[Figure: the CPU issues addresses to the MMU, which contains the TLB and control logic and accesses memory]

Page 157: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Associative memory:
– expensive, but fast -- parallel searching
TLB: select a small number of page table entries and store them in the TLB.

virt-page  modified  protection  page frame
   140        1          RW          31
    20        0          RX          38
   130        1          RW          29
   129        1          RW          62

Page 158: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Completing a VM Reference

[Flowchart: start (MMU) → probe TLB → on a hit, access physical memory; on a miss, probe the page table → access valid? if not, raise an exception to the OS: if not a page fault, signal the process; otherwise allocate a frame, fetch from disk if the page is on disk, else zero-fill; load the TLB and retry the access]

Page 159: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Care and Feeding of TLBs

The OS kernel carries out its memory management functions by issuing privileged operations on the MMU.
Choice 1: the OS maintains page tables examined by the MMU.
– The MMU loads the TLB autonomously on each TLB miss.
– The page table format is defined by the architecture.
– The OS loads page table bases and lengths into privileged memory management registers on each context switch.
Choice 2: the OS controls the TLB directly.
– The MMU raises an exception if the needed pte is not in the TLB.
– The exception handler loads the missing pte by reading data structures in memory (software-loaded TLB).

Page 160: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

Demand Paging

The OS may leave some virtual-physical translations unspecified:
– mark the pte for a virtual page as invalid.
If an unmapped page is referenced, the machine passes control to the kernel exception handler (page fault):
– passes the faulting virtual address and attempted access mode.
The handler initializes a page frame, updates the pte, and restarts.
If a disk access is required, the OS may switch to another process after initiating the I/O.
Page faults are delivered at IPL 0, just like a system call trap.
The fault handler executes in the context of the faulted process, and blocks on a semaphore or condition variable awaiting I/O completion.

Page 161: ecs150 Spring 2006 : Operating System #4: Memory Management (chapter 5)

The OS tries to minimize page fault costs incurred by all processes, balancing fairness, system throughput, etc.
(1) fetch policy: when are pages brought into memory?
– prepaging: reduce page faults by bringing pages in before they are needed.
– clustering: reduce seeks on backing storage.
(2) replacement policy: how and when does the system select victim pages to be evicted/discarded from memory?
(3) backing storage policy: where does the system store evicted pages? When is the backing storage allocated? When does the system write modified pages to backing store?


Page Coloring: Direct-mapped Cache

[Figure] In a physically indexed direct-mapped cache, the low bits of the physical frame number select the cache region a page maps into; page coloring picks frames so that adjacent virtual pages fall into different regions and do not evict each other.
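A minimal sketch of the color computation, assuming a 256-KB direct-mapped cache and 4-KB pages (both numbers illustrative); alloc_frame_of_color stands in for a hypothetical per-color free list.

    #include <stdint.h>

    #define PAGE_SIZE  4096
    #define CACHE_SIZE (256 * 1024)                /* assumed cache size */
    #define NCOLORS    (CACHE_SIZE / PAGE_SIZE)    /* 64 colors */

    extern uint32_t alloc_frame_of_color(uint32_t color);  /* hypothetical */

    /* Which cache region an existing frame maps into. */
    static inline uint32_t page_color(uint32_t pfn)
    {
        return pfn % NCOLORS;
    }

    /* Give the faulting virtual page a frame of the matching color, so
     * consecutive virtual pages stay spread across the cache. */
    uint32_t pick_frame(uint32_t vpn)
    {
        return alloc_frame_of_color(vpn % NCOLORS);
    }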


Binding of Instructions and Data to Memory

Address binding of instructions and data to memory addresses can happen at three different stages:
Compile time: if the memory location is known a priori, absolute code can be generated; must recompile code if the starting location changes.
Load time: must generate relocatable code if the memory location is not known at compile time.
Execution time: binding delayed until run time if the process can be moved during its execution from one memory segment to another. Needs hardware support for address maps (e.g., base and limit registers).


Memory Protection

Memory protection is implemented by associating a protection bit with each frame.
Valid-invalid bit attached to each entry in the page table:
– "valid" indicates that the associated page is in the process' logical address space, and is thus a legal page.
– "invalid" indicates that the page is not in the process' logical address space.
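A sketch of how these bits might be checked on each reference; the bit positions are assumptions for illustration, not a particular architecture's PTE layout.

    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_VALID (1u << 0)    /* page is in the logical address space */
    #define PTE_WRITE (1u << 1)    /* page may be written */

    bool access_ok(uint32_t pte, bool is_write)
    {
        if (!(pte & PTE_VALID))
            return false;          /* invalid page: trap to the kernel */
        if (is_write && !(pte & PTE_WRITE))
            return false;          /* protection violation */
        return true;
    }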


Working-Set Model

Δ ≡ working-set window ≡ a fixed number of page references. Example: 10,000 instructions.
WSSi (working set of process Pi) = total number of pages referenced in the most recent Δ (varies in time):
– if Δ is too small, it will not encompass the entire locality.
– if Δ is too large, it will encompass several localities.
– if Δ = ∞, it will encompass the entire program.
D = Σ WSSi ≡ total demand frames.
If D > m (the number of available frames) ⇒ thrashing.
Policy: if D > m, then suspend one of the processes.


Working-set model


Keeping Track of the Working Set

Approximate with an interval timer + a reference bit. Example: Δ = 10,000:
– Timer interrupts after every 5,000 time units.
– Keep 2 bits in memory for each page.
– Whenever the timer interrupts, copy the reference bits and then set them all to 0.
– If one of the bits in memory = 1 ⇒ page is in the working set.
Why is this not completely accurate?
Improvement: 10 bits and interrupt every 1,000 time units.
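A sketch of this approximation for Δ = 10,000 with a 5,000-unit timer. NPAGES, the arrays, and the hardware-set ref_bit are illustrative, not a real kernel's data structures.

    #include <stdbool.h>
    #include <stdint.h>

    #define NPAGES 1024

    static uint8_t ref_bit[NPAGES];    /* set by hardware on each reference */
    static uint8_t history[NPAGES];    /* 2 history bits per page */

    /* Fires every 5,000 time units: copy reference bits, then clear them. */
    void timer_interrupt(void)
    {
        for (int p = 0; p < NPAGES; p++) {
            history[p] = (uint8_t)(((history[p] << 1) | ref_bit[p]) & 0x3);
            ref_bit[p] = 0;
        }
    }

    /* In the working set if referenced in the current or last two intervals. */
    bool in_working_set(int p)
    {
        return ref_bit[p] != 0 || history[p] != 0;
    }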


Page-Fault Frequency Scheme

Establish an "acceptable" page-fault rate:
– if the actual rate is too low, the process loses a frame.
– if the actual rate is too high, the process gains a frame.
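A sketch of the controller this implies, run once per measurement interval; the thresholds and the proc structure are assumptions for illustration.

    #define PFF_LOW   2             /* assumed faults-per-interval bounds */
    #define PFF_HIGH 10

    struct proc {
        int faults_this_interval;
        /* ... */
    };

    extern void give_frame(struct proc *p);   /* hypothetical allocator hooks */
    extern void take_frame(struct proc *p);

    void pff_adjust(struct proc *p)
    {
        if (p->faults_this_interval > PFF_HIGH)
            give_frame(p);          /* faulting too often: grow allocation */
        else if (p->faults_this_interval < PFF_LOW)
            take_frame(p);          /* hardly faulting: shrink allocation */
        p->faults_this_interval = 0;
    }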


2-level Page Table

[Figure] A 32-bit virtual address is split into three fields: pt1 (10 bits), pt2 (10 bits), and offset (12 bits); the top-level table and each second-level table hold 2^10 entries.
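A sketch of the table walk for this 10/10/12 split; the PTE layout (bit 0 = valid) and the pointer-based top level are assumptions for illustration.

    #include <stddef.h>
    #include <stdint.h>

    #define PT1(va)    (((va) >> 22) & 0x3ffu)   /* top 10 bits */
    #define PT2(va)    (((va) >> 12) & 0x3ffu)   /* middle 10 bits */
    #define OFFSET(va) ((va) & 0xfffu)           /* low 12 bits */

    typedef uint32_t pte_t;

    /* pgdir: 2^10 pointers to second-level tables (NULL if not present). */
    uint32_t walk(pte_t *pgdir[1024], uint32_t va)
    {
        pte_t *pt2 = pgdir[PT1(va)];
        if (pt2 == NULL)
            return UINT32_MAX;                   /* unmapped: page fault */

        pte_t pte = pt2[PT2(va)];
        if (!(pte & 1u))                         /* assumed: bit 0 = valid */
            return UINT32_MAX;

        return (pte & ~0xfffu) | OFFSET(va);     /* frame base + offset */
    }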


Paging and Segmentation

Paging does the following:
– Each process gets its own virtual memory (4 GB).
– Each virtual memory looks like a linear array of bytes, with addresses starting at zero.
– It makes the memory look bigger.
Segmentation:
– Allows each process to have multiple "simulated memories": (S × 4) GB in total.
– Each of these memories, or segments, starts at address zero, is independently protected, and can be separately paged.
– A memory address has two parts: a segment number and a segment offset.


Paging and Segmentation

Example: UNIX
– Text segment: holds the executable code of the process; read-only; fixed in size.
– Data segment: holds the memory used for global variables; read/write; not shared.
– Stack segment: the process' stack; read/write; not shared.


Segmentation

Memory-management scheme that supports the user's view of memory.
A program is a collection of segments. A segment is a logical unit such as: main program, procedure, function, method, object, local variables, global variables, common block, stack, symbol table, arrays.


User’s View of a Program


Logical View of Segmentation

[Figure: segments 1, 2, 3, and 4 appear contiguous in the user's logical space but are placed at scattered, non-contiguous locations in physical memory]


Segmentation Architecture

Logical address consists of a two-tuple: <segment-number, offset>.
Segment table – maps two-dimensional user addresses into one-dimensional physical addresses; each table entry has:
– base – contains the starting physical address where the segment resides in memory.
– limit – specifies the length of the segment.
Segment-table base register (STBR) points to the segment table's location in memory.
Segment-table length register (STLR) indicates the number of segments used by a program; segment number s is legal if s < STLR.
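A sketch of the base/limit check implied by this table layout; the table size and the boolean return in place of a hardware trap are illustrative.

    #include <stdbool.h>
    #include <stdint.h>

    struct seg_entry {
        uint32_t base;                      /* starting physical address */
        uint32_t limit;                     /* length of the segment */
    };

    static struct seg_entry seg_table[16];  /* illustrative size */
    static uint32_t stlr = 16;              /* number of segments (STLR) */

    /* Translate <s, offset>; returns false where hardware would trap. */
    bool seg_translate(uint32_t s, uint32_t offset, uint32_t *paddr)
    {
        if (s >= stlr)                      /* illegal segment number */
            return false;
        if (offset >= seg_table[s].limit)   /* offset past end of segment */
            return false;
        *paddr = seg_table[s].base + offset;
        return true;
    }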


Pentium Memory Addressing

[Figure] A logical address passes through the segment table to form a linear address, which is then translated through the page directory and page table to a physical-memory address.


Four Memory Modes

– Un-segmented and un-paged
– Un-segmented and paged
– Segmented and un-paged
– Segmented and paged


VM with many segments

[Figure: mapping in the MMU]


[Figure: contiguous allocation over time: memory holds the OS plus processes 5, 8, and 2; process 8 terminates, leaving a hole; process 9 is then loaded into the hole, and later process 10 into part of the remaining free space]
