Page 1

COMPUTER SYSTEMS
An Integrated Approach to Architecture and Operating Systems

Chapter 8: Topics in Page-based Memory Management

©Copyright 2008 Umakishore Ramachandran and William D. Leahy Jr.

Page 2

8.1 Demand Paging

• Paging as described in Chapter 7 implied the whole program was in memory
• But does it have to be? On average:
  – 30% of a program's memory footprint is the primary logic of the program
  – 70% is little-used error-handling code
• Therefore, it is prudent for the memory manager not to load the entire program into memory at startup.
• The basic idea is to load a part of the program into memory only when it is needed, i.e., on demand.
• This technique, referred to as demand paging, results in better memory utilization.

Page 3

What would be the main advantage of demand paging?

Page 4

8.1.1 Hardware for demand paging

[Figure: address translation for demand paging. The CPU issues a virtual page number; the PTBR locates the page table in memory; each page table entry holds a page frame number together with a valid bit (v) indicating whether the page is currently resident in memory.]

Page 5

8.1.1 Hardware for demand paging

• If I5 page faults, handle the fault.
• If I2 page faults:
  – Let I1 complete and squash I3-I5 before taking the interrupt (INT).
  – The INT needs to save the PC corresponding to I2 so that I2 can be restarted after the page fault is serviced.
  – Note that there is no harm in squashing instructions I3-I5, since they have not modified the permanent state of the program.

[Figure: five-stage pipeline (IF, ID/RR, EX, MEM, WB) with buffers between the stages; instructions I1-I5 occupy the stages, and the IF and MEM stages are marked as the points where page faults can potentially occur.]

Page 6

8.1.2 Page fault handler

1. Find a free page frame
2. Load the faulting virtual page from the disk into the free page frame
3. Update the page table for the faulting process
4. Place the PCB of the process back in the ready queue of the scheduler
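A minimal C sketch of these four steps. All of the types and helper routines (pcb_t, allocate_frame, disk_map_lookup, read_page_from_disk, ready_q_enqueue) are hypothetical stand-ins for the memory manager's real data structures, not code from the book:

/* Illustrative sketch of the page fault handler's four steps. */
typedef struct { unsigned pfn; int valid; } pte_t;
typedef struct pcb { pte_t *page_table; /* ... */ } pcb_t;
typedef long disk_addr_t;

extern int allocate_frame(void);                              /* assumed helper */
extern disk_addr_t disk_map_lookup(pcb_t *p, unsigned vpn);   /* assumed helper */
extern void read_page_from_disk(disk_addr_t addr, int pframe);/* assumed helper */
extern void ready_q_enqueue(pcb_t *p);                        /* assumed helper */

void page_fault_handler(pcb_t *faulting_process, unsigned vpn)
{
    /* 1. Find a free page frame (taken from the free-list). */
    int pframe = allocate_frame();

    /* 2. Load the faulting virtual page from disk into that frame,
     *    using the process's disk map to locate it in the swap space. */
    read_page_from_disk(disk_map_lookup(faulting_process, vpn), pframe);

    /* 3. Update the page table for the faulting process. */
    faulting_process->page_table[vpn].pfn   = pframe;
    faulting_process->page_table[vpn].valid = 1;

    /* 4. Place the PCB of the process back in the scheduler's ready queue. */
    ready_q_enqueue(faulting_process);
}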

Page 7

8.1.3 Data structures for Demand-paged Memory Management

• Free-list of page frames
• Frame table
• Disk map
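One possible way to declare these three data structures in C. The field names and sizes are assumptions chosen for illustration; the next few slides show the same structures pictorially:

#include <stdbool.h>

#define NUM_FRAMES 1024            /* assumed number of physical frames */
#define PAGES_PER_PROCESS 2048     /* assumed size of a process's virtual space */

/* Free-list of page frames: a linked list of free frame numbers. */
typedef struct free_frame {
    int pframe;                    /* physical frame number, e.g. Pframe 52 */
    struct free_frame *next;
} free_frame_t;
free_frame_t *free_list;

/* Frame table: one entry per physical frame, recording who occupies it. */
typedef struct {
    bool free;                     /* true if the frame is unallocated */
    int  pid;                      /* owning process, valid if !free */
    int  vpn;                      /* virtual page number, valid if !free */
} frame_table_entry_t;
frame_table_entry_t frame_table[NUM_FRAMES];

/* Disk map: one per process; VPN -> disk address of the page in swap space. */
typedef struct {
    long disk_address[PAGES_PER_PROCESS];
} disk_map_t;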

Page 8

8.1.3 Data structures for Demand-paged Memory Management

• Free-list of page frames

[Figure: the free-list is a linked list of free page frames, e.g. Pframe 52, Pframe 20, Pframe 200, ..., Pframe 8.]

Page 9

8.1.3 Data structures for Demand-paged Memory Management

• Frame table: indexed by physical frame number; each entry records the <PID, VPN> of the page occupying the frame, or "free"

    Pframe   <PID, VPN>
    0        <P2, 20>
    1        free
    2        <P5, 15>
    3        <P1, 32>
    4        free
    5        <P3, 0>
    6        <P4, 0>
    7        free

• Note: There are different ways of implementing this.

Page 10

8.1.3 Data structures for Demand-paged Memory Management

• Disk map: one per process; indexed by virtual page number (VPN), each entry gives the disk address of that page in the swap space

[Figure: the swap space on disk holds the pages of processes P1 ... Pn; the disk map for P1 maps VPNs 0-7 to their respective disk addresses.]

Page 11

8.1.4 Anatomy of a Page Fault

• Find a free page frame
• Pick a victim page and evict it
• Load the faulting page
• Update the page table for the faulting process and the frame table
• Restart the faulting process

Page 12

Eviction

• When evicting a page we must consider its status:
  – Clean: the page has not been written to and thus matches its counterpart on disk
  – Dirty: the page has been written to and no longer matches what is on disk
• Clean pages may simply be evicted
• Dirty pages must be written back to disk
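A sketch of the clean/dirty distinction at eviction time, in C. The frame table entry layout (including the dirty bit) and all helper routines are assumed names for illustration only:

/* Evict the page currently held in physical frame `pframe`. */
typedef struct { int free; int dirty; int pid; unsigned vpn; } fte_t;
extern fte_t frame_table[];

extern void write_page_to_disk(int pid, unsigned vpn, int pframe); /* assumed */
extern void invalidate_pte(int pid, unsigned vpn);                 /* assumed */
extern void free_list_add(int pframe);                             /* assumed */

void evict_frame(int pframe)
{
    fte_t *fte = &frame_table[pframe];

    if (fte->dirty)
        /* Dirty: the in-memory copy no longer matches the disk copy,
         * so write it back to its slot in the swap space first. */
        write_page_to_disk(fte->pid, fte->vpn, pframe);

    /* Clean (or now written back): the frame can simply be reclaimed. */
    invalidate_pte(fte->pid, fte->vpn);
    fte->free = 1;
    free_list_add(pframe);
}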

Page 13

8.2 Interaction between the Process Scheduler and Memory Manager

[Figure: the kernel contains the CPU scheduler (with its ready_q of PCBs) and the memory manager (with the free-list, per-process page tables PT1, PT2, ..., disk maps DM1, DM2, ..., and the frame table FT); user-level processes 1 ... n run on the CPU hardware. A timer interrupt causes an upcall (1) to the CPU scheduler, a page fault causes an upcall (2) to the memory manager, and the scheduler dispatches processes onto the CPU.]

Page 14

[Figure: same as the previous slide; the CPU scheduler and memory manager inside the kernel, user-level processes above, with upcall (1) for timer interrupts and upcall (2) for page faults.]

• The CPU scheduler dispatches a process; it runs until one of the following happens:
  1. The hardware timer interrupts the CPU, causing an upcall (1) to the CPU scheduler that may result in a process switch. The CPU scheduler takes the appropriate action to schedule the next process on the CPU.
  2. The process incurs a page fault, resulting in an upcall (2) to the memory manager that results in page fault handling.
  3. The process makes a system call, resulting in another subsystem (not shown) getting an upcall.

Page 15

8.3 Page Replacement Policies

• How to pick the victim page to evict from physical memory when a page fault occurs and the free-list is empty?
• For a given string of page references, the policy should result in the least number of page faults.
  – This attribute ensures that the amount of time spent in the OS dealing with page faults is minimized.
• Ideally, once a particular page has been brought into physical memory, the policy should not incur a page fault for the same page again.
  – This attribute ensures that the page fault handler attempts to respect the reference pattern of the user programs.

Page 16

8.3 Page Replacement Policies

• May use:
  – Local victim selection
    • Simple
    • Doesn't need the frame table
    • Poor memory utilization
  – Global victim selection
    • Better memory utilization
    • The norm
• Ideally there are no page faults and the memory manager never runs.
• The goal is to minimize (or eliminate) page faults.

Page 17

8.3.1 Belady's Min

• In 1966 Laszlo Belady proposed an optimal page replacement algorithm, which requires knowing the page reference string in advance
• Obviously this is impossible in practice
• But the performance of Belady's Min may be used as a reference standard against which to compare the performance of other policies

Page 18

8.3.2 First In First Out (FIFO)

• Affix a timestamp when a page is brought into physical memory
• If a page has to be replaced, choose the longest-resident page as the victim
• No special hardware needed
• Queue length is the number of physical frames

[Figure: the FIFO queue is a circular queue of <PID, VPN> entries with head and tail pointers; entries beyond the tail are free, and the queue is full when every physical frame is occupied.]

Page 19

FIFO

• Maintain a queue. As a page is read in, enqueue it. Use the page at the head of the queue as the frame to replace.
• Sample reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

Page 20

FIFO

Frame contents over time (time runs left to right) for the reference string below:

Reference:   1  2  3  4  1  2  5  1  2  3  4  5

2 physical frames (12 page faults):
Frame 1:     1  1  3  3  1  1  5  5  2  2  4  4
Frame 2:        2  2  4  4  2  2  1  1  3  3  5

3 physical frames (9 page faults):
Frame 1:     1  1  1  4  4  4  5  5  5  5  5  5
Frame 2:        2  2  2  1  1  1  1  1  3  3  3
Frame 3:           3  3  3  2  2  2  2  2  4  4

4 physical frames (10 page faults):
Frame 1:     1  1  1  1  1  1  5  5  5  5  4  4
Frame 2:        2  2  2  2  2  2  1  1  1  1  5
Frame 3:           3  3  3  3  3  3  2  2  2  2
Frame 4:              4  4  4  4  4  4  3  3  3

Belady's Anomaly: going from 3 frames to 4 frames increases the number of page faults from 9 to 10.
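A small, self-contained C program that simulates FIFO replacement on this reference string and reproduces the fault counts above (12, 9, and 10 faults for 2, 3, and 4 frames). It is an illustration written for this chapter, not code from the book:

#include <stdio.h>

/* Count page faults for FIFO replacement with `nframes` frames. */
static int fifo_faults(const int *refs, int nrefs, int nframes)
{
    int frames[16];                 /* frame contents, -1 = empty */
    int next = 0, faults = 0;       /* `next` points at the oldest page */

    for (int f = 0; f < nframes; f++) frames[f] = -1;

    for (int i = 0; i < nrefs; i++) {
        int hit = 0;
        for (int f = 0; f < nframes; f++)
            if (frames[f] == refs[i]) { hit = 1; break; }
        if (!hit) {                 /* page fault: replace the oldest page */
            frames[next] = refs[i];
            next = (next + 1) % nframes;
            faults++;
        }
    }
    return faults;
}

int main(void)
{
    const int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    const int n = sizeof(refs) / sizeof(refs[0]);

    for (int nframes = 2; nframes <= 5; nframes++)
        printf("%d frames -> %d faults\n", nframes, fifo_faults(refs, n, nframes));
    /* Output: 2 -> 12, 3 -> 9, 4 -> 10 (Belady's anomaly), 5 -> 5 */
    return 0;
}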

Page 21

FIFO

Reference:   1  2  3  4  1  2  5  1  2  3  4  5

5 physical frames (5 page faults):
Frame 1:     1  1  1  1  1  1  1  1  1  1  1  1
Frame 2:        2  2  2  2  2  2  2  2  2  2  2
Frame 3:           3  3  3  3  3  3  3  3  3  3
Frame 4:              4  4  4  4  4  4  4  4  4
Frame 5:                       5  5  5  5  5  5

Page 22

8.3.3 Least Recently Used (LRU)

• The LRU policy makes the assumption that if a page has not been referenced in a long time, there is a good chance it will not be referenced in the near future either.
• Thus, the victim page in the LRU policy is the page that has not been used for the longest time.

[Figure: LRU can be visualized as a push-down stack of <PID, VPN> entries; the most recently referenced page is at the top, the least recently used page at the bottom, with free entries below.]

Page 23

8.3.3 Least Recently Used (LRU)

[Worked example: the reference string 1 2 3 4 1 2 5 1 2 3 4 5 traced over time, showing the contents of the physical frames and of the push-down stack at each reference.]

Page 24

8.3.3 Least Recently Used (LRU)

• LRU is appealing but not actually feasible in hardware:
  – The stack has as many entries as the number of physical frames. For a physical memory of 64 MB and an 8 KB page size, the stack needs 8 K entries. Too big to put in the datapath!
  – On every access, the hardware has to modify the stack to place the current reference on top. Too slow.
• LRU may be a bad choice in certain situations
  – e.g. cycling through N+1 pages on a processor with N frames available

Page 25

8.3.3.1 Approximate LRU: A Small Hardware Stack

• Add a hardware stack with ~16 entries
• Push references onto the stack
  – If a reference is already in the stack, bring it to the top
  – The bottom reference falls out of the stack
• When a free frame is needed, randomly select one that is not in the stack
• Shown to be successful in some applications
• Probably not fast enough for a high-speed pipelined processor

Page 26

8.3.3.2 Approximate LRU: Reference bit per page frame

• Associate a reference bit with each frame
  – hardware sets it on reference
  – software reads and clears it
• Keep an n-bit counter register for each frame
• Periodically, a daemon right-shifts every counter and puts the frame's reference bit into the high-order bit
• Frames with the highest counter values are the most recently used; frames with the lowest counter values are the LRU frames
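A C sketch of the aging scheme the daemon might run. The counter width (8 bits here), the arrays, and the hardware accessor are assumptions for illustration:

#include <stdint.h>

#define NUM_FRAMES 1024                     /* assumed number of frames */

uint8_t age_counter[NUM_FRAMES];            /* n-bit counter per frame (n = 8) */
extern int read_and_clear_reference_bit(int frame);  /* assumed HW interface */

/* Called periodically by the daemon: shift each counter right by one and
 * move the frame's hardware reference bit into the high-order bit. */
void age_frames(void)
{
    for (int f = 0; f < NUM_FRAMES; f++) {
        int ref = read_and_clear_reference_bit(f);
        age_counter[f] = (uint8_t)((age_counter[f] >> 1) | (ref << 7));
    }
}

/* Victim selection: the frame with the smallest counter value is the
 * approximately least recently used frame. */
int pick_victim(void)
{
    int victim = 0;
    for (int f = 1; f < NUM_FRAMES; f++)
        if (age_counter[f] < age_counter[victim])
            victim = f;
    return victim;
}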

Page 27

8.3.4 Second chance page replacement algorithm

• Initially, the OS clears the reference bits of all frames. As the program executes, the hardware sets the reference bits of the pages referenced by the program.
• If a page has to be replaced, the memory manager chooses the replacement candidate in FIFO order.
• If the chosen victim's reference bit is set, the manager clears the reference bit and moves the page to the end of the FIFO queue.
• The victim is the first candidate in FIFO order whose reference bit is not set.
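A C sketch of second-chance victim selection. The circular-array representation of the FIFO queue and the reference-bit accessors are assumptions; advancing the pointer past a referenced page is equivalent to moving that page to the end of the queue (this is essentially the "clock" formulation of second chance):

#define NUM_FRAMES 1024

int fifo[NUM_FRAMES];                       /* frame numbers in FIFO order */
int head = 0;                               /* index of the oldest page */

extern int  reference_bit_is_set(int frame);   /* assumed HW accessor */
extern void clear_reference_bit(int frame);    /* assumed HW accessor */

/* Return the frame to replace, giving a second chance to any page whose
 * reference bit is set. */
int second_chance_victim(void)
{
    for (;;) {
        int frame = fifo[head];
        head = (head + 1) % NUM_FRAMES;
        if (reference_bit_is_set(frame)) {
            clear_reference_bit(frame);     /* second chance: skip it */
        } else {
            return frame;                   /* first unreferenced page */
        }
    }
}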

Page 28

8.3.5 Review of page replacement algorithms

• FIFO (hardware assist needed: none): could lead to anomalous behavior.
• Belady's MIN (hardware assist needed: an oracle): provably optimal performance; not realizable in hardware; useful as a standard for performance comparison.
• True LRU (hardware assist needed: push-down stack): expected performance close to optimal; infeasible for hardware implementation due to space and time complexity; worst-case performance may be similar to, or even worse than, FIFO.
• Approximate LRU #1 (hardware assist needed: a small hardware stack): expected performance close to optimal; worst-case performance may be similar to, or even worse than, FIFO.
• Approximate LRU #2 (hardware assist needed: reference bit per page): expected performance close to optimal; moderate hardware complexity; worst-case performance may be similar to, or even worse than, FIFO.
• Second Chance Replacement (hardware assist needed: reference bit per page): expected performance better than FIFO; memory manager implementation simplified compared to the LRU schemes.

Page 29

8.3.6 Optimizing Memory Management

• Beyond the basic techniques presented, additional optimizations are possible
• These optimizations are layered on top of the techniques already presented

Page 30

8.3.7 Pool of free page frames

• Instead of waiting for the free page count to reach 0
• Periodically run a daemon to evict pages, keeping a pool of n free pages

Page 31

8.3.7.1 Overlapping I/O with Processing

• Upon eviction we add the evicted frame to the free-list
• If the frame was dirty it is scheduled for write-back
• When a frame is needed, only clean frames are selected, skipping over dirty frames still awaiting write-back

Page 32

8.3.7.2 Reverse Mapping to Page Tables

• When the daemon runs to maintain the free-list at a certain level, it will take pages from processes that might turn around and page fault on those very pages
• If we maintain additional information in the free-list, we can detect this case and give the page frame back to the process (since its data is still intact)

[Figure: each free-list entry records the frame number, whether the frame is clean or dirty, and the reverse mapping <PID, VPN> of the page it last held, e.g. Pframe 52 (dirty), Pframe 22 (clean), Pframe 200 (clean), ....]

Page 33

8.3.8 Thrashing

• Suppose many processes are in memory but CPU utilization is low. What could cause this?
  1. Too many I/O-bound processes?
  2. Too many CPU-bound processes?
• Should we add more processes into memory?

Page 34

8.3.8 Thrashing

• If some processes don't really have enough pages to support their current configuration, they will constantly be page faulting and trying to grab frames from other processes
• This may lead those other processes to also start page faulting and grabbing frames
• Does this sound like the ideal place to introduce more processes into memory?

Page 35

8.3.8 Thrashing

• Controlling thrashing starts with understanding temporal locality
• Temporal locality is the tendency for the same memory location to be accessed repeatedly over a short period of time
• During a given time period t, certain pages will be accessed and others will not
• If, during t, the pages that need to be accessed are in memory, then no page faults will occur

Page 36

[Figure: a memory reference trace over a string that cycles through pages 1-5, showing which pages occupy the physical frames at each reference; when the number of frames is smaller than the number of distinct pages touched in the window, nearly every reference causes a page fault.]

Page 37

8.3.9 Working set

• The working set is the set of pages that defines the locus of activity of a program
• The working set size (WSS) denotes the number of distinct pages touched by a process in a window of time
• The total memory pressure (TMP) exerted on the system is the summation of the WSS of all the processes currently competing for resources (see the formula below)
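Restated as a formula, where n is the number of processes currently competing for memory and WSS_i is the working set size of process i:

    TMP = WSS_1 + WSS_2 + ... + WSS_n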

Page 38

8.3.10 Controlling thrashing

1. If TMP > physical memory:
   – Decrease the degree of multiprogramming
   – Else: increase the degree of multiprogramming
2. Monitor the page fault rate (pfr) of each process:
   – If pfr > high-water mark: decrease the number of programs
   – If pfr < low-water mark: increase the number of programs

[Figure: page fault rate plotted against the number of physical frames per process, with high- and low-water marks delimiting the acceptable operating region.]

Page 39

8.4 Other considerations

• Prepaging after swapping out
  – Bring the entire working set back when swapping the process back in
• The memory manager and the I/O system must work together
  – Suppose the I/O system is writing data into a page that is being swapped out by the memory manager!
  – Solution: pinning
  – What about the pages where the memory manager itself lives?
  – They are pinned

Page 40

8.5 Speeding up Address Translation

• We will do whatever we can to reduce the page fault rate
  – Context switch time: < 100 instructions
  – Page fault handling time without disk I/O: < 100 instructions
  – Disk I/O to handle a page fault: ~1,000,000 instructions
• Nevertheless, there remains a big penalty. Each processor memory reference actually requires two memory accesses!
  – One to the page table
  – One to the actual location desired

Page 41

8.5 Speeding up Address Translation

• Consider spatial locality, which is the tendency to access nearby memory locations over a short period of time
• If we access address 0x12345678, how likely is it that we will also access 0x1234567C?
• What frame will we access in each case?
• Recall:
  – 0x12345678
  – 0x1234567C
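A quick check of the two addresses, assuming a 4 KB page size (a 12-bit offset; the slide does not state the page size, so this is an assumption made for illustration). Both addresses have the same virtual page number and therefore map to the same frame:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* assumed 4 KB pages */

int main(void)
{
    uint32_t a = 0x12345678, b = 0x1234567C;
    printf("VPN(a)=0x%X offset=0x%X\n", a >> PAGE_SHIFT, a & 0xFFF); /* 0x12345, 0x678 */
    printf("VPN(b)=0x%X offset=0x%X\n", b >> PAGE_SHIFT, b & 0xFFF); /* 0x12345, 0x67C */
    return 0;   /* same virtual page, hence the same physical frame */
}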

Page 42

8.5.1 Address Translation with TLB

• Solution: install a hardware device, the Translation Lookaside Buffer (TLB), which stores recently used page table entries
• It is divided into two sections, one for user entries and one for kernel entries
  – On a context switch the user portion may be flushed quickly with a special kernel-mode instruction
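A C sketch of the lookup order the next three slides depict: consult the TLB first, and fall back to the page table in memory only on a miss. The structures, sizes, and the naive replacement choice are simplifying assumptions, not how any particular hardware does it:

#define TLB_ENTRIES 64                        /* assumed TLB size */

typedef struct { int valid; unsigned vpn; unsigned pfn; } tlb_entry_t;
tlb_entry_t tlb[TLB_ENTRIES];
unsigned page_table[1 << 20];                 /* VPN -> PFN, assumed flat table */

/* Translate a virtual page number to a physical frame number. */
unsigned translate(unsigned vpn)
{
    for (int i = 0; i < TLB_ENTRIES; i++)     /* TLB hit: no extra memory access */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return tlb[i].pfn;

    unsigned pfn = page_table[vpn];           /* TLB miss: extra memory access */
    int slot = vpn % TLB_ENTRIES;             /* simplistic replacement choice */
    tlb[slot] = (tlb_entry_t){ .valid = 1, .vpn = vpn, .pfn = pfn };
    return pfn;
}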

Page 43

[Figure: address translation with a TLB. The CPU presents <page, offset>; the TLB is consulted first and, failing that, the page table in memory supplies the page frame; the resulting <frame, offset> addresses physical memory.]

Page 44

[Figure: TLB miss. The page number is not found in the TLB, so the translation is obtained from the page table in memory, costing an extra memory access.]

Page 45

[Figure: TLB hit. The page frame is obtained directly from the TLB, avoiding the extra memory access to the page table.]

Page 46

8.6 Advanced topics in memory management

• How big is a page table?
• Assume a 32-bit address and a page size of 4 KB
• Assume each page table entry is 32 bits long
• And each process has a page table!
• How can we make this work?
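Working out the numbers from the assumptions on this slide:

    2^32 bytes of virtual address space / 2^12 bytes per page = 2^20 page table entries
    2^20 entries * 4 bytes per entry = 4 MB of page table, per process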

Page 47

8.6.1 Multi-level Page Tables

• To deal with excessively large page tables we can page the page table...

[Figure: the 32-bit virtual address is split into three fields: Page 1 (10 bits) indexes the outer page table (entries 0-1023, located via the PTBR), Page 2 (10 bits) indexes the selected page of the page table (entries 0-1023), and the 12-bit Offset selects the byte within the physical frame.]
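A C sketch of a two-level lookup matching the 10/10/12 split in the figure. The function and parameter names are illustrative, and the sketch treats table entries as directly usable addresses, which a real MMU would not do:

#include <stdint.h>

/* Split a 32-bit virtual address into outer index (10 bits), inner index
 * (10 bits), and page offset (12 bits), then walk the two levels. */
uint32_t translate_two_level(uint32_t vaddr, uint32_t *outer_page_table)
{
    uint32_t outer_idx = (vaddr >> 22) & 0x3FF;     /* bits 31..22 */
    uint32_t inner_idx = (vaddr >> 12) & 0x3FF;     /* bits 21..12 */
    uint32_t offset    =  vaddr        & 0xFFF;     /* bits 11..0  */

    /* The outer entry gives the location of one page of the page table
     * (simplified here to a plain pointer). */
    uint32_t *pt_page = (uint32_t *)(uintptr_t)outer_page_table[outer_idx];
    uint32_t frame = pt_page[inner_idx];            /* physical frame number */

    return (frame << 12) | offset;                  /* physical address */
}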

Page 48

8.6.2 Sophisticated use of the page table entry

• Page table entries may contain more than just a physical frame number
• They may also contain mode information:
  – RW: Read/Write
  – RO: Read Only
  – CW: Copy on Write
• Useful for items such as static read-only areas of memory and process forking
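One hypothetical way to pack such an entry into 32 bits in C. The bit layout is an assumption for illustration only, not the layout of any real architecture or of the book:

#include <stdint.h>

/* Hypothetical 32-bit page table entry: frame number plus mode/status bits. */
typedef struct {
    uint32_t pfn        : 20;  /* physical frame number */
    uint32_t valid      : 1;   /* page is resident in memory */
    uint32_t dirty      : 1;   /* page has been written to */
    uint32_t referenced : 1;   /* set by hardware on access */
    uint32_t mode       : 2;   /* 0 = RW, 1 = RO, 2 = CW (copy-on-write) */
    uint32_t unused     : 7;
} pte_mode_t;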

Page 49

8.6.3 Inverted page tables

• Virtual memory is usually much larger than physical memory
• Some architectures (e.g. IBM Power processors) use an inverted page table, which is essentially a frame table
• The inverted page table alleviates the need for a per-process page table
• The size of the table is proportional to the size of physical memory (in frames) rather than to the virtual memory
• Unfortunately, inverted page tables complicate the logical-to-physical address translation done by the hardware:
  – The hardware handles address translations through the TLB mechanism
  – On a TLB miss, the hardware hands over control (through a trap) to the OS to resolve the translation in software; the OS is responsible for updating the TLB as well
  – The architecture usually provides special instructions for reading, writing, and purging TLB entries in privileged mode
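A C sketch of what the OS's TLB-miss handler might do with an inverted page table. A plain linear search is shown for clarity (real implementations typically hash on <PID, VPN>); all names and structures here are assumptions:

#define NUM_FRAMES 1024                     /* one entry per physical frame */

typedef struct { int valid; int pid; unsigned vpn; } ipt_entry_t;
ipt_entry_t inverted_page_table[NUM_FRAMES];

extern void tlb_insert(int pid, unsigned vpn, int frame);  /* assumed privileged op */

/* On a TLB miss the hardware traps to the OS, which searches the inverted
 * page table for <pid, vpn>; the index of the matching entry is the frame. */
int resolve_tlb_miss(int pid, unsigned vpn)
{
    for (int frame = 0; frame < NUM_FRAMES; frame++) {
        ipt_entry_t *e = &inverted_page_table[frame];
        if (e->valid && e->pid == pid && e->vpn == vpn) {
            tlb_insert(pid, vpn, frame);
            return frame;
        }
    }
    return -1;                               /* not resident: take a page fault */
}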

Page 50

8.7 Summary Topics

• Demand paging basics, including hardware support and the data structures the OS needs for demand paging
• Interaction between the CPU scheduler and the memory manager in dealing with page faults
• Page replacement policies, including FIFO, LRU, and second chance replacement
• Techniques for reducing the penalty of page faults, including keeping a pool of page frames ready for allocation on page faults, performing any necessary writes of replaced pages to disk lazily, and reverse-mapping replaced page frames to the displaced pages
• Thrashing and the use of the working set of a process for controlling thrashing
• Translation lookaside buffer for speeding up address translation to keep the pipelined processor humming along
• Advanced topics in memory management, including multi-level page tables and inverted page tables

Page 51

Questions?
