2
Mini Thought Experiments
What’s the refresh rate of your monitor? What is the access time of a hard drive? What response time separates sluggish from speedy? How are these related?
What determines the running speed of a program that’s paging heavily?
If you have a program that pages heavily, what are your options to improve the situation?
3
Mechanics
Let’s finish off last lecture
Memory mapping, unified VM next time
No assigned reading yet; may not exist
Mid-term on track; covers everything before it
Open Q&A session? Is there interest? If so, when?
4
Where We Left Off Last Time
Various approaches to evicting pages
Some discussion about why doing even “well” is hard to implement
Belady’s algorithm for off-line analysis
We just finished variations on FIFO, in particular enhanced FIFO with 2nd chance
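Belady’s algorithm evicts the page whose next use lies farthest in the future. It needs the whole reference string in advance, so it only works off-line, as a yardstick for real policies. A minimal sketch, assuming the reference string is a Python list of page numbers; `opt_faults` and `fifo_faults` are illustrative names, not from the lecture:

```python
from collections import deque


def opt_faults(refs, frames):
    """Page faults under Belady's optimal (MIN) replacement."""
    resident = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            def next_use(p):
                # Index of p's next reference; a page never used again is
                # "infinitely far away" and therefore an ideal victim.
                try:
                    return refs.index(p, i + 1)
                except ValueError:
                    return float("inf")
            resident.remove(max(resident, key=next_use))
        resident.add(page)
    return faults


def fifo_faults(refs, frames):
    """Page faults under plain FIFO, for comparison."""
    queue, resident, faults = deque(), set(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())  # evict the oldest arrival
        queue.append(page)
        resident.add(page)
    return faults


if __name__ == "__main__":
    refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
    print("OPT:", opt_faults(refs, 3), "FIFO:", fifo_faults(refs, 3))
```

With this example string and 3 frames, OPT takes 9 faults and FIFO takes 15.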
5
Lessons From Enhanced FIFO
Observation: it’s easier to evict a clean page than a dirty page
2nd observation: sometimes the disk and CPU are idle
Optimization: when the system is idle, write dirty pages back to disk, but don’t evict them
This is called flushing; it often falls to the pager daemon
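The first observation is what the enhanced second-chance (clock) variant exploits: it prefers unreferenced clean pages, then unreferenced dirty ones. A sketch of one common formulation, assuming each frame is a dict with hypothetical `ref` and `dirty` flags arranged in a circle:

```python
def enhanced_second_chance(frames, hand):
    """Pick a victim frame; returns (victim_index, new_hand).

    frames: list of {"ref": bool, "dirty": bool}, treated as a circle.
    Preference: (ref=0, dirty=0) first, then (ref=0, dirty=1).
    """
    n = len(frames)
    while True:
        # Pass 1: look for a clean, unreferenced page without touching any bits.
        for step in range(n):
            i = (hand + step) % n
            if not frames[i]["ref"] and not frames[i]["dirty"]:
                return i, (i + 1) % n
        # Pass 2: settle for a dirty, unreferenced page, clearing reference
        # bits as the hand passes (the "second chance").
        for step in range(n):
            i = (hand + step) % n
            if not frames[i]["ref"] and frames[i]["dirty"]:
                return i, (i + 1) % n
            frames[i]["ref"] = False
        # Every reference bit is now clear, so the next round must succeed.
```

A dirty victim still has to be written back before its frame can be reused, which is why the idle-time flushing described above pays off.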
6
Least Recently Used (LRU)
Algorithm: replace the page that hasn’t been used for the longest time
Question: what hardware mechanisms are required to implement LRU?
7
Implementing LRU
Perfect: use a timestamp on each reference; keep a list of pages ordered by time of reference
Most recently used → least recently used: 5 3 4 7 9 11 2 1 15
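In a simulator the perfect scheme is easy; the hard part in a real machine is that something would have to stamp every single memory reference. A sketch, assuming a global counter stands in for the timestamp; `PerfectLRU` is a hypothetical name:

```python
class PerfectLRU:
    """Exact LRU via a timestamp per resident page (simulation only)."""

    def __init__(self, frames):
        self.frames = frames
        self.clock = 0        # stands in for a hardware timestamp
        self.stamp = {}       # resident page -> time of its last reference
        self.faults = 0

    def reference(self, page):
        self.clock += 1
        if page not in self.stamp:
            self.faults += 1
            if len(self.stamp) == self.frames:
                # Evict the page with the oldest timestamp.
                victim = min(self.stamp, key=self.stamp.get)
                del self.stamp[victim]
        self.stamp[page] = self.clock

    def mru_to_lru(self):
        """Resident pages ordered from most to least recently used."""
        return sorted(self.stamp, key=self.stamp.get, reverse=True)


if __name__ == "__main__":
    lru = PerfectLRU(3)
    for p in [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]:
        lru.reference(p)
    print(lru.faults, lru.mru_to_lru())
```

Finding the victim with `min` scans every stamp, which mirrors why exact LRU is costly; keeping an ordered list instead moves that cost to every reference.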
8
Approximate LRU
LRU: N categories, pages in order of last reference (most recently used to least recently used)
Crude LRU: 2 categories, pages referenced since the last page fault and pages not referenced since the last page fault
8-bit count: 256 categories (counts 0 through 255)
9
Aging: Not Frequently Used (NFU)
Algorithm: shift reference bits into counters; pick the page with the smallest counter
Main difference between NFU and LRU? NFU has a short history (the counter length)
How many bits are enough? In practice, 8 bits are quite good
Pros: requires only one reference bit. Cons: requires looking at all counters
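Counter values after each clock tick (four pages, 8-bit counters):
Tick   Page 1     Page 2     Page 3     Page 4
1      00000000   00000000   10000000   00000000
2      10000000   00000000   11000000   00000000
3      01000000   10000000   11100000   00000000
4      10100000   01000000   01110000   10000000
5      01010000   10100000   00111000   01000000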
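The shift-and-insert step can be sketched as follows. The page names and the per-tick reference sets in the usage example are illustrative assumptions; `age_tick` and `aging_victim` are hypothetical helpers:

```python
def age_tick(counters, referenced, bits=8):
    """One clock tick of aging.

    counters:   dict page -> counter (an int of `bits` bits)
    referenced: pages whose reference bit was set since the last tick
                (the OS would clear those hardware bits after sampling them)
    """
    top = 1 << (bits - 1)
    for page in counters:
        counters[page] >>= 1            # older history fades
        if page in referenced:
            counters[page] |= top       # newest reference enters the high bit


def aging_victim(counters):
    """The page with the smallest counter approximates the LRU page."""
    return min(counters, key=counters.get)


if __name__ == "__main__":
    counters = {"A": 0, "B": 0, "C": 0, "D": 0}
    for referenced in [{"C"}, {"A", "C"}, {"B", "C"}, {"A", "D"}, {"B"}]:
        age_tick(counters, referenced)
    for page, value in counters.items():
        print(page, format(value, "08b"))
    print("evict", aging_victim(counters))
```

This prints A 01010000, B 10100000, C 00111000, D 01000000 and evicts C: it was busy early on but has not been referenced for the last two ticks.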
10
Where Do We Get Storage?
32-bit VA to 32-bit PA: no space, right?
Offset within the page is the same, so there is no need to store the offset
4KB page = 12 bits of offset; those 12 bits are “free” in the PTE
Page # (20 bits) + other info <= 32 bits, which makes storing info easy
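Because the offset passes through translation unchanged, only the 20-bit page number needs mapping. A sketch of that arithmetic, with a plain dict standing in for the page table (an assumption for illustration; `split` and `translate` are hypothetical names):

```python
PAGE_SHIFT = 12                      # 4 KB pages -> 12 offset bits
OFFSET_MASK = (1 << PAGE_SHIFT) - 1  # 0xFFF


def split(va):
    """Split a 32-bit virtual address into (virtual page number, offset)."""
    return va >> PAGE_SHIFT, va & OFFSET_MASK


def translate(va, page_table):
    """Replace the page number, keep the offset.

    page_table is a hypothetical dict: virtual page number -> frame number.
    """
    vpn, offset = split(va)
    return (page_table[vpn] << PAGE_SHIFT) | offset
```

For example, `translate(0x12345678, {0x12345: 0xABCDE})` yields `0xABCDE678`. Since the frame number needs only 20 bits, the other 12 bits of a 32-bit PTE are left over for status flags.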
11
x86 Page Table Entry
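Bit 0 (V): Valid
Bit 1 (W): Writable
Bit 2 (O): Owner (user/kernel)
Bit 3 (Wt): Write-through
Bit 4 (Cd): Cache disabled
Bit 5 (A): Accessed (referenced)
Bit 6 (D): Dirty
Bit 7 (L): PDE maps 4MB
Bit 8 (G): Global
Bits 9–11: Reserved (ignored by the hardware, available to the OS)
Bits 12–31: Page frame number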
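A sketch of packing and unpacking such an entry, using the slide’s abbreviations as flag names and assuming the 32-bit x86 bit positions (bit 0 = V up through bit 8 = G, frame number in bits 12–31). `make_pte` and `decode_pte` are hypothetical helpers:

```python
# Flag bit positions in a 32-bit x86 PTE (names follow the slide's abbreviations).
PTE_FLAGS = {
    "V": 0,   # valid (present)
    "W": 1,   # writable
    "O": 2,   # owner: 1 = user, 0 = kernel only
    "Wt": 3,  # write-through
    "Cd": 4,  # cache disabled
    "A": 5,   # accessed (referenced), set by hardware
    "D": 6,   # dirty, set by hardware on a write
    "L": 7,   # in a PDE: entry maps a 4 MB page
    "G": 8,   # global
}
FRAME_SHIFT = 12  # page frame number occupies bits 12-31


def make_pte(frame, *flags):
    pte = frame << FRAME_SHIFT
    for name in flags:
        pte |= 1 << PTE_FLAGS[name]
    return pte


def decode_pte(pte):
    frame = pte >> FRAME_SHIFT
    flags = {name for name, bit in PTE_FLAGS.items() if pte & (1 << bit)}
    return frame, flags
```

For example, `make_pte(0x12345, "V", "W", "A", "D")` gives `0x12345063`.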
12
What Happens on Diagonal Lines
My screen is 1024×768 pixels
256 colors = 1 byte per pixel = 0.75MB; 64K colors = 2 bytes/pixel = 1.5MB
Page size is 4KB, so the screen is 192 or 384 pages
1 page = several horizontal lines (4 lines at 1 byte/pixel, 2 at 2 bytes/pixel)
Diagonal/vertical lines = TLB badness
“Superpages” to the rescue
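To see why, count the distinct pages a line touches; each one needs its own TLB entry. A sketch assuming a linear, row-major framebuffer that starts on a page boundary (real display memory layouts may differ); `pages_touched` is a hypothetical helper:

```python
WIDTH, HEIGHT, PAGE_SIZE = 1024, 768, 4096


def pages_touched(bytes_per_pixel, pixels):
    """Distinct pages written when drawing the given (x, y) pixels."""
    return len({((y * WIDTH + x) * bytes_per_pixel) // PAGE_SIZE for x, y in pixels})


horizontal = [(x, 0) for x in range(WIDTH)]
vertical = [(0, y) for y in range(HEIGHT)]
diagonal = [(i, i) for i in range(HEIGHT)]

if __name__ == "__main__":
    for bpp in (1, 2):
        print(f"{bpp} byte(s)/pixel:",
              pages_touched(bpp, horizontal), "page(s) horizontal,",
              pages_touched(bpp, vertical), "vertical,",
              pages_touched(bpp, diagonal), "diagonal")
```

A full horizontal line stays within 1 page, while a vertical or 768-pixel diagonal line touches 192 pages at 1 byte/pixel and 384 at 2 bytes/pixel.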
13
The Big Picture
We’ve talked about single evictions
Most computers are multiprogrammed
A single eviction decision is still needed
New concern – allocating resources
How to be “fair enough” and achieve good overall throughput
This is a competitive world – local and global resource allocation decisions
14
Program Behaviors
80/20 rule: > 80% of memory references are made by < 20% of the code
Locality: spatial and temporal
Working set: keeping this set of pages in memory avoids a lot of page faults
[Figure: # page faults plotted against # pages in memory, with the working set marked on the curve]
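One standard way to make “working set” precise is Denning’s definition: the set of distinct pages referenced in the most recent Δ references. A sketch, where the reference string and the window size `delta` are illustrative assumptions:

```python
def working_set(refs, t, delta):
    """Distinct pages among the last `delta` references ending at index t."""
    start = max(0, t - delta + 1)
    return set(refs[start:t + 1])


if __name__ == "__main__":
    refs = [1, 2, 1, 3, 1, 2, 4, 4, 4, 4]
    for t in range(len(refs)):
        print(t, sorted(working_set(refs, t, 4)))
```

The set changes as the program moves between phases: here it is {1, 2, 3} at t = 5 but only {4} by t = 9.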
15
Observations re Working Set
Working set isn’t static
There often isn’t a single “working set”: note the multiple plateaus in the previous curve
Program coding style affects the working set
Working set is hard to gauge: what’s the working set of an interactive program?
16
Working Set
Main idea: keep the working set in memory
An algorithm, on a page fault:
Scan through all pages of the process
If the reference bit is 1, record the current time for the page
If the reference bit is 0, check the “last use time”
If the page has not been used within τ, replace the page; otherwise, go to the next
Add the faulting page to the working set
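A sketch of that scan, assuming per-page dicts with hypothetical `ref` and `last_use` fields and a window parameter `tau` for τ. Two choices here are assumptions rather than part of the algorithm as stated: the reference bit is cleared once its time is recorded, and nothing is evicted if every page is still inside the window:

```python
def ws_page_fault(pages, now, tau, faulting_page):
    """Working-set replacement on a page fault (a sketch).

    pages: dict page -> {"ref": bool, "last_use": int} for one process's
           resident pages. Returns the evicted page, or None.
    """
    victim = None
    for page, info in pages.items():
        if info["ref"]:
            info["last_use"] = now   # used since the last scan
            info["ref"] = False      # assumption: clear so the next scan sees new use
        elif victim is None and now - info["last_use"] > tau:
            victim = page            # outside the window: no longer in the working set
        # otherwise the page is still in the working set; go to the next
    if victim is not None:
        del pages[victim]
    # Assumption: if every page is inside the window, the working set just grows.
    pages[faulting_page] = {"ref": True, "last_use": now}
    return victim
```

The scan keeps going after a victim is found so that every referenced page gets its time updated.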
17
WSClock Paging Algorithm
Follow the clock hand
If the reference bit is 1, set the reference bit to 0, set the current time for the page, and go to the next
If the reference bit is 0, check the “last use time”
If the page has been used within τ, go to the next
If the page hasn’t been used within τ and the modified bit is 1, schedule the page for page out and go to the next
If the page hasn’t been used within τ and the modified bit is 0, replace this page
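One sweep of the hand can be sketched as below. The frame dicts, the `tau` window parameter, and the `schedule_write` callback are hypothetical; what to do when a full sweep finds no old, clean page is left to the caller here as an assumption:

```python
def wsclock(frames, hand, now, tau, schedule_write):
    """One sweep of the WSClock hand (a sketch).

    frames: circular list of {"ref", "dirty", "last_use"} dicts.
    schedule_write: hypothetical callback that starts an asynchronous page-out.
    Returns (victim_index, new_hand), or (None, hand) after a full sweep
    with no old, clean page; the caller then needs a fallback policy.
    """
    n = len(frames)
    for _ in range(n):
        f = frames[hand]
        if f["ref"]:
            f["ref"] = False
            f["last_use"] = now
        elif now - f["last_use"] > tau:
            if f["dirty"]:
                schedule_write(f)              # write it out, keep looking
            else:
                return hand, (hand + 1) % n    # old and clean: replace it
        # else: used within tau, go to the next
        hand = (hand + 1) % n
    return None, hand
```

Scheduling the write instead of waiting for it lets the hand keep moving while the disk works.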
18
Simulating Modify Bit with Access Bits
Set pages read-only if they are read-write
Use a reserved bit to remember if the page is really read-only
On a write (protection) fault: if the page is not really read-only, record a modification in the data structure and change the page to read-write
Restart the instruction
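A minimal simulation of this trick, assuming one software object per page; the class and function names are hypothetical. The MMU’s permission check is modeled by the `hw_writable` flag, so only the first write after each cleaning faults, and reads never do:

```python
class ProtectionFault(Exception):
    """A write to a page that really is read-only."""


class Page:
    def __init__(self, really_writable):
        self.really_writable = really_writable  # software copy in a reserved PTE bit
        self.hw_writable = False                # what the MMU checks: start read-only
        self.modified = False                   # the simulated dirty bit
        self.write_faults = 0


def handle_write_fault(page):
    page.write_faults += 1
    if not page.really_writable:
        raise ProtectionFault("write to a read-only page")
    page.modified = True       # record the modification in the OS's data structure
    page.hw_writable = True    # later writes to this page no longer fault


def store(page):
    """Simulate a store: fault if the MMU sees read-only, then restart it."""
    if not page.hw_writable:
        handle_write_fault(page)
    # The restarted store now succeeds.


def write_back(page):
    """After cleaning the page, re-protect it so the next write is noticed."""
    page.modified = False
    page.hw_writable = False
```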
19
Implementing LRU without Reference Bit
Some machines have no reference bit (the VAX, for example)
Use the valid bit or access bit to simulate one
Invalidate all valid bits (even if they are valid)
Use a reserved bit to remember if a page is really valid
On a page fault:
If it is a valid reference, set the valid bit and place the page in the LRU list
If it is an invalid reference, do the page replacement
Restart the faulting instruction
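A simulation sketch, assuming a dict-based page table and a Python list as the LRU list (both hypothetical), plus a `page_in` callback standing in for real page replacement. After each sweep, a page’s first access costs one cheap fault, which supplies exactly the reference information the hardware lacks:

```python
class PTE:
    def __init__(self, resident):
        self.really_valid = resident   # software copy in a reserved PTE bit
        self.hw_valid = resident       # what the MMU checks


def invalidate_all(page_table):
    """Clear every hardware valid bit, as one would clear reference bits."""
    for pte in page_table.values():
        pte.hw_valid = False


def touch(lru, vpn):
    """Move vpn to the most-recently-used end of the LRU list."""
    if vpn in lru:
        lru.remove(vpn)
    lru.append(vpn)


def access(page_table, lru, vpn, page_in):
    """Simulate one reference to virtual page vpn.

    lru:     list of resident pages, least recently used first
    page_in: hypothetical callback that performs real page replacement
    """
    pte = page_table[vpn]
    if pte.hw_valid:
        return "hit"                     # no fault, no bookkeeping
    if pte.really_valid:
        pte.hw_valid = True              # valid reference: set the valid bit
        touch(lru, vpn)                  # ...and record the use
        return "soft fault"
    page_in(vpn)                         # invalid reference: replace a page
    pte.really_valid = pte.hw_valid = True
    touch(lru, vpn)
    return "page fault"                  # either way, the instruction restarts
```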
20
Demand Paging
Pure demand paging relies only on faults to bring in pages
Problems? Possibly lots of faults at startup; ignores spatial locality
Remedies: loading groups of pages per fault; prefetching/preloading
21
Speed and Sluggishness
Slow is > 0.1 seconds (100 ms); speedy is << 0.1 seconds
Monitors tend to be 60+ Hz, i.e. < 16.7 ms between screen paints
Disks have seek + rotational delay
Seek is somewhere between 7 and 16 ms
At 7200 rpm, one rotation = 1/120 sec ≈ 8.3 ms; a half-rotation is ≈ 4.2 ms
Conclusion? One disk access is OK; six are bad
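The same arithmetic as a sketch, assuming an average random access costs one seek plus half a rotation and ignoring transfer time; the function names are hypothetical:

```python
def frame_interval_ms(refresh_hz):
    return 1000 / refresh_hz


def rotation_ms(rpm):
    return 60_000 / rpm


def access_ms(seek_ms, rpm):
    """Rough random access time: seek plus half a rotation (transfer ignored)."""
    return seek_ms + rotation_ms(rpm) / 2


SLOW_MS = 100  # the threshold for "sluggish"

if __name__ == "__main__":
    print(f"60 Hz frame interval: {frame_interval_ms(60):.1f} ms")
    for seek in (7, 16):
        one = access_ms(seek, 7200)
        print(f"seek {seek} ms: one access {one:.1f} ms, six accesses {6 * one:.1f} ms")
```

Under these assumptions one access costs about 11 to 20 ms, well under 100 ms, while six cost about 67 to 121 ms, approaching or passing the point where a user notices.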
22
Disk Address
Use physical memory as a cache for disk
Where to find a page on a page fault? For an invalid page, the PPage# field holds a disk address
[Figure: virtual address space (some pages marked invalid) and physical memory]
23
Imagine a Global LRU
Global – across all processes
Idea – when a page is needed, pick the oldest page in the system
Problems? Process mixes?
Interactive processes; active large-memory sweep processes
Mitigating damage?
24
Amdahl’s Law
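Gene Amdahl (IBM, then Amdahl Corporation) noticed the bottlenecks to speedup
Assume a speedup affects one component
New time = (1 − affected) + affected / speedup, i.e. $T_{\text{new}} = (1 - f) + \frac{f}{s}$, with the old time normalized to 1, f the fraction of time affected, and s its speedup
In other words, diminishing returns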
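Amdahl’s law with the old running time normalized to 1 gives new time = (1 − f) + f/s, where f is the affected fraction and s its speedup. A sketch with hypothetical function names:

```python
def new_time(affected, speedup):
    """Amdahl's law with the old running time normalized to 1.

    affected: fraction of the old time spent in the component being sped up
    speedup:  factor by which that component gets faster
    """
    return (1 - affected) + affected / speedup


def overall_speedup(affected, speedup):
    return 1 / new_time(affected, speedup)


if __name__ == "__main__":
    for s in (2, 10, 100, 1_000_000):
        print(f"speedup {s:>9}: overall {overall_speedup(0.5, s):.3f}x")
```

With half the time affected, even an enormous component speedup leaves the overall speedup just under 2×, because the unaffected half still takes 0.5.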
25
NT x86 Virtual Address Space Layouts
00000000–7FFFFFFF: application code, globals, per-thread stacks, DLL code
80000000–BFFFFFFF: kernel & executive, HAL, boot drivers
C0000000–C07FFFFF: process page tables, hyperspace
C0800000–FFFFFFFF: system cache, paged pool, nonpaged pool
Alternative layout: 00000000–BFFFFFFF is a 3-GB user space; C0000000–FFFFFFFF is a 1-GB system space
26
Virtual Address Space in Win95 and Win98
00000000–7FFFFFFF: unique per process (per application), user mode
80000000–BFFFFFFF: shared, process-writable (DLLs, shared memory, Win16 applications); systemwide user mode
C0000000–FFFFFFFF: operating system (Ring 0 components); systemwide kernel mode
00000000–BFFFFFFF is user accessible
27
Details with VM Management
Create a process’s virtual address space:
Allocate page table entries (reserve in NT)
Allocate backing store space (commit in NT)
Put related info into the PCB
Destroy a virtual address space:
Deallocate all disk pages (decommit in NT)
Deallocate all page table entries (release in NT)
Deallocate all page frames
28
Page States (NT)
Active: part of a working set, and a PTE points to it
Transition: I/O in progress (not in any working set)
Standby: was in a working set, but removed; a PTE points to it; not modified and invalid
Modified: was in a working set, but removed; a PTE points to it; modified and invalid
Modified no write: same as modified, but not written back
Free: free with non-zero content
Zeroed: free with zero content
Bad: hardware errors
29
Dynamics in NT VM
[Figure: pages move among the process working set and the standby, modified, free, zero, and bad lists; labeled transitions include working-set replacement, the modified writer, “soft” faults, page in or allocation, demand-zero faults, and the zero thread]
30
Shared Memory
How to destroy a virtual address space? Link all PTEs; reference count
How to swap out/in? Link all PTEs; operation on all entries
How to pin/unpin? Link all PTEs; reference count
[Figure: the page tables of Process 1 and Process 2 map the same physical pages, marked w (writable)]
31
Copy-On-Write
Child’s virtual address space uses the same page mapping as the parent’s
Make all pages read-only
Make the child process ready
On a read, nothing happens
On a write, an access fault is generated:
Map to a new page frame
Copy the page over
Restart the instruction
[Figure: parent and child page tables map the same physical pages, all marked r (read-only)]
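A simulation sketch of copy-on-write, assuming a dict per address space (virtual page number to mapping) and a per-frame sharer count; all names are hypothetical. Two further assumptions: every page was writable before the fork, and the last remaining sharer of a frame is simply allowed to write in place instead of copying:

```python
class Frame:
    def __init__(self, data):
        self.data = bytearray(data)  # a private copy of the page contents
        self.sharers = 0             # how many page-table entries map this frame


class Mapping:
    """One page-table entry: a frame plus the write permission the MMU checks."""

    def __init__(self, frame, writable):
        self.frame = frame
        self.writable = writable
        frame.sharers += 1


def fork(parent):
    """Child maps the same frames; all pages become read-only on both sides.

    Assumption: every page was writable before the fork.
    """
    child = {}
    for vpn, m in parent.items():
        m.writable = False
        child[vpn] = Mapping(m.frame, writable=False)
    return child


def read(space, vpn, i):
    return space[vpn].frame.data[i]  # reads never fault


def write(space, vpn, i, value):
    m = space[vpn]
    if not m.writable:               # access fault
        if m.frame.sharers > 1:      # still shared: map a new frame, copy the page
            m.frame.sharers -= 1
            m = space[vpn] = Mapping(Frame(m.frame.data), writable=True)
        else:                        # assumption: sole remaining user writes in place
            m.writable = True
    m.frame.data[i] = value          # restart the instruction
```

After the child writes, it owns a private copy, and the parent’s next write finds itself the only sharer and proceeds without copying.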