Chapter 9, Virtual Memory Overheads, Part 2: Sections 9.6-9.10



9.6 Thrashing

• As noted earlier, in order for virtual memory to be practical, extremely low fault rates are necessary

• If the number of frames allocated to a process falls below a certain level, the process will tend to fault frequently

• This leads to bad performance (i.e., the process will run slowly)


• Thrashing is the term used to describe the situation when the page fault rate is too high

• Thrashing can be loosely defined as follows: A process is thrashing if it’s spending more time paging than it is executing

• Note that, given the time needed to access secondary storage compared to the length of a time slice, for example, it doesn’t take much paging before the time spent paging exceeds the time spent executing


• It was pointed out earlier that there is an absolute minimum number of frames that a process would need in order to execute

• If the number of frames fell below that threshold, then the process would have to be swapped out

• The number of frames needed to prevent thrashing would be (considerably) higher than the absolute minimum

• It is desirable to find ways to avoid thrashing short of simply swapping jobs out


• The cause of thrashing
• The book illustrates the idea with an example that occurred in an early system before this behavior was well understood
• 1. Let the O/S scheduling subsystem monitor CPU utilization, and when it’s low, schedule additional jobs


• 2. Independently of scheduling, let global page replacement be done

• Suppose any one job enters a heavy paging cycle

• It will steal pages from other jobs
• When the other jobs are scheduled, they end up paging in order to steal memory back


• As more processes spend time paging, CPU utilization goes down

• The scheduler sees this and schedules additional jobs
• Adding more jobs increases the problem
• It’s a vicious cycle
• (It’s a feedback loop where the feedback causes the wrong action to be taken.)
• The following diagram illustrates the effect of thrashing on performance

[Diagram omitted: the effect of thrashing on performance]


• Thrashing occurs when too many jobs are competing for too little memory

• Modern scheduling algorithms take into account not just CPU utilization, but also paging behavior

• When the values for each of these parameters indicate that thrashing is imminent, no new jobs should be scheduled


• In order to keep users happy, multi-user systems tend to try to allow as many users as possible, accepting a certain degradation of performance

• In single user, desktop type systems, there is no effective limit on how many different programs the user might try to run at the same time

• The consequence is that even with appropriate scheduling algorithms, it’s still possible for a modern system to enter a thrashing state

• If a system is finally overcome by thrashing, the ultimate solution, although not ideal, is to swap out or terminate jobs


• Local page replacement strategies are less prone to thrashing

• Since processes can’t steal frames from each other, they are insulated from each other’s behavior

• However, if the allocation each process gets is too small, each process individually can end up thrashing for the very reason that it can’t acquire any more frames


• The ultimate question is, how many frames does a process need?

• More precisely, the question is, how many frames does a process need at any given time?

• Locality of reference underlies these questions
• At any given time, a program is executing in some cluster of pages
• The goal is to allocate enough frames for whatever locality or cluster the program is currently in


• The working set model
• This is a model of memory usage and allocation that is based on locality of reference
• Define a parameter Δ
• Let this be an integer which tells how many page references back into the past you want to keep track of


• Define the working set window to be the Δ most recent page references

• Define the working set for this Δ to be the set of unique page references within the Δ most recent references

• Remember memory reference strings—the starting point for considering working sets is the set of memory references


• Let this sequence of references be given: 1, 2, 3, 2, 1, 4, 6, 7, 3, 2, 1, 4, 5, 6, 7
• Let Δ = 5
• Let t1 be the window of the first five memory references, {1, 2, 3, 2, 1}
• Then the working set at t1 is WS(t1) = {1, 2, 3}
• The working set size at t1 is WSS(t1) = 3


• Let t2 be the set of the five memory references beginning with page 6, {6, 7, 3, 2, 1}

• Then the working set of t2 is WS(t2) = {1, 2, 3, 6, 7}

• The working set size of t2 is WSS(t2) = 5
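• As a quick illustration of the bookkeeping (not from the book), here is a minimal Java sketch that computes the working set and its size for a window of Δ references ending at a given position in a reference string:

```java
import java.util.HashSet;
import java.util.Set;

public class WorkingSetDemo {
    // Working set: the unique pages among the delta most recent references,
    // for the window ending at index end (inclusive).
    static Set<Integer> workingSet(int[] refs, int end, int delta) {
        Set<Integer> ws = new HashSet<>();
        for (int i = Math.max(0, end - delta + 1); i <= end; i++) {
            ws.add(refs[i]);
        }
        return ws;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 2, 1, 4, 6, 7, 3, 2, 1, 4, 5, 6, 7};
        int delta = 5;
        Set<Integer> t1 = workingSet(refs, 4, delta);   // first five references
        Set<Integer> t2 = workingSet(refs, 10, delta);  // the window {6, 7, 3, 2, 1}
        System.out.println("WS(t1) = " + t1 + ", WSS(t1) = " + t1.size()); // {1, 2, 3}, 3
        System.out.println("WS(t2) = " + t2 + ", WSS(t2) = " + t2.size()); // {1, 2, 3, 6, 7}, 5
    }
}
```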


• The general idea is to use the number of unique page references that occurred in the last Δ references as the model for ongoing behavior

• It’s similar to other spots where past behavior is used to model future behavior

• The set of unique pages referenced within the window Δ is called the working set; the number of unique pages is the working set size

• The goal is to allocate to a process a number of frames equal to the number of unique references


• The definition of a working set is that number of frames which a process needs in order to execute without having an excessively high paging rate

• So the idea is that the number of unique references in Δ will be used to determine how many frames that should be


• Note that this still doesn’t address all of the problems

• How big should Δ be?
• If the number of unique references always equals Δ, that suggests Δ doesn’t actually capture a full working set and Δ should be larger
• If the number of unique references is significantly smaller than Δ, that suggests that it is not necessary to look at that many previous references and Δ could be smaller


• The problem with this is imprecision
• As noted earlier, a desired page fault rate might be in the range of 1 out of 100,000 or less
• For any given Δ, what are the chances that the number of unique references in that window will closely match the number of unique references as the program continues to run?
• The problem is not just that Δ is an imprecise measure. The problem also is that the working set size of a process changes over time


• Δ’s and working set sizes calculated from them may be more useful in the aggregate

• System-wide demand for memory frames is given by this formula, where the subscript i represents different processes:

• D = Σ WSS_i


• If D is greater than the total number of frames, memory is over-allocated

• This means that some degree of thrashing is occurring

• This suggests that the scheduling algorithm should be suspending/swapping out jobs

• This leads to the question of how to choose victims
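• A minimal sketch of that test (names are hypothetical, not from the book): the demand D is just the sum of the working set sizes, compared against the number of physical frames:

```java
// Hypothetical sketch: total demand D = sum over processes i of WSS_i.
// If D exceeds the number of physical frames, memory is over-allocated
// and some degree of thrashing is likely.
static boolean memoryOverAllocated(int[] workingSetSizes, int totalFrames) {
    int d = 0;
    for (int wss : workingSetSizes) {
        d += wss;               // D = Σ WSS_i
    }
    return d > totalFrames;
}
```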


• If D is less than the total number of frames, that means that from the point of view of memory, more jobs could be loaded and potentially scheduled

• Note that the optimal number of jobs to be scheduled also depends on the mix of CPU and I/O bound jobs

• It is conceivable that in order to optimize some measure of performance, like response time, the number and size of jobs scheduled would not always completely allocate physical memory


• The book gives a sample scenario for how to approximately determine a working set

• Let a reference bit be maintained (in the page table, for example) which records whether or not a page has been accessed

• In addition, let two bits be reserved for recording “historical” values of the reference bit

• Let the Δ of interest be 10,000


• Let a counter be set to trigger an interrupt after every 5,000 page references

• When the interrupt occurs, shift the history bits one position left and copy the current reference bit into the vacated (rightmost) position

• Then clear the current reference bit


• What this really accomplishes is to make the clock algorithm historical

• When a page fault occurs, the question boils down to, what page is a suitable victim?

• Put another way, the question is, what page is no longer in the working set?

• The point is that a page that is no longer in the working set is a suitable victim


• A page where the current access bit is 0 and both history bits are 0 is considered to be out of the working set

• Saying that Δ was 10,000 was inexact
• When a page fault occurs, you are somewhere within the current count of 5,000 accesses, say at count x
• If all of the access bits are 0, then the page has not been accessed within the last x + 5,000 + 5,000 page references
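• A minimal sketch of that bookkeeping, assuming one reference bit and two history bits per page (the class and method names are hypothetical):

```java
// Hypothetical sketch of the working-set approximation: a reference bit
// plus two history bits per page, sampled by a timer interrupt.
class PageEntry {
    boolean referenceBit;   // set on each access to the page (simulated here)
    int historyBits;        // the two most recently sampled reference bits

    // Called from the interrupt handler, e.g., every 5,000 references.
    void sample() {
        historyBits = ((historyBits << 1) | (referenceBit ? 1 : 0)) & 0b11;
        referenceBit = false;   // clear for the next sampling interval
    }

    // Out of the working set: not referenced in the current interval and
    // in neither of the two previous intervals, so a suitable victim.
    boolean outsideWorkingSet() {
        return !referenceBit && historyBits == 0;
    }
}
```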


• If a finer grained approximation of Δ is desired, more history bits can be maintained and the interrupt can be triggered more often, with attendant costs

• On the other hand, although the working set model is useful and has been implemented to varying degrees, it is not necessarily the cleanest, easiest way of avoiding thrashing


• Page fault frequency
• This topic is mercifully brief and short on details
• The basic problem with thrashing is a high page fault rate
• Rather than trying to track the working set sizes of processes, the problem can be approached more directly by tracking the page fault rate



• The question again arises: are you talking globally or locally?
• In general, the discussion seems to be global
• If the global fault rate goes too high, pick a victim and decrease the multi-programming level
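• A minimal sketch of page-fault-frequency control, assuming hypothetical upper and lower bounds on an acceptable fault rate (none of these names come from the book):

```java
// Hypothetical sketch of page-fault-frequency (PFF) control for one process:
// returns the change to make in the process's frame allocation (+1, 0, or -1).
static int adjustAllocation(double faultRate, double upperBound, double lowerBound,
                            int freeFrames) {
    if (faultRate > upperBound) {
        // Faulting too often: grant another frame if one is free; otherwise
        // some process must be suspended/swapped out to free memory.
        return freeFrames > 0 ? +1 : 0;
    } else if (faultRate < lowerBound) {
        // Faulting very rarely: the process can give a frame back.
        return -1;
    }
    return 0;   // fault rate is within the acceptable band
}
```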


• There are various ways to pick a victim
• In particular, you might think there was one offending process that had caused thrashing by entering a phase where it was demanding more memory
• That may be true, but if it was a high priority process, it might not be a suitable victim
• Any victim will do, because any victim will release memory to the remaining processes


• If you were tracking page fault rate on individual processes, the scheme still works

• If a process starts generating too many page faults, it means that it hasn’t been granted a large enough working set

• If that process is of sufficient priority, then some other process will have to be swapped out in order to give up its memory allocation


9.7 Memory-Mapped Files

• In a high level language program, file operations include open(), read(), write(), etc.

• It is possible and usual for these operations to trigger an action that directly affects the file in secondary storage

• Using virtual memory techniques, it’s also possible to memory map the file


• That means that just like a program file or dynamic data structure, memory pages can be allocated to keep all or part of the data file in memory

• Then at run time the file operations would simply be translated into memory accesses


• Assuming the system supports this and you have memory to spare, a program with file operations can run orders of magnitude faster and accomplish the same result

• At the end of the run, all of the changed pages can be written out to update the copy of the file in secondary storage


• Various details of memory mapped files
• In theory, for a small file, you might read the whole thing in advance
• However, it should also be possible to rely on demand paging to bring it in as needed
• Note that unlike program files, data files would probably be paged out of the file system, not out of an image in swap space


• There is also the question of how memory page sizes map to secondary storage blocks (sectors, etc.)

• This question also exists for normal paging—it is resolved at the lowest level of the O/S in concert with the MMU and disk access subsystem

• In both cases, program paging and memory mapped files, it is simply transparent

• An open file is accessible using file operations, whether it’s memory mapped or not


• Some systems periodically check for modified pages and write them out

• Again, this can be done both for regular paging and for memory mapped files

• In either case, it’s transparent
• In the case of a memory mapped file, it means that the cost of writing it out in its entirety will not be incurred at closing time


• Implementation details
• Some systems only memory map files if specific system calls are made
• Other systems, like Solaris, in effect always memory map files
• If an application makes memory mapping calls, this is done in user space
• If an application doesn’t make memory mapping calls, the system still does memory mapping, but in system space


• The concepts of shared memory and shared libraries were raised earlier

• Some systems also support shared access to memory mapped files

• If processes only read the files, no complications result

• If processes write to the files, then concurrency control techniques have to be implemented


• It is also possible to apply the copy on write principle

• This was mentioned earlier in the context of memory pages shared by a parent and child process

• With files, the idea is that if more than one process has access to the memory mapped file and one of them writes to it, then a new frame is allocated, the page is copied, and the write is made to the copy


• The result is potentially a different copy of the file for each process

• Pages that had not been written to could still be shared

• Those that had been written could not
• When each process closed the file, its separate copy would have to be saved with a distinct name or identifier


• In Windows NT, 2000, and XP, shared memory and memory mapped files are implemented using the same techniques and are called using the same interface

• In Unix and Linux systems, the implementations of shared memory and memory mapped files are separate


• Memory mapped files in Java
• The syntactical details themselves are not important—an example is given, but no practical example of its use, and there is no assignment on this
• However, it’s useful to understand that a high level language API includes facilities for memory mapped files
• This is not just some hidden feature of an O/S. It’s functionality that an application programmer can specify in code


• Going through the keywords in Java is a handy way of reviewing the concrete aspects of memory mapping files

• Not surprisingly, this functionality exists in the I/O packages in Java. The book’s example imports these packages:

• java.io.*
• java.nio.*
• java.nio.channels.*


• For any type of file object in Java you can call the method getChannel()

• This returns a reference to a FileChannel object
• On a FileChannel object you can call the method map()
• This returns a reference to a MappedByteBuffer object
• This is a buffer of bytes belonging to the file which is mapped into memory


• The map() method takes three parameters:
• 1. mode = READ_ONLY, READ_WRITE, or PRIVATE
• If the mode is PRIVATE, this means copy-on-write is in effect
• 2. position = the byte offset into the file where memory mapping is started
• In a garden variety case, you’d just start at the beginning, offset 0


• 3. size = how many bytes of the file beyond the starting position are memory mapped

• In a garden variety case you’d map the whole file

• The FileChannel class has a method size() which returns the length of the file in bytes
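• A minimal, self-contained sketch along these lines (this is not the book’s exact example, and the file name is just a placeholder):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MapFileExample {
    public static void main(String[] args) throws Exception {
        // Open an existing, non-empty file for reading and writing and get its channel.
        try (RandomAccessFile file = new RandomAccessFile("data.txt", "rw");
             FileChannel channel = file.getChannel()) {

            // Map the whole file: mode, starting position, number of bytes.
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());

            // File operations now become memory accesses on the buffer.
            byte first = buffer.get(0);   // read the first byte of the file
            buffer.put(0, first);         // write it back (no explicit disk I/O here)
        }
    }
}
```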


• The Java API doesn’t have a call that will return the size of a memory page on the system

• If it’s desirable in user code to keep track of pages, the page size will have to be obtained separately

• In other words, the page size will have to be hardcoded, and the code will then be system dependent


• The Java API only supports keeping track of mapped memory through byte counts

• If you want to deal in pages, you would have to divide the byte count by the page size

• The book’s example code appears to bear out the assumption that whatever part of a file may be memory mapped, the starting position is mapped to the 0th offset of the first page allocated for the memory mapping


• Once a file is memory mapped in Java, there is no specific call that will unmap it

• Like in many other areas, you are simply reliant on garbage collection

• If you release the reference to the MappedByteBuffer by setting it to null, for example, eventually the buffer would go away and future file operations would go directly to the file on disk, not to memory


• Memory mapped I/O
• All the way back in chapter 1 this concept was brought up
• The basic idea is that for some peripherals it’s desirable to read or write large blocks of values directly from or to memory
• This can be convenient for peripherals that are fast and deal in large quantities of data
• Memory mapped I/O means not having to generate an interrupt for every byte transferred


• A video controller in the IBM PC architecture is an example

• Memory in the system is reserved and mapped to the controller

• A write to a byte in that memory can be transferred directly to the controller, which can display it on the screen

• The book also discusses serial and parallel port I/O, but I’m not particularly interested in those examples


9.8 Allocating Kernel Memory

• This topic has come up briefly along the way a few times

• It can be previewed in a very simple way: The O/S may allocate memory to itself differently than it allocates memory to user processes

• In particular, kernel memory may be paged, but a system may also allocate kernel memory without paging


• There are two main aspects to why kernel memory might be treated differently:

• 1. As the memory manager, the system should not waste memory

• 2. The system may have need for memory which is not paged

• These two points are expanded below


• 1. You can’t expect user processes to manage memory in any case, but the system is the memory manager, so it can do anything it wants to

• Specifically, the system provides transparent, paged memory to user processes

• The system also makes regular, significant demands on memory


• Since the system is the memory manager, it should be possible to optimize its own use of memory, eliminating waste and speeding access

• For example, the O/S regularly allocates space for buffers and other objects which may be smaller or larger than pages

• It becomes desirable to allocate memory in the size of the data structure, not in page units


• 2. The point of paging was that a process was no longer constrained to contiguous memory—but the O/S may need contiguous memory

• For example, if the system supports memory mapped I/O, it only makes sense for the mapped block of memory to be contiguous

• This means either overriding paging or allocating memory in a different way


• There are two general strategies for allocating kernel memory:

• 1. The buddy system
• 2. Slab allocation


• The buddy system
• Given some contiguous block of memory and some memory allocation need, repeatedly divide the block in half until you arrive at the smallest fraction of it, 1/2^n, that is large enough to satisfy the request
• Each time the block or a sub-block is divided, the resulting pair of sub-blocks are known as buddies


• Memory requests don’t come in neat powers of two

• That means that the smallest buddy large enough to hold a request will consist of up to 50% waste due to internal fragmentation

• The average waste will be 25% of the block size
• The book doesn’t mention external fragmentation, but it seems like the scheme would also be prone to that problem
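• A minimal sketch of the sizing rule, assuming requests are rounded up to the next power of two (the numeric example is illustrative, not from the book):

```java
public class BuddySize {
    // Round a request up to the smallest power-of-two block that can hold it.
    static int buddyBlockSize(int request) {
        int block = 1;
        while (block < request) {
            block <<= 1;        // keep doubling until the block fits the request
        }
        return block;
    }

    public static void main(String[] args) {
        int request = 21 * 1024;              // a 21 KB request
        int block = buddyBlockSize(request);  // gets a 32 KB block
        System.out.println(block - request);  // 11264 bytes lost to internal fragmentation
    }
}
```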


• The book claims that the buddy system is advantageous because when memory is released, it is a simple matter to combine buddies, regaining a block of contiguous memory of twice the size

• It is not clear why this scheme is better than pure contiguous memory allocation

• Maybe it requires less bookkeeping cost and so the scheme has lower overhead or can do the allocations more quickly


• Slab allocation
• Slab allocation relies on this underlying idea:
• O/S memory requests fall into a finite set of different data types that are of fixed, known sizes
• Slab allocation can be implemented on top of paged memory, where paging is not done
• Pages are allocated to the slabs in contiguous blocks


• Working from the bottom up, there are four things to keep in mind for slab allocation:

• 1. Kernel objects: these are instances of data structures which the O/S uses. These are the things which are of constant size

• 2. Pages: The physical pages/frames are no different from before. They are of some given size and location in memory


• 3. Slabs: A slab consists of a fixed number of contiguous memory pages.

• A slab has to be large enough to hold the largest kind of kernel object

• A slab, even consisting of only one page, may be large enough to hold many smaller kinds of kernel objects


• 4. Caches: This term should not be confused with the term cache as it’s customarily used

• There is one cache for each kind of kernel object

• A cache can consist of one or more slabs
• A cache is “virtual”. In other words, the slabs have to consist of contiguous pages, but a cache doesn’t have to consist of contiguous slabs


• Slab allocation works in this way:
• When the O/S starts, a cache is created for each kind of kernel data structure
• Each cache begins existence with at least one slab allocated to it
• Each cache/slab is filled with however many copies of the data structure it can hold
• These copies are blank or empty
• They are ready for use when the time comes


• When the O/S needs a data structure, it assigns itself one out of the corresponding cache and initializes it for use

• If a cache runs out of empty data structures, a new slab is allocated to it and the slab is filled with empties. (This is literally the step that is slab allocation.)


• As the O/S finishes with data structures, it releases them and wipes them clean so that other system processes can use them

• If all of the structures in a given slab have been emptied and other slabs in the cache still have empties available, it is possible that the entire slab would be deallocated from the cache
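• A minimal sketch of the cache bookkeeping just described (all of the names are hypothetical, and real slabs manage raw pages rather than Java objects):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sketch of one slab cache: a pool of pre-constructed objects
// of a single kernel data-structure type.
class SlabCache<T> {
    private final Deque<T> free = new ArrayDeque<>();  // empty objects ready for use
    private final Supplier<T> constructor;             // creates blank objects
    private final int objectsPerSlab;

    SlabCache(Supplier<T> constructor, int objectsPerSlab) {
        this.constructor = constructor;
        this.objectsPerSlab = objectsPerSlab;
        addSlab();                                      // start with one slab's worth
    }

    // "Slab allocation": take a new slab's worth of space and fill it with empties.
    private void addSlab() {
        for (int i = 0; i < objectsPerSlab; i++) {
            free.push(constructor.get());
        }
    }

    T allocate() {
        if (free.isEmpty()) {
            addSlab();            // the cache ran out of empty objects
        }
        return free.pop();        // hand out a pre-constructed object
    }

    void release(T obj) {
        free.push(obj);           // wiped and returned for reuse
    }
}
```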


• Slab allocation has two advantages:
• 1. No memory is wasted due to fragmentation
• 2. Memory requests are satisfied quickly
• Keep in mind that that doesn’t mean there are no costs
• The system incurs the cost of pre-creating the data structures
• It also incurs the cost of using memory to hold empty structures. This is a cost that directly corresponds to the cost of fragmentation


9.9 Other Considerations

• Recall that these are the two major considerations when implementing virtual memory:

• 1. The page replacement algorithm (how to choose a victim page)

• 2. The memory allocation scheme (how many frames to give to each process)


• The book lists six other considerations, some of which have already been considered, at least in part:

• 1. Prepaging
• 2. Page size
• 3. TLB reach
• 4. Inverted page tables
• 5. Program structure
• 6. I/O interlock


• Prepaging
• When a process starts, it will generate page faults under demand paging until its working set is in memory
• This is an inherent cost of not allocating memory and loading the whole process up front


• The problem is repeated if a process is swapped out, either because memory is over-allocated or because the scheduler determines that the multi-programming level is too high

• When a process is swapped out, it’s possible to record its working set at that time


• This makes it possible to prepage the working set back into memory before the process is started again

• Prepaged pages come out of secondary storage, just like regular paging, so it’s not like you’re saving I/O cycles

• What you’re doing is trying to anticipate so that there is no up front paging delay when the process is restarted


• The book observes that prepaging is a worthwhile strategy when the number of pages brought in that are used outweighs the number of pages brought in that are not used

• The obvious point is that prepaging is not demand paging—pages are brought in before demand occurs


• Assuming the working set is stable, prepaging may provide some benefit, but on the other hand, up front paging costs may not be a big concern, in which case relying on demand paging still works fine

• Solaris is an example of an operating system that includes some elements of prepaging in its implementation


• Page size
• Page size is not a software, O/S programmer decision
• Page size is defined in the hardware architecture
• The MMU and support mechanisms like the TLB have to be physically sized (number of bits, etc.) based on page size


• Hardware decisions like this aren’t made in a vacuum

• Chip designers and O/S programmers typically communicate with each other

• Page size and maximum address space affect the page table size and how addresses are handled in O/S code that interfaces with the MMU


• Picking a page size is a trade-off
• A larger page size means a smaller page table, which is good
• A smaller page size means less internal fragmentation, which is good
• A smaller page size also has what is known as a higher resolution, which is good


• Resolution refers to this idea:
• Say you are only interested in a single word on a page
• If the page size is large, you have to read in a lot of extraneous stuff in order to access the word
• If the page size is small, you have to read in less extraneous stuff


• Resolution potentially applies across the whole working set

• If a process accesses isolated addresses on pages, the total amount of memory consumed by the working set will be smaller if the page size is smaller


• On the other hand, the cost of paging from secondary storage is largely in the latency and seek times, not in the data transfer

• This means the cost of reading a page is about the same whether the page is large or small

• If the page size is small, the I/O cost of reading in a given amount of memory will be higher than if the page size is large


• In summary, picking a page size is a classic balancing act

• All that can be said for sure is that as the size of memory and secondary storage has increased, the size of pages has tended to increase


• TLB reach
• TLB reach is simply the measure of how much memory in total is accessible through TLB hits
• In other words, it’s the number of TLB entries times the size of a page
• Under demand paging, you hope to have to bring in a page from secondary storage extremely rarely (once for every page needed, ideally)


• If you meet that goal, you would also like to satisfy memory accesses as often as possible with a TLB hit, avoiding the extra cost of reading the page table in memory

• In order for this to happen, you’d like the TLB reach to encompass the working set of a process


• The larger the TLB, the greater the reach, but increasing the size of the TLB is expensive

• The larger the page size, the greater the reach, another argument in favor of large page sizes

• Some architectures and operating systems support multiple page sizes


• Using small pages with small processes minimizes internal fragmentation

• Using large pages with large processes increases TLB reach

• A concrete example of hardware architecture: The UltraSPARC chip supports pages (frames) of 8 KB, 64 KB, 512 KB, and 4 MB

• The TLB size is 64 entries


• The Solaris operating system uses two of these page sizes, 8 KB and 4 MB

• For small processes, the average waste due to internal fragmentation is 4 KB rather than 2 MB

• For large processes the TLB reach is 4 MB * 64 table entries, or 256 MB
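• A quick arithmetic check of those figures, as a sketch:

```java
public class TlbReach {
    public static void main(String[] args) {
        long entries = 64;                       // TLB entries
        long smallPage = 8L * 1024;              // 8 KB page
        long largePage = 4L * 1024 * 1024;       // 4 MB page
        // TLB reach = number of TLB entries * page size
        System.out.println(entries * smallPage); // 524288 bytes  (512 KB)
        System.out.println(entries * largePage); // 268435456 bytes (256 MB)
    }
}
```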


• In a system with one page size, most of the TLB control can be done in the MMU hardware, which is speedy

• With multiple page sizes the O/S becomes involved in managing which processes get which page sizes and making sure this information is recorded in the frame table, page table, and TLB entries

• As usual, it’s a trade-off


• Inverted page tables
• Note that inverted page tables were brought up before the topic of virtual memory
• In that setting, the whole process was loaded into memory
• The frame allocations for every process could be held in a single, unified, inverted page table (frame table)
• This meant that you didn’t have to have a page table for each process


• The page table provided look-up
• For a process id and a logical page address, you could find the physical frame allocated to it
• If the whole process is loaded into memory (even if non-contiguous) there’s no problem
• By definition, if an entry was in the page table, it was valid
• There was no system for identifying invalid (let alone valid but not paged in) addresses other than their absence from the table


• As a consequence, the inverted page table can’t give a comprehensive view of the frame allocations for a whole process under virtual memory management

• This is a problem under virtual memory management
• If you look up a <pid, p> pair and don’t find an entry in the inverted page table, there are two things you don’t know:
• 1. You don’t know whether the page is valid but not paged in, or simply invalid
• 2. You don’t know whether this is a hard fault or whether the page can be paged in from secondary storage


• The final result of this is that if you’re doing virtual memory with an inverted page table, you still need regular page tables for each process

• These would have to be paged in and out on a per-process basis

• This may still be practical because you would only need to read the page tables on a page fault, in other words, rarely


• On the other hand, it would double the I/O cost of a single user page fault, because a page table fault would occur

• This means that for a given acceptable level of overhead cost for virtual memory, the page fault rate would have to be cut in half again


• Unfortunately, you’re not just incurring I/O cost twice; you’re dealing with two page fault traps in a row

• First the user process faults. While handling that the O/S faults on the page table

• Dealing with this “double fault” situation doesn’t make the operating system code any easier to write


• Program structure
• Paging is transparent to users, but the reality is that how programs are structured can affect how they perform in a paged environment
• The book gives an illustrative example:
• A 2-D array may be sized so that each row fits on a single page


• If you write the nested loop so that the inner loop moves down a column, crossing rows, every access lands on a different page, so pages are touched on the order of m x n times over the traversal

• If you write the loop so that the inner loop moves along a row, each page is stepped on for an entire row before moving on, so far fewer page faults can result

• In other words, program structure can affect locality of reference
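• A sketch of the two traversal orders in Java (where a 2-D array is laid out row by row), assuming each 128-element row occupies its own page:

```java
public class ArrayTraversal {
    public static void main(String[] args) {
        int m = 128, n = 128;
        int[][] data = new int[m][n];

        // Good locality: the inner loop walks along one row, staying on one page.
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                data[i][j] = 0;

        // Poor locality: the inner loop jumps from row to row, page to page.
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
                data[i][j] = 0;
    }
}
```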


• Some optimizing compilers may take this into account

• Some programmers may also conceivably take this into account

• At a macro level, linking and loading can also affect locality of reference and performance

• In general, it is best if modules are loaded so that they don’t cross page boundaries


• Also, if possible, modules that call each other should be put on the same page

• Determining the best arrangement of a set of modules on a set of pages is equivalent to a classic operations research problem

• In theory, an operating system could implement some serious algorithms in order to load code as effectively as possible


• I/O Interlock
• This phrase refers to locking or pinning pages in memory
• This means the pages can’t be selected as victims and paged out


• The most obvious case where this would be desirable is the operating system code itself

• In most systems, some or all of the O/S is pinned in memory

• This is slightly different from reserved memory
• Between different boot-ups the location of system code may vary, but once loaded, it stays put


• Imagine what would happen if the memory allocation module of the O/S were paged out

• At the very least, that part of the module which could page the rest of itself in would have to remain memory resident

• Otherwise the system would freeze


• The book also mentions that some systems support reading directly from secondary storage to user memory space rather than a system buffer

• This is simply memory mapped I/O into user space

• If user memory is to be used in this way, it has to be locked in memory


• The book offers a final scenario:
• Process A faults and its page is read in
• Process B is then scheduled and faults
• Is process A’s page fair game as a victim for B or not?
• If it is, then one I/O cycle has been wasted
• If not, then process A’s page has to be locked in memory


• In general, locking goes against the grain of demand paging, which is supposed to let memory allocation adapt flexibly to working set size

• Locking may be good in a limited number of cases, but in general the O/S should be chary in its application or in accepting system requests for its application


9.10 Operating-System Examples

• An overview of Windows XP
• XP implements demand paging
• Processes are assigned a working set minimum, typically 50 pages
• Processes are also assigned a working set maximum, typically 345 pages


• If a process is below its maximum and page faults, and memory is available, a frame is allocated from the free space and the system reads using a clustering technique

• Clustering means that the desired page, plus x subsequent pages, are read in at the same time

• If the process is at the maximum, a victim is chosen from its current allocation


• The system maintains a free frame pool and a threshold parameter is set

• When the free frame pool falls below the threshold, the system steals pages from processes that are at maximum allocation

• This is known as automatic working set trimming


• On single processor 80x86 systems, victims are chosen using a version of the clock algorithm

• On other systems, a version of FIFO page replacement is used


• Solaris
• The book gives many details of the implementation of virtual memory in Solaris
• The main conclusion that can be drawn from this is the same as for the detailed presentation of scheduling algorithms and the presentation of segmented address resolution
• Real systems are complex and tend to include lots of features in an attempt to balance conflicting performance goals


The End