Chapter 4 Memory Management

Page 1: Ch4 memory management

Chapter 4

Memory Management

Page 2: Ch4 memory management

Memory

• Memory is an important resource that must be carefully

managed.

• What every programmer would like is an infinitely large,

infinitely fast memory that is also non-volatile (as in,

memory that does not lose its contents when power is cut).

• The part of the OS that manages the memory hierarchy is

called the memory manager.

– Its job is to keep track of which parts of memory are in use and which

parts are not in use.

– To allocate memory to processes when they need it.

– To deallocate it when they’re done.

– To manage swapping between main memory and disk when main memory

is too small to hold all the processes.

Page 3: Ch4 memory management

Basic Memory Management

• Memory management systems can be divided into two

basic classes:

– Those that move processes back and forth between main memory and disk

during execution (swapping and paging) and

– Those that don’t.

• The latter are simpler, so we will study them first.

• Later in the chapter we will examine swapping and paging.

• For now, keep in mind: swapping and paging are largely

artifacts caused by the lack of sufficient main memory to

hold all programs and data at once.

• Btw, we finally "carbon-dated" the book: it's ancient!

– "Now Microsoft recommends having at least 128MB for a single-user Windows XP system" ...no wonder they keep banging on about floppies and tape drives!

Page 4: Ch4 memory management

Monoprogramming without Swapping or Paging

• The simplest possible memory management scheme is to run just one program at a time, sharing the memory between that program and the OS.

• Three variations on this theme are shown below:

Figure 4-1. Three simple ways of organizing memory with an operating system and one user process. Other possibilities

also exist.

Page 5: Ch4 memory management

Monoprogramming without Swapping or Paging

• The OS may be at the bottom of memory in RAM (a). Or it may be in ROM at the top of memory (b), or the device drivers may be at the top of memory in a ROM and the rest of the system in RAM down below (c).

Page 6: Ch4 memory management

Monoprogramming without Swapping or Paging

• The first model was formerly used on mainframes and minicomputers but is rarely used any more.

• The second model is used on some palmtop computers and embedded

systems.

• The third model was used by early personal computers (e.g., running

MS-DOS), where the portion of the system in the ROM is called the

BIOS.

• When the system is organised in this way, only one process at a time

can be running.

• As soon as the user types a command, the OS copies the requested

program from disk to memory and executes it.

• When the process finishes, the OS displays a prompt character and

waits for a new command.

• When it receives the command, it loads a new program into memory,

overwriting the first one.

Page 7: Ch4 memory management

Multiprogramming with Fixed Partitions

• Except on very simple embedded systems,

monoprogramming is hardly used any more.

• Most modern systems allow multiple processes to run at

the same time.

• Having multiple processes running at once means that

when one process is blocked waiting for I/O to finish,

another one can use the CPU.

– Multiprogramming increases the CPU utilisation.

• The easiest way to achieve multiprogramming is simply to

divide memory up into n (possibly unequal) partitions.

• This partitioning can, for example, be done manually when

the system is started up.

Page 8: Ch4 memory management

Multiprogramming with Fixed Partitions

• When a job arrives, it can be put into the input queue for the smallest partition large enough to hold it.

• Since the partitions are fixed in this scheme, any space in a partition not used by a job is wasted while that job runs.

• In the next figure (a) we see how this system of fixed partitions and separate input queues looks.

– The disadvantage of sorting the incoming jobs into separate queues becomes apparent when the queue for a large partition is empty but the queue for a small partition is full, as is the case for partitions 1 & 3 in (a).

• An alternative organisation is to maintain a single queue as in (b).

– Whenever a partition becomes free, the job closest to the front of the queue that fits in it could be loaded into the empty partition and run.

– Since it’s undesirable to waste a large partition on a small job, a different strategy is to search the whole input queue whenever a partition becomes free and pick the largest job that fits.

Page 9: Ch4 memory management

Multiprogramming with Fixed Partitions (1)

Figure 4-2. (a) Fixed

memory partitions with

separate input queues

for each partition.

Page 10: Ch4 memory management

Multiprogramming with Fixed Partitions (2)

Figure 4-2. (b) Fixed

memory partitions with

a single input queue.

Page 11: Ch4 memory management

Multiprogramming with Fixed Partitions

• Note that the latter algorithm discriminates against small jobs as being unworthy of having a whole partition, whereas usually it is desirable to give the smallest jobs (often interactive jobs) the best service, not the worst.

– One way out is to have at least one small partition around.

• Such a partition will allow small jobs to run without having to allocate a large partition for them.

– Another approach is to have a rule stating that a job that is eligible to run may not be skipped over more than k times.

• Each time it's skipped over, it gets one point. When it has acquired k points, it may not be skipped again.

• This system, with fixed partitions set up by the operator in the morning and not changed thereafter, was used by OS/360 on large IBM mainframes for many years – it was called MFT (Multiprogramming with a Fixed number of Tasks or OS/MFT).

Page 12: Ch4 memory management

Relocation and Protection

• Multiprogramming introduces two essential problems that

must be solved:

– Relocation and protection.

• From the previous two figures it is clear that different jobs

will be run at different addresses.

– When a program is linked (i.e., the main program, user-written procedures,

and library procedures are combined into a single address space), the

linker must know at what address the program will begin in memory.

– For example, suppose that the first instruction is a call to a procedure at

absolute address 100 within the binary file produced by the linker.

– If this program is loaded in partition 1 (at address 100K), that instruction

will jump to absolute address 100, which is inside the OS.

– What is needed is a call to 100K + 100.

– If the program is loaded into partition 2, it must be carried out as a call to

200K + 100, and so on. This is the relocation problem.

Page 13: Ch4 memory management

Relocation and Protection

• A solution for this is to equip the machine with two special hardware registers, called the base and limit registers.

– When a process is scheduled, the base register is loaded with the address of the start of its partition, and the limit register is loaded with the length of the partition.

– Every memory address generated automatically has the base register contents added to it before being sent to memory.

– Thus if the base register contains the value 100K, a CALL 100 instruction is effectively turned into a CALL 100K + 100 instruction, without the instruction itself being modified.

– Addresses are also checked against the limit register to make sure that they do not attempt to address memory outside the current partition.

– The hardware protects the base and limit registers to prevent user programs from modifying them.

– A disadvantage of this scheme is the need to perform an addition and a comparison on every memory reference.

Page 14: Ch4 memory management

Relocation and Protection

– Comparisons can be done fast, but additions are slow due to carry propagation time unless special addition circuits are used.

• The CDC 6600 – the world’s first supercomputer – used

this scheme.

• The Intel 8088 CPU used for the original IBM PC used a

slightly weaker version of this scheme – base registers, but

no limit registers.

• Few computers use it now.
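To make the mechanism concrete, here is a minimal sketch of the comparison and addition the hardware performs on every memory reference; the struct, field names, and partition sizes are illustrative assumptions, not any particular machine's registers.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative base/limit pair loaded by the OS when a process is scheduled. */
typedef struct {
    uint32_t base;   /* start address of the partition */
    uint32_t limit;  /* length of the partition */
} relocation_regs;

/* Every CPU-generated address is checked against the limit and then has
 * the base added to it before going out on the memory bus. */
static uint32_t translate(const relocation_regs *r, uint32_t vaddr)
{
    if (vaddr >= r->limit) {             /* outside the partition: fault */
        fprintf(stderr, "protection violation at %u\n", (unsigned)vaddr);
        exit(EXIT_FAILURE);
    }
    return r->base + vaddr;              /* relocation */
}

int main(void)
{
    /* Partition 1 starts at 100K; a CALL 100 becomes a call to 100K + 100. */
    relocation_regs r = { .base = 100u * 1024u, .limit = 100u * 1024u };
    printf("CALL 100 -> physical address %u\n", (unsigned)translate(&r, 100));
    return 0;
}
```

The check plus the addition on every reference is exactly the per-reference cost mentioned above.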

Page 15: Ch4 memory management

Swapping

• With a batch system, organising memory into fixed

partitions is simple and effective.

• Each job is loaded into a partition when it gets to the head

of the queue.

• It stays in memory until it has finished.

• As long as enough jobs can be kept in memory to keep the

CPU busy all the time, there is no reason to use anything

more complicated.

Page 16: Ch4 memory management

Swapping

• With timesharing systems or graphics-orientated personal

computers, the situation is different.

• Sometimes there is not enough main memory to hold all

the currently active processes, so excess processes must

be kept on disk and brought in to run dynamically.

• Two general approaches to memory management can be

used, depending on the available hardware:

– Swapping (the simplest strategy that consists of bringing in each process in

its entirety, running it for a while, then putting it back on the disk) and

– Virtual memory (which allows programs to run even when they are only

partially in main memory).

Page 17: Ch4 memory management

Swapping

• The operation of a swapping system is shown below:

Figure 4-3. Memory allocation changes as processes come into

memory and leave it. The shaded regions are unused memory.

Page 18: Ch4 memory management

Swapping

• Initially, only process A is in memory.

• Then processes B and C are created or swapped in from disk.

• In (d) A is swapped out to disk.

• Then D comes in and B goes out.

• Finally A comes in again.

• Since A is now at a different location, addresses contained in it must be

relocated, either by software when it is swapped in or (more likely) by

hardware during program execution.

Page 19: Ch4 memory management

Swapping

• The main difference between the fixed partitions of the second figure

(Fig. 4-2) and the variable partitions shown here is that the number,

location, and size of the partitions vary dynamically in the latter as

processes come and go, whereas they are fixed in the former.

• The flexibility of not being tied to a fixed number of partitions that may

be too large or too small improves memory utilization, but it also

complicates allocating and deallocating memory, as well as keeping

track of it.

Page 20: Ch4 memory management

Swapping

• When swapping creates multiple holes in memory, it is

possible to combine them all into one big one by moving all

the processes downward as far as possible.

• This technique is known as memory compaction.

– It is usually not done because it requires a lot of CPU time.

• Also, when swapping processes to disk, only the memory

actually in use should be swapped.

• It is wasteful to swap the extra memory as well.

• In Fig 4.4 (a) we see a memory configuration in which

space for growth has been allocated to two processes.

Page 21: Ch4 memory management

Swapping

• If processes can have two growing segments, for example,

– the data segment being used as a heap for variables that are dynamically

allocated and released

– and a stack segment for the normal local variables and return addresses,

an alternative arrangement suggests itself, namely that of

(b).

Figure 4-4. (a) Allocating space for a

growing data segment. Figure 4-4. (b) Allocating space for a growing

stack and a growing data segment.

Page 22: Ch4 memory management

Swapping

• In (b) we see that each process illustrated has a stack at

the top of its allocated memory that is growing downward.

– And a data segment just beyond the program text that is growing upward.

• The memory between them can be used for either

segment.

• If it runs out, either the process will have to be moved to a

hole with sufficient space, swapped out of memory until a

large enough hole can be created, or killed.

Page 23: Ch4 memory management

Swapping

• Memory management with Bitmaps

– When memory is assigned dynamically, the OS must manage it.

– In general terms, there are two ways to keep track of memory usage:

bitmaps and free lists.

– In this section and the next one we will look at these two methods in turn.

– With a bitmap, memory is divided up into allocation units, perhaps as

small as a few words and perhaps as large as several kilobytes.

– Corresponding to each allocation unit is a bit in the bitmap, which is 0 if

the unit is free and 1 if it is occupied (or vice versa).

– The next figure shows part of memory and the corresponding bitmap.

Page 24: Ch4 memory management

Memory Management with Bitmaps

Figure 4-5. (a) A part of memory with five processes and three

holes. The tick marks show the memory allocation units. The

shaded regions (0 in the bitmap) are free. (b) The corresponding bitmap. (c) The same information as a list.

– The size of the allocation unit is an important design issue.

– The smaller the allocation unit, the larger the bitmap.

– However, even with an allocation unit as small as 4 bytes, 32 bits of

memory will require only 1 bit of the map.

– A memory of 32n bits will use n map bits, so the bitmap will take up only

1/33 of memory.

– If the allocation unit is chosen large, the bitmap will be smaller.

Page 25: Ch4 memory management

Memory Management with Bitmaps

Figure 4-5. (a) A part of memory with five processes and three

holes. The tick marks show the memory allocation units. The

shaded regions (0 in the bitmap) are free. (b) The corresponding bitmap. (c) The same information as a list.

– But, appreciable memory may be wasted in the last unit of the process if

the process size is not an exact multiple of the allocation unit.

– A bitmap provides a simple way to keep track of memory words in a fixed

amount of memory because the size of the bitmap depends only on the size

of memory and the size of the allocation unit.

– The main problem with it is that when it has been decided to bring a k-unit process into memory, the memory manager must search the bitmap to find a run of k consecutive 0 bits in the map.

– And searching a bitmap for a run of a given length is a slow operation.
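As a rough sketch of that search (the packed-bitmap layout and the helper below are assumptions made for illustration), finding k consecutive free units is a linear scan:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: test bit n of a packed bitmap (0 = free, 1 = in use). */
static int bit_is_set(const uint8_t *map, size_t n)
{
    return (map[n / 8] >> (n % 8)) & 1;
}

/* Find the first run of k consecutive free allocation units in a bitmap
 * covering 'units' units. Returns the index of the first unit of the run,
 * or -1 if no such run exists. This scan is what makes bitmap allocation
 * slow. */
long find_free_run(const uint8_t *map, size_t units, size_t k)
{
    size_t run = 0;
    for (size_t i = 0; i < units; i++) {
        if (bit_is_set(map, i)) {
            run = 0;                       /* unit in use: restart the count */
        } else if (++run == k) {
            return (long)(i - k + 1);
        }
    }
    return -1;
}
```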

Page 26: Ch4 memory management

Memory Management with Linked Lists

Figure 4-5. (a) A part of memory with five processes and three

holes. The tick marks show the memory allocation units. The

shaded regions (0 in the bitmap) are free. (b) The corresponding bitmap. (c) The same information as a list.

– Another way of keeping track of memory is to maintain a linked list of

allocated and free memory segments, where a segment is either a process

or a hole between two processes.

– The memory of (a) is represented in (c) as a linked list of segments.

– Each entry in the list specifies a hole (H) or process (P), the address at

which it starts, the length, and a pointer to the next entry.

– In this example, the segment list is kept sorted by address.

– Sorting this way has the advantage that when a process terminates or is

swapped out, updating the list is straightforward.

Page 27: Ch4 memory management

Memory Management with Linked Lists

– A terminating process normally has two neighbours (except when it is at

the very top or very bottom of memory).

– These may be either processes or holes, leading to the four combinations

shown below.

– In (a) updating the list requires replacing a P by an H.

– In (b) and also in (c), two entries are coalesced into one, and the list

becomes one entry shorter.

– In (d), three entries are merged and two items are removed from the list.

Figure 4-6. Four neighbor combinations for the terminating process, X.

Page 28: Ch4 memory management

Memory Management with Linked Lists

– Since the process table slot for the terminating process will normally point

to the list entry for the process itself, it may be more convenient to have

the list as a doubly-linked list, rather than the singly-linked list of Fig. 4-5(c).

– This structure makes it easier to find the previous entry and to see if a

merge is possible.

Figure 4-6. Four neighbor combinations for the terminating process, X.
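A minimal sketch of the coalescing step for these four cases; the struct and helper names are invented for illustration and assume a doubly-linked, address-sorted segment list.

```c
#include <stdlib.h>

/* Illustrative segment-list entry: either a process (P) or a hole (H),
 * kept sorted by address. */
struct segment {
    int is_hole;                     /* 1 = hole, 0 = process */
    unsigned long start, length;
    struct segment *prev, *next;
};

/* Absorb 'victim' into 'keeper' and unlink it from the list. */
static void absorb(struct segment *keeper, struct segment *victim)
{
    keeper->length += victim->length;
    keeper->next = victim->next;
    if (victim->next) victim->next->prev = keeper;
    free(victim);
}

/* When process X terminates, turn its entry into a hole and merge it
 * with any neighbouring holes (the four combinations of Fig. 4-6). */
void terminate(struct segment *x)
{
    x->is_hole = 1;
    if (x->next && x->next->is_hole)          /* hole after X */
        absorb(x, x->next);
    if (x->prev && x->prev->is_hole)          /* hole before X */
        absorb(x->prev, x);
}
```

When both neighbours are holes, both branches fire and three entries collapse into one, as in case (d).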

Page 29: Ch4 memory management

Memory Management with Linked Lists

• When the processes and holes are kept on a list sorted by

address, several algorithms can be used to allocate

memory for a newly created process (or an existing

process being swapped in from disk).

• We assume that the memory manager knows how much

memory to allocate.

• The simplest algorithm is first fit.

– The memory manager scans along the list of segments until it finds a hole

that is big enough.

– The hole is then broken up into two pieces, one for the process and one for

the unused memory, except in the statistically unlikely case of an exact fit.

– First fit is a fast algorithm because it searches as little as possible.

Page 30: Ch4 memory management

Memory Management with Linked Lists

• Another well-known algorithm is next fit.

– It works the same way as first fit, except that it keeps track of where it is whenever it finds a suitable hole.

– The next time it is called to find a hole, it starts searching the list from the place where it left off last time, instead of always beginning, as first fit does.

– Simulations by Bays (1977) show that next fit gives slightly worse performance than first fit.

• Then, best fit.

– Best fit searches the entire list and takes the smallest hole that is adequate.

– Rather than breaking up a big hole that might be needed later, best fit tries to find a hole that is close to the actual size needed.

– Best fit is slower than first fit because it must search the entire list every time it is called.

– Somewhat surprisingly, it also results in more wasted memory than first fit or next fit because it tends to fill up memory with tiny, useless holes (first fit creates larger holes on average).

Page 31: Ch4 memory management

Memory Management with Linked Lists

• Then there’s worst fit.

– To get around the problem of breaking up nearly exact matches into a process and

a tiny hole, one could think about worst fit, that is, always take the largest available

hole, so that the hole broken off will be big enough to be useful.

– Simulation has shown that worst fit is not a very good idea either.

• Then, quick fit.

– Quick fit maintains separate lists for some of the more common sizes requested.

– For example, it might have a table with n entries, in which the first entry is a

pointer to the head of a list of 4-KB holes, the second entry is a pointer to a list of

8-KB holes, the third entry a pointer to 12-KB holes, and so on.

– Holes of, say, 21 KB could either be put on the 20-KB list or on a special list of

odd-sized holes.

– With quick fit, finding a hole of the required size is extremely fast, but it has the

same disadvantage as all the other schemes that sort by hole size, namely: when a

process terminates or is swapped out, finding its neighbours to see if a merge is

possible, is expensive.

– If merging is not done, memory will quickly fragment into a large number of small

holes into which no processes fit.

Page 32: Ch4 memory management

Memory Allocation Algorithms

• First fit

Use first hole big enough

• Next fit

Use next hole big enough

• Best fit

Search list for smallest hole big enough

• Worst fit

Search list for largest hole available

• Quick fit

Separate lists of commonly requested sizes
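As a concrete (and deliberately simplified) sketch of the first of these, first fit over an address-sorted segment list might look like the following; the struct and the hole-splitting details are assumptions, not a real allocator.

```c
#include <stddef.h>

/* Illustrative segment-list entry, kept sorted by address
 * (same idea as the hole/process list of Fig. 4-5(c)). */
struct segment {
    int is_hole;                    /* 1 = hole, 0 = process */
    unsigned long start, length;
    struct segment *next;
};

/* First fit: scan for the first hole of at least 'size' units and carve
 * the request out of its low end; the remainder (if any) stays on the
 * list as a smaller hole. Returns the allocated start address, or 0 on
 * failure (assuming address 0 is never handed out). */
unsigned long first_fit(struct segment *list, unsigned long size)
{
    for (struct segment *s = list; s != NULL; s = s->next) {
        if (s->is_hole && s->length >= size) {
            unsigned long addr = s->start;
            s->start  += size;      /* shrink the hole from the bottom */
            s->length -= size;
            return addr;
        }
    }
    return 0;                       /* no hole big enough */
}
```

Next fit would be the same loop started from a remembered position; best fit would scan the whole list and keep the smallest adequate hole before carving.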

Page 33: Ch4 memory management

Virtual Memory

• Many years ago people were first confronted with

programs that were too big to fit in the available memory.

• The solution usually adopted was to split the program into

pieces, called overlays.

• Overlay 0 would start running first.

• When it was done, it would call another overlay.

• Some overlay systems were highly complex, allowing

multiple overlays in memory at once.

• The overlays were kept on the disk and swapped in and

out of memory by the OS, dynamically, as needed.

• Although the actual work of swapping overlays was done

by the system, the decision of how to split the program into

pieces had to be done by the programmer.

Page 34: Ch4 memory management

Virtual Memory

• Splitting up large programs into small, modular pieces was

time consuming and boring.

• It did not take long before someone thought of a way to

turn the whole job over to the computer:

– This method: virtual memory.

• The basic idea behind virtual memory:

– The combined size of the program, data, and stack may exceed the amount

of physical memory available for it.

– The OS keeps those parts of the program currently in use in main memory,

and the rest on the disk.

– For example, a 512MB program can run on a 256MB machine by

carefully choosing which 256MB to keep in memory at each instant, with

pieces of the program being swapped between disk and memory as

needed.

Page 35: Ch4 memory management

Paging

• Most virtual memory systems use a technique called

paging, which we will now describe.

• On any computer, there exists a set of memory addresses

that programs can produce.

• When a program uses an instruction like:

– MOV REG, 1000

• This copies the contents of memory address 1000 to REG.

• Addresses can be generated using indexing, base

registers, segment registers, etc.

Page 36: Ch4 memory management

Paging

Figure 4-7. The position and function of the MMU. Here the MMU

is shown as being a part of the CPU chip because it commonly is

nowadays. However, logically it could be a separate chip and

was in years gone by.

– These program-generated addresses are called virtual addresses and form

the virtual address space.

– On computers without virtual memory, the virtual address is put directly

onto the memory bus and causes the physical memory word with the same

address to be read or written.

– When virtual memory is used, the virtual addresses do not directly go to

the memory bus.

– Instead, they go to an MMU (Memory Management Unit) that maps the

virtual addresses onto the physical memory addresses:

Page 37: Ch4 memory management

Paging (2)

Figure 4-8. The relation between

virtual addresses and physical

memory addresses is given by

the page table.

• An example of how this mapping works is shown on the RHS.

• Here we have a computer that can generate 16-bit addresses, from 0 up to 64K.

• These are the virtual addresses.

• The computer, however, only has 32-KB of physical

memory, so although 64-KB programs can be

written, they cannot be loaded into memory in their

entirety and run.

• A complete copy of a program’s memory image, up

to 64-KB, must be present on the disk, however, so

that pieces can be brought in as needed.

Page 38: Ch4 memory management

Paging (2)

Figure 4-8. The relation between

virtual addresses and physical

memory addresses is given by

the page table.

• The virtual address space is divided up into units

called pages.

• The corresponding units in the physical memory are

called "page frames".

• The pages and page frames are always the same

size.

• In this example they are 4-KB, but page sizes from

512 bytes to 1 MB have been used in real systems.

• With 64KB of virtual address space and 32KB of

physical memory, we get 16 virtual pages and 8

page frames.

• Transfers between RAM and disk are always in units

of a page.

Page 39: Ch4 memory management

Paging (2)

Figure 4-8. The relation between

virtual addresses and physical

memory addresses is given by

the page table.

• When the program tries to access address 0, for

example, using the instruction

MOV REG, 0

virtual address 0 is sent to the MMU.

• The MMU sees that this virtual address falls in page

0 (0 – 4095), which according to its mapping is page

frame 2 (8192 to 12287)

• It thus transforms the address to 8192 and outputs

address 8192 onto the bus.

• The memory knows nothing at all about the MMU

and just sees a request for reading or writing

address 8192, which it honours.

• Thus, the MMU has effectively mapped all virtual

addresses between 0 and 4095 onto physical

addresses 8192 to 12287.

Page 40: Ch4 memory management

Paging (2)

Figure 4-9. The internal

operation of the MMU

with 16 4-KB pages.

• By itself, the ability to map the 16 virtual pages onto

any of the 8 page frames by setting the MMU’s map

appropriately does not solve the problem that the

virtual address space is larger than the physical

memory.

• Since we have only 8 physical page frames, only 8

of the virtual pages in the figure are mapped onto

physical memory.

• In the RHS figure, we see an example of a virtual

address 8196 (0010000000000100) being mapped

using the MMU map of the previous figure.

– The incoming 16-bit virtual address is split into a 4-bit page

number and a 12-bit offset.

– With 4 bits for the page number, we can have 16 pages

– And with 12 bits for the offset, we can address all 4096

bytes within a page.

• The page number is used as an index into the page

table, yielding the number of the page frame

corresponding to that virtual page.
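A software model of what the MMU does for this 16-bit example (4-bit page number, 12-bit offset); the page-table contents below are made up, apart from page 0 mapping to frame 2 as in the text.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                          /* 4-KB pages: 12-bit offset */
#define NPAGES     16                          /* 4-bit page number */

/* Illustrative page table: entry i holds the frame number for virtual
 * page i, or -1 if the page is not present (would cause a page fault). */
static int page_table[NPAGES] = {
    2, 1, 6, 0, 4, 3, -1, -1, 5, -1, 7, -1, -1, -1, -1, -1
};

int translate(uint16_t vaddr, uint32_t *paddr)
{
    unsigned page   = vaddr >> PAGE_SHIFT;               /* top 4 bits   */
    unsigned offset = vaddr & ((1u << PAGE_SHIFT) - 1);  /* low 12 bits  */
    int frame = page_table[page];
    if (frame < 0)
        return -1;                                       /* page fault   */
    *paddr = ((uint32_t)frame << PAGE_SHIFT) | offset;
    return 0;
}

int main(void)
{
    uint32_t p;
    if (translate(0, &p) == 0)           /* MOV REG,0: page 0 -> frame 2 */
        printf("virtual 0 -> physical %u\n", (unsigned)p);  /* prints 8192 */
    return 0;
}
```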

Page 41: Ch4 memory management

Page Tables

• Purpose : map virtual pages onto page

frames

• Major issues to be faced

1. The page table can be extremely large

2. The mapping must be fast.

Page 42: Ch4 memory management

Multilevel Page Tables

• To get around the problem of having to store huge page

tables in memory all the time, many computers use a

multilevel page table.

• A simple example is shown:

Figure 4-10. (a) A 32-bit

address with two page table

fields. (b) Two-level page

tables.

Page 43: Ch4 memory management

Multilevel Page Tables

Figure 4-10. (a) A 32-bit

address with two page table

fields. (b) Two-level page

tables.

• In (a) we have a 32-bit virtual address that is

partitioned into a 10-bit PT1 field, a 10-bit PT2 field,

and a 12-bit Offset field.

• Since offsets are 12 bits, pages are 4KB, and there

are a total of 2^20 of them.

• The secret to the multilevel page table method is to

avoid keeping all the page tables in memory all the

time.

• In particular, those that are not needed should not

be kept around.

• In (b) we see how the two-level page table works.

– On the left we have the top-level page table, with 1024

entries, corresponding to the 10-bit PT1 field.

– When a virtual address is presented to the MMU, it first

extracts the PT1 field and uses this value as an index into the

top-level page table.

– Each of these 1024 entries represents 4 MB because the entire 4-gigabyte virtual address space has been chopped into 1024 chunks of 4 MB each.

• The entry located by indexing into the top-level page table yields the address or the page frame number of a second-level page table.
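A sketch of the two-level lookup for the 10/10/12 split; the entry format and the way absence is signalled are simplifying assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative two-level page table for a 32-bit virtual address split
 * into a 10-bit PT1 field, a 10-bit PT2 field, and a 12-bit offset. A
 * NULL second-level pointer or a frame of -1 stands in for "not present"
 * and would trigger a page fault on real hardware. */
#define PT_ENTRIES 1024

struct second_level { int32_t frame[PT_ENTRIES]; };
struct top_level    { struct second_level *pt2[PT_ENTRIES]; };

int walk(const struct top_level *top, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t pt1    = (vaddr >> 22) & 0x3FF;   /* top 10 bits    */
    uint32_t pt2    = (vaddr >> 12) & 0x3FF;   /* middle 10 bits */
    uint32_t offset =  vaddr        & 0xFFF;   /* low 12 bits    */

    const struct second_level *sl = top->pt2[pt1];
    if (sl == NULL)
        return -1;                  /* second-level table not present */

    int32_t frame = sl->frame[pt2];
    if (frame < 0)
        return -1;                  /* page not present: page fault */

    *paddr = ((uint32_t)frame << 12) | offset;
    return 0;
}
```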

Page 44: Ch4 memory management

Structure of a Page Table Entry

• The exact layout of a page table entry is highly machine dependent, but the kind of information present is roughly the same from machine to machine.

• The figure below shows a sample page table entry.

• The size varies from computer to computer, but 32 bits is a common size.

• The most important field is the page frame number.

– The goal of the page mapping is to locate this value.

• Next to it we have the present/absent bit.

– If this bit is 1, the entry is valid and can be used.

– If it is 0, the virtual page to which the entry belongs is not currently in memory.

– Accessing a page table entry with this bit set to 0 causes a page fault.

• The protection bit tells what kinds of access are permitted.

Figure 4-11. A typical page table entry.

Page 45: Ch4 memory management

Structure of a Page Table Entry

• In the simplest form, the protection bit is 0 for read/write and 1 for read only.

• A more sophisticated arrangement is having 3 independent bits, one

bit each for individually enabling reading, writing and executing the

page.

• The modified and referenced bits keep track of page usage.

– When a page is written to, the hardware automatically sets the modified bit.

– This bit is used when the OS decides to reclaim a page frame.

– If the page in it has been modified (i.e. is “dirty”), it must be written back to the

disk

– If it has not been modified (i.e. is “clean”), it can just be abandoned, since the disk

copy is still valid.

– The bit is sometimes called the “dirty bit”, since it reflects the page’s state.
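Since the exact layout is machine dependent, the bit assignments below are purely an assumed example of how such an entry could be packed into 32 bits and tested.

```c
#include <stdint.h>

/* Assumed 32-bit page table entry layout (illustrative only; real MMUs
 * place these bits differently):
 *   bits  0-19  page frame number
 *   bit   20    present/absent
 *   bit   21    protection (0 = read/write, 1 = read only)
 *   bit   22    modified ("dirty")
 *   bit   23    referenced
 *   bit   24    caching disabled
 */
typedef uint32_t pte_t;

#define PTE_FRAME_MASK   0x000FFFFFu
#define PTE_PRESENT      (1u << 20)
#define PTE_READONLY     (1u << 21)
#define PTE_MODIFIED     (1u << 22)
#define PTE_REFERENCED   (1u << 23)
#define PTE_NOCACHE      (1u << 24)

static inline uint32_t pte_frame(pte_t e)   { return e & PTE_FRAME_MASK; }
static inline int pte_present(pte_t e)      { return (e & PTE_PRESENT) != 0; }
static inline int pte_dirty(pte_t e)        { return (e & PTE_MODIFIED) != 0; }

/* What the hardware does on a write to the page: set modified + referenced. */
static inline pte_t pte_mark_write(pte_t e) { return e | PTE_MODIFIED | PTE_REFERENCED; }
```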

Page 46: Ch4 memory management

TLBs—Translation Lookaside Buffers

• In most paging schemes, the page tables are kept in memory, due to

their large size.

• Potentially, this design has an enormous impact on performance.

• The solution is to equip computers with a small hardware device for

rapidly mapping virtual addresses to physical addresses without going

through the page table.

– This device, called a TLB (Translation Lookaside Buffer) or associative memory, is shown below:

Figure 4-12. A TLB to speed up paging.

Page 47: Ch4 memory management

TLBs—Translation Lookaside Buffers

• It’s usually inside the MMU and consists of a small number of entries,

eight in this case, but rarely more than 64.

• Each entry contains information about one page, including the virtual

page number, a bit that is set when the page is modified, the

protection code (read/write/execute permissions), and the physical

page frame in which the page is located.

• These fields have a one-to-one correspondence with the fields in the

page table.

Figure 4-12. A TLB to speed up paging.
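In software terms the lookup amounts to the search below; a sequential loop stands in for the parallel compare that real TLB hardware performs, and the field names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8   /* small, as in Fig. 4-12 */

/* Illustrative TLB entry: fields correspond one-to-one with those in the
 * page table entry for the page. */
struct tlb_entry {
    int      valid;
    uint32_t vpage;          /* virtual page number */
    uint32_t frame;          /* physical page frame */
    unsigned modified : 1;
    unsigned prot     : 3;   /* read/write/execute permissions */
};

/* Search every entry for the virtual page; on a hit return the frame,
 * on a miss return -1 and fall back to the page table walk. */
int tlb_lookup(const struct tlb_entry tlb[TLB_ENTRIES], uint32_t vpage,
               uint32_t *frame)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpage == vpage) {
            *frame = tlb[i].frame;
            return 0;            /* TLB hit */
        }
    }
    return -1;                   /* TLB miss: consult the page table */
}
```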

Page 48: Ch4 memory management

Inverted Page Tables

• Traditional page tables, like the one described, require one entry per virtual page, since they are indexed by virtual page number.

• If the address space consists of 2^32 bytes, with 4096 bytes per page,

then over 1 million page table entries are needed.

• As a bare minimum, the page table will have a size of 4 MB (doable).

• On 64-bit computers, this situation changes drastically.

– If the address space is 2^64 bytes, with 4KB pages, we need a page table with

2^52 entries

Figure 4-13. Comparison of a traditional page table with an inverted page table.

Page 49: Ch4 memory management

Inverted Page Tables

• If each page entry is 8 bytes, the table is over 30 million

gigabytes.

• Consequently, a different solution is needed for 64-bit

paged virtual address spaces

– One such solution is the inverted page table.

Figure 4-13. Comparison of a traditional page table with an inverted page table.

Page 50: Ch4 memory management

Inverted Page Tables

• The inverted page table (IPT) is best thought of as an off-

chip extension of the TLB which uses normal system

RAM. Unlike a true page table, it is not necessarily able to

hold all current mappings. The OS must be prepared to

handle misses, just as it would with a MIPS-style software-

filled TLB.

• The IPT combines a page table and a frame table into one

data structure. At its core is a fixed-size table with the

number of rows equal to the number of frames in memory.

If there are 4000 frames, the inverted page table has 4000

rows. For each row there is an entry for the virtual page

number (VPN), the physical page number (not the physical

address), some other data and a means for creating a

collision chain, as we will see later.

Page 51: Ch4 memory management

Inverted Page Tables

• To search through all entries of the core IPT structure is

inefficient, so we use a hash table mapping virtual

addresses (and address space/PID information if need be)

to an index in the IPT - this is where the collision chain is

used.

• This hash table is known as a hash anchor table.

– The hashing function is not generally optimized for coverage - raw speed

is more desirable.

– Of course, hash tables experience collisions.

– Due to this chosen hashing function, we may experience a lot of collisions

in usage, so for each entry in the table the VPN is provided to check if it

is the searched entry or a collision.

Page 52: Ch4 memory management

Inverted Page Tables

• In searching for a mapping, the hash anchor table is used.

If no entry exists, a page fault occurs.

– Otherwise, the entry is found.

• Depending on the architecture, the entry may be placed in

the TLB again and the memory reference is restarted, or

the collision chain may be followed until it has been

exhausted and a page fault occurs.

• A virtual address in this scheme could be split into two,

– the first half being a virtual page number and the second half being the

offset in that page.
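A simplified model of this lookup path; the hash function, table sizes, and struct layout are all assumptions made for illustration.

```c
#include <stdint.h>

#define NFRAMES   4096          /* one inverted-table row per page frame */
#define HASH_SIZE 4096          /* hash anchor table size */

/* One row per frame: which (pid, virtual page) currently occupies it,
 * plus a collision-chain link to the next row with the same hash.
 * Both tables must be initialised with -1 links before use. */
struct ipt_entry {
    int      valid;
    uint32_t pid;
    uint32_t vpage;
    int32_t  next;              /* next row in the collision chain, or -1 */
};

static struct ipt_entry ipt[NFRAMES];
static int32_t hash_anchor[HASH_SIZE];  /* first row for each hash value, or -1 */

static uint32_t hash_va(uint32_t pid, uint32_t vpage)
{
    return (pid * 31u + vpage) % HASH_SIZE;   /* crude: raw speed over coverage */
}

/* Return the frame holding (pid, vpage), or -1, which means a page fault. */
int32_t ipt_lookup(uint32_t pid, uint32_t vpage)
{
    for (int32_t i = hash_anchor[hash_va(pid, vpage)]; i >= 0; i = ipt[i].next) {
        if (ipt[i].valid && ipt[i].pid == pid && ipt[i].vpage == vpage)
            return i;           /* the row index is the frame number */
    }
    return -1;                  /* chain exhausted: fault */
}
```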

Page 53: Ch4 memory management

Inverted Page Tables

• A major problem with this design is poor cache locality

caused by the hash function.

• Tree-based designs avoid this by placing the page table

entries for adjacent pages in adjacent locations, but an

inverted page table destroys spatial locality of reference

by scattering entries all over.

• An operating system may minimise the size of the hash

table to reduce this problem, with the tradeoff being an

increased miss rate.

Page 54: Ch4 memory management

Page Replacement Algorithms

• Optimal replacement

• Not recently used (NRU) replacement

• First-in, first-out (FIFO) replacement

• Second chance replacement

• Clock page replacement

• Least recently used (LRU) replacement

Page 55: Ch4 memory management

Page Replacement Algorithms

• Page replacement algorithms decide which memory pages

to page out (swap out, write to disk) when a page of

memory needs to be allocated.

• Paging happens when a page fault occurs and a free page

cannot be used to satisfy the allocation, either because

there are none, or because the number of free pages is

lower than some threshold.

• When the page that was selected for replacement and

paged out is referenced again it has to be paged in (read

in from disk), and this involves waiting for I/O completion.

– This determines the quality of the page replacement algorithm: the less

time waiting for page-ins, the better the algorithm.

Page 56: Ch4 memory management

Page Replacement Algorithms

• A page replacement algorithm looks at:

– the limited information about accesses to the pages provided by hardware,

– and tries to guess which pages should be replaced to minimize the total

number of page misses,

– while balancing this with the costs (primary storage and processor time)

of the algorithm itself.

Page 57: Ch4 memory management

The theoretically optimal page replacement algorithm

• The theoretically optimal page replacement algorithm (also

known as OPT, clairvoyant replacement algorithm, or

Bélády's optimal page replacement policy) is an algorithm

that works as follows:

– when a page needs to be swapped in, the operating system swaps out the

page whose next use will occur farthest in the future.

– For example, a page that is not going to be used for the next 6 seconds will

be swapped out over a page that is going to be used within the next 0.4

seconds.

• This algorithm cannot be implemented in a general-purpose operating system because it is impossible to

compute reliably how long it will be before a page is going

to be used, except when all software that will run on a

system is either known beforehand and is amenable to the

static analysis of its memory reference patterns, or only a

class of applications allowing run-time analysis.

Page 58: Ch4 memory management

Not recently used

• At a certain fixed time interval, the clock interrupt triggers

and clears the referenced bit of all the pages, so only pages

referenced within the current clock interval are marked with

a referenced bit. When a page needs to be replaced, the

operating system divides the pages into four classes:

0. not referenced, not modified

1. not referenced, modified

2. referenced, not modified

3. referenced, modified

• Although it does not seem possible for a page to be not

referenced yet modified, this happens when a class 3 page

has its referenced bit cleared by the clock interrupt.

– The NRU algorithm picks a random page from the lowest-numbered nonempty class for removal. Note that this algorithm implies that a modified (within the clock interval) but not referenced page is less important than a not-modified page that is intensely referenced.
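A sketch of the classification and victim selection; the page descriptor and the way the R and M bits reach software are simplifications, and the random choice within a class is reduced here to "first found".

```c
/* Illustrative page descriptor with the hardware-maintained R and M bits. */
struct page {
    int referenced;   /* R: set on any access, cleared by the clock tick */
    int modified;     /* M: set on any write */
};

/* NRU class: 0 = !R,!M   1 = !R,M   2 = R,!M   3 = R,M */
static int nru_class(const struct page *p)
{
    return (p->referenced ? 2 : 0) | (p->modified ? 1 : 0);
}

/* Pick a victim: any page from the lowest-numbered nonempty class. */
int nru_select(const struct page pages[], int npages)
{
    int best = -1, best_class = 4;
    for (int i = 0; i < npages; i++) {
        int c = nru_class(&pages[i]);
        if (c < best_class) {
            best_class = c;
            best = i;          /* "random" simplified to first found */
        }
    }
    return best;               /* -1 only if npages == 0 */
}
```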

Page 59: Ch4 memory management

First-in, first-out

• The simplest page-replacement algorithm is a FIFO

algorithm.

• The first-in, first-out (FIFO) page replacement algorithm is a

low-overhead algorithm that requires little book-keeping on

the part of the operating system.

• The idea is obvious from the name - the operating system

keeps track of all the pages in memory in a queue, with the

most recent arrival at the back, and the earliest arrival in

front.

• When a page needs to be replaced, the page at the front of

the queue (the oldest page) is selected.

– While FIFO is cheap and intuitive, it performs poorly in practical

application. Thus, it is rarely used in its unmodified form. This algorithm

experiences Bélády's anomaly.

Page 60: Ch4 memory management

Second-chance

• A modified form of the FIFO page replacement algorithm,

known as the Second-chance page replacement algorithm,

fares relatively better than FIFO at little cost for the

improvement.

• It works by looking at the front of the queue as FIFO does,

but instead of immediately paging out that page, it checks

to see if its referenced bit is set.

– If it is not set, the page is swapped out.

– Otherwise, the referenced bit is cleared, the page is inserted at the back of

the queue (as if it were a new page) and this process is repeated.

– This can also be thought of as a circular queue.

– If all the pages have their referenced bit set, on the second encounter of the

first page in the list, that page will be swapped out, as it now has its

referenced bit cleared.

– If all the pages have their reference bit set then second chance algorithm

degenerates into pure FIFO.
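A minimal sketch of that loop, with an array standing in for the FIFO queue and the R bit assumed to be visible to software.

```c
/* Illustrative page record in a FIFO queue (index 0 = oldest arrival). */
struct page {
    int number;       /* which page this is */
    int referenced;   /* R bit, set by the hardware on access */
};

/* Second chance: inspect the oldest page; if R is set, clear it and move
 * the page to the back as if it had just arrived, then try again. The
 * loop terminates because each pass clears one R bit. The chosen victim
 * stays at the front; the caller replaces it. */
int second_chance(struct page queue[], int n)
{
    for (;;) {
        if (!queue[0].referenced) {
            return queue[0].number;          /* victim found */
        }
        struct page head = queue[0];
        head.referenced = 0;                 /* give it a second chance */
        for (int i = 1; i < n; i++)          /* shift the queue forward */
            queue[i - 1] = queue[i];
        queue[n - 1] = head;                 /* reinsert at the back */
    }
}
```

Viewing the same queue as a circular list, with a pointer instead of all this shifting, gives the clock algorithm described below.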

Page 61: Ch4 memory management

Second Chance Replacement

Figure 4-14. Operation of second chance. (a) Pages sorted

in FIFO order. (b) Page list if a page fault occurs at time 20

and A has its R bit set. The numbers above the pages are

their loading times.

Page 62: Ch4 memory management

Clock

• Clock is a more efficient version of FIFO than Second-

chance because pages don't have to be constantly pushed

to the back of the list, but it performs the same general

function as Second-Chance.

• The clock algorithm keeps a circular list of pages in

memory, with the "hand" (iterator) pointing to the last

examined page frame in the list.

• When a page fault occurs and no empty frames exist, then

the R (referenced) bit is inspected at the hand's location.

• If R is 0, the new page is put in place of the page the "hand"

points to, otherwise the R bit is cleared.

• Then, the clock hand is incremented and the process is

repeated until a page is replaced.
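A sketch of the sweep; a circular array stands in for the circular list, and loading the new page into the chosen frame is left out.

```c
/* Illustrative circular list of page frames with their R bits. */
struct frame {
    int page;          /* resident page number */
    int referenced;    /* R bit */
};

/* Advance the hand until a frame with R == 0 is found, clearing R bits
 * on the way. Returns the index of the frame to replace and leaves the
 * hand pointing just past it. */
int clock_replace(struct frame frames[], int nframes, int *hand)
{
    for (;;) {
        struct frame *f = &frames[*hand];
        if (f->referenced == 0) {
            int victim = *hand;
            *hand = (*hand + 1) % nframes;   /* advance past the victim */
            return victim;
        }
        f->referenced = 0;                   /* clear and keep sweeping */
        *hand = (*hand + 1) % nframes;
    }
}
```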

Page 63: Ch4 memory management

Clock Page Replacement

Figure 4-15. The clock page replacement algorithm.

Page 64: Ch4 memory management

Least recently used

• The least recently used page (LRU) replacement algorithm,

though similar in name to NRU, differs in the fact that LRU

keeps track of page usage over a short period of time, while

NRU just looks at the usage in the last clock interval.

• LRU works on the idea that pages that have been most

heavily used in the past few instructions are most likely to

be used heavily in the next few instructions too.

• While LRU can provide near-optimal performance in theory

(almost as good as Adaptive Replacement Cache), it is

rather expensive to implement in practice.

• There are a few implementation methods for this algorithm

that try to reduce the cost yet keep as much of the

performance as possible.

Page 65: Ch4 memory management

Least recently used

• The most expensive method is the linked list method, which

uses a linked list containing all the pages in memory.

• At the back of this list is the least recently used page, and

at the front is the most recently used page.

• The cost of this implementation lies in the fact that items in

the list will have to be moved about every memory

reference, which is a very time-consuming process.

• Another method that requires hardware support is as

follows: suppose the hardware has a 64-bit counter that is

incremented at every instruction.

– Whenever a page is accessed, it is tagged with the value of the counter at the time of the access.

– Whenever a page needs to be replaced, the operating system selects the

page with the lowest counter and swaps it out.

– With present hardware, this is not feasible because the OS needs to

examine the counter for every page in memory.
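A sketch of the counter scheme; the 64-bit counter and the per-page stamp field are assumptions about the hypothetical hardware support described above.

```c
#include <stdint.h>

/* Hypothetical hardware support: a 64-bit instruction counter and a
 * per-page field that receives the counter value on every access. */
struct page {
    uint64_t last_use;   /* counter value at the most recent access */
    int      present;
};

/* On access (done by the hardware in the scheme described above). */
void touch(struct page *p, uint64_t counter) { p->last_use = counter; }

/* On replacement: the OS scans all pages for the lowest counter value,
 * which is exactly the per-fault cost that makes this approach expensive. */
int lru_select(const struct page pages[], int npages)
{
    int victim = -1;
    uint64_t oldest = UINT64_MAX;
    for (int i = 0; i < npages; i++) {
        if (pages[i].present && pages[i].last_use < oldest) {
            oldest = pages[i].last_use;
            victim = i;
        }
    }
    return victim;
}
```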

Page 66: Ch4 memory management

Simulating LRU in Software (1)

Figure 4-16. LRU using a matrix when pages are referenced in the

order 0, 1, 2, 3, 2, 1, 0, 3, 2, 3.

Read through pg 401 - 403

Page 67: Ch4 memory management

Simulating LRU in Software (2)

Figure 4-17. The aging algorithm simulates LRU in software.

Shown are six pages for five clock ticks. The five clock ticks are

represented by (a) to (e).

Page 68: Ch4 memory management

Design Issues for Paging Systems

• Knowing the bare mechanics of paging is not

enough.

• To design a system, you have to know a lot more

to make it work well.

• In the following sections, we will look at other

issues that OS designers must consider in order to

get good performance from a paging system.

Page 69: Ch4 memory management

The Working Set Model

• The working set of a process is the set of pages expected

to be used by that process during some time interval.

• The "working set model" isn't a page replacement algorithm

in the strict sense (it's actually a kind of medium-term

scheduler)

• Working set is a concept in computer science which defines

what memory a process requires in a given time interval.

Page 70: Ch4 memory management

The Working Set Model

• The working set of information W(t, tau) of a process at time t is defined to be the collection of information referenced by the process during the process time interval (t - tau, t).

• Typically the units of information in question are considered

to be memory pages.

• This is suggested to be an approximation of the set of

pages that the process will access in the future (say during

the next tau time units), and more specifically is suggested

to be an indication of what pages ought to be kept in main

memory to allow most progress to be made in the execution

of that process.

Page 71: Ch4 memory management

The Working Set Model

• The effect of the choice of which pages are kept in main memory (as distinct from being paged out to auxiliary storage) is important:

– if too many pages of a process are kept in main memory, then fewer other

processes can be ready at any one time.

– If too few pages of a process are kept in main memory, then the page fault

frequency is greatly increased and the number of active (non-suspended)

processes currently executing in the system approaches zero.

• The working set model states that a process can be in RAM

if and only if all of the pages that it is currently using (often

approximated by the most recently used pages) can be in

RAM.

• The model is an all-or-nothing model, meaning that if the set of pages the process needs grows and there is no room in RAM, the process is swapped out of memory to free the memory for other processes to use.

Page 72: Ch4 memory management

The Working Set Model

• Often a heavily loaded computer has so many processes

queued up that, if all the processes were allowed to run for

one scheduling time slice, they would refer to more pages

than there is RAM, causing the computer to "thrash".

• By swapping some processes from memory, the result is

that processes -- even processes that were temporarily

removed from memory -- finish much sooner than they

would if the computer attempted to run them all at once.

• The processes also finish much sooner than they would if

the computer only ran one process at a time to completion,

– since it allows other processes to run and make progress during times that

one process is waiting on the hard drive or some other global resource.

• In other words, the working set strategy prevents thrashing

while keeping the degree of multiprogramming as high as

possible. Thus it optimizes CPU utilization and throughput.

Page 73: Ch4 memory management

The Working Set Model

• Thrashing?

– describes a computer whose virtual memory subsystem is in a constant state

of paging, rapidly exchanging data in memory for data on disk, to the

exclusion of most application-level processing.

– This causes the performance of the computer to degrade or collapse. The

situation may not resolve itself quickly, but can continue indefinitely until

the underlying cause is addressed.

Page 74: Ch4 memory management

The Working Set Model

Figure 4-18. The working set is the set of pages used by the k

most recent memory references. The function w(k, t) is the size of

the working set at time t.


Page 75: Ch4 memory management

Local versus Global Allocation Policies

• In the preceding sections we have discussed

several algorithms for choosing a page to replace

when a fault occurs.

• A major issue associated with this choice is how

memory should be allocated among the competing

runnable processes.

• Local algorithms:

– Allocate every process a fixed fraction of memory.

• Global algorithms:

– Dynamically allocate page frames among the runnable

processes

– Thus the number of page frames assigned to each process

varies in time.

Page 76: Ch4 memory management

Page Fault Frequency

Figure 4-20. Page fault rate as a function of the

number of page frames assigned.

Page 77: Ch4 memory management

Page Size

• The page size is often a parameter that can

be chosen by the OS.

• Determining the best page size requires

balancing several competing factors.

• As a result, there is no overall optimum.

Page 78: Ch4 memory management

Virtual Memory Interface

• The use of virtual memory addressing (such as paging or

segmentation) means that the kernel can choose what

memory each program may use at any given time, allowing

the operating system to use the same memory locations for

multiple tasks.

• If a program tries to access memory that isn't in its current

range of accessible memory, but nonetheless has been

allocated to it, the kernel will be interrupted in the same way

as it would if the program were to exceed its allocated

memory.

• Under UNIX this kind of interrupt is referred to as a page

fault.

Page 79: Ch4 memory management

Distributed Shared Memory

• Distributed Shared Memory (DSM) is a form of

memory architecture where the (physically

separate) memories can be addressed as one

(logically shared) address space.

• Here, the term shared does not mean that there is

a single centralised memory but shared essentially

means that the address space is shared (same

physical address on two processors refers to the

same location in memory)

Page 80: Ch4 memory management

Segmentation

• The virtual memory discussed so far is one-

dimensional because the virtual addresses go from

0 to some maximum address, one address after

another.

• For many problems, having two or more separate

virtual address spaces may be much better than

having only one.

• For example, a compiler has many tables that are

built up as compilation proceeds…

Page 81: Ch4 memory management

Segmentation (1)

Examples of tables saved by a compiler …

1. The source text being saved for the printed listing (on batch systems).

2. The symbol table, containing the names and attributes of variables.

3. The table containing all the integer and floating-point constants used.

4. The parse tree, containing the syntactic analysis of the program.

5. The stack used for procedure calls within the compiler.

These will vary in size dynamically during the compile process

Page 82: Ch4 memory management

Segmentation (2)

Figure 4-21. In a one-dimensional address space with growing

tables, one table may bump into another.

• Each of the first four tables

grows continuously as

compilation proceeds.

• The last one grows and shrinks

in unpredictable ways during

compilation.

• In a one-dimensional memory,

these five tables would have to

be allocated neighbouring

chunks of virtual address space

Page 83: Ch4 memory management

Segmentation

• Consider what happens if a program has an

exceptionally large number of variables but a

normal amount of everything else.

• The chunk of address space allocated for the

symbol table may fill up, but there may be lots of

room in the other tables.

• A straightforward and extremely general solution is

to provide the machine with many completely

independent address spaces, called segments.

• Each segment consists of a linear sequence of

addresses, from 0 to some maximum.

Page 84: Ch4 memory management

Segmentation (3)

Figure 4-22. A segmented memory allows each table to grow or

shrink independently of the other tables.

Page 85: Ch4 memory management

Segmentation (4)

Figure 4-23. Comparison of paging and segmentation.

. . .

Page 86: Ch4 memory management

Segmentation (4)

Figure 4-23. Comparison of paging and segmentation.

. . .

Page 87: Ch4 memory management

Implementation of Pure Segmentation

• The implementation of segmentation differs from

paging in an essential way:

– Pages are fixed size and segments are not.

• Figure 4-24(a) shows an example of physical

memory initially containing five segments.

– Now consider what happens if segment 1 is evicted and

segment 7, which is smaller, is put in its place.

– We arrive at the memory configuration of (b).

– Between segment 7 and segment 2 is an unused area – a hole.

– Then segment 4 is replaced by segment 5 (as in (c))

– And segment 3 is replaced by segment 6, as in (d).

Page 88: Ch4 memory management

Implementation of Pure Segmentation

Figure 4-24. (a)-(d) Development of checkerboarding.

(e) Removal of the checkerboarding by compaction.

• After the system has been running for a while, memory will be

divided up into a number of chunks, some containing segments and

some containing holes.

• This phenomenon, called checker-boarding or external

fragmentation, wastes memory in the holes (can be dealt with by

compaction (e)).

Page 89: Ch4 memory management

Segmentation with Paging:

The Intel Pentium

See p415 - 420

Page 90: Ch4 memory management

Overview of the MINIX 3 Process Manager

• Memory management in MINIX 3 is simple:

– Paging is not used at all.

– Memory management doesn’t include swapping either.

– MINIX 3 works on a system with limited physical memory.

– In practice, memories are so large now that swapping is rarely needed.

• A user-space server designated the process manager (or

PM) does, however, exist.

– It handles system calls relating to process management.

– Of these some are intimately involved with memory management.

– Process management also includes processing system calls related to

signals, setting and examining process properties such as user and group

ownership, and reporting CPU usage times.

– The MINIX 3 process manager also handles setting and querying the real

time clock.

Page 91: Ch4 memory management

Memory Layout

• In normal MINIX 3 operation, memory is allocated on two

occasions.

– First, when a process forks (the amount of memory needed by the child is

allocated).

– Second, when a process changes its memory image via the exec system

call, the space occupied by the old image is returned to the free list as a

hole, and memory is allocated for the new image.

• The new image may be in a part of memory different from

the released memory

• Its location will depend upon where an adequate hole is

found.

• Memory is also released whenever a process terminates,

either by exiting or by being killed by a signal.

Page 92: Ch4 memory management

Memory Layout (1)

Figure 4-30. Memory allocation (a) Originally. (b) After a fork.

(c) After the child does an exec. The shaded regions are unused memory. The process is a common I&D one.

• The figure shows memory allocation during a fork and an exec. In

(a) we see two processes, A and B, in memory.

• If A forks, we get the situation of (b). The child is an exact copy of

A.

• If the child now execs the file C, the memory looks like (c).

• The child’s image is replaced by C.

Page 93: Ch4 memory management

Memory Layout (2)

Figure 4-31. (a) A program as stored in a disk file. (b) Internal

memory layout for a single process. In both parts of the

figure the lowest disk or memory address is at the bottom and

the highest address is at the top.

The data part of the image is enlarged by the

amount specified in the bss field in the header

Page 94: Ch4 memory management

Message Handling

• The process manager is message driven.

• After the system has been initialised

– PM enters its main loop, which consists of waiting for a message, carrying

out the request contained in the message, and sending a reply.

• Two message categories may be received by the process

manager.

– For high priority communication between the kernel and system servers

such as PM, a system notification message is used

(these are special cases)

– The majority of messages received by the process manager result from

system calls originated by user processes.

• For this category, the next figure gives a list of legal message types,

input parameters and values sent back in the reply message.

Page 95: Ch4 memory management

Process Manager Data Structures

and Algorithms (1)

Figure 4-32. The message types, input parameters, and reply

values used for communicating with the PM.

. . .

Page 96: Ch4 memory management

Process Manager Data Structures

and Algorithms (2)

Figure 4-32. The message types, input parameters, and reply

values used for communicating with the PM.

. . .

Page 97: Ch4 memory management

Processes in Memory and Shared Text

• See p428-431

• "The PM's process table is called mproc and its definition is given in src/servers/pm/mproc.h"

• It contains all the fields related to a process’

memory allocation, as well as some additional

items.

• The most important field is the array mp_seg

• Etc…

Page 98: Ch4 memory management

The Hole List

Figure 4-35. The hole list is an array of struct hole.

• The other major process manager data structure is the hole table,

hole, defined in src/servers/pm/alloc.c, which lists every hole in

memory in order of increasing memory address.

• The gaps between the data and stack segments are not considered

holes; they have already been allocated to processes.
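As a rough sketch of what one entry of such a hole list might hold (field types invented for illustration; not the verbatim MINIX declaration from src/servers/pm/alloc.c):

```c
/* Simplified sketch of a hole-table entry: holes are chained together
 * in order of increasing memory address. */
struct hole {
    struct hole  *h_next;    /* next hole on the list, or NULL */
    unsigned long h_base;    /* start address of the hole */
    unsigned long h_len;     /* length of the hole */
};
```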

Page 99: Ch4 memory management

FORK System Call

• When processes are created or destroyed, memory must be

allocated or deallocated.

• Also, the process table must be updated, including the parts held by

the kernel and FS.

• The PM coordinates this activity.

Figure 4-36. The steps required to carry out the fork system call.

Page 100: Ch4 memory management

EXEC System Call (1)

Figure 4-37. The steps required to carry out the exec system call.

• EXEC is the most complex system call in MINIX 3.

• It must replace the current memory image with a new one,

including setting up a new stack.

• The new image must be a binary executable file, of course.

• Exec carries out its job in a series of steps:

Page 101: Ch4 memory management

Signal Handling (1)

Figure 4-40. Three phases of dealing with signals.

See p438 - 446

Page 102: Ch4 memory management

Signal Handling (2)

Figure 4-41. The sigaction structure.

Page 103: Ch4 memory management

Signal

Handling

(3)

Figure 4-42. Signals defined by POSIX and MINIX 3. Signals indicated by (*) depend on hardware support. Signals marked (M) are not defined by POSIX, but are defined by MINIX 3 for compatibility with older programs. Signals marked "kernel" are MINIX 3-specific signals generated by the kernel, used to inform system processes about system events. Several obsolete names and synonyms are not listed here.

Page 104: Ch4 memory management

Signal

Handling

(4)

Figure 4-42. Signals defined by POSIX and MINIX 3. Signals indicated by (*) depend on hardware support. Signals marked (M) are not defined by POSIX, but are defined by MINIX 3 for compatibility with older programs. Signals marked "kernel" are MINIX 3-specific signals generated by the kernel, used to inform system processes about system events. Several obsolete names and synonyms are not listed here.

Page 105: Ch4 memory management

IMPLEMENTATION OF THE

MINIX 3 PROCESS MANAGER

Read Through and Generally Grasp

Detail on p447 - 475