When pages are updated

Lecture 52: Swapping & Flushing

[Figure, built up over several slides: main memory on the left, disk (block device) on the right.]

• On disk, the executable a.out holds text and data.
• Its text and data pages are loaded into main memory (scattered across page frames); a stack region also lives in main memory.
• text: X (execute). Always clean.
• data: RW. Written; written pages are written back immediately or flushed periodically.
• stack: RW. Written.
• Page Cache page states: clean, dirty, locked (kept in the address_space struct).

[Figure, continued: the same memory/disk diagram, now with a swap area on disk.]

• text: X. Always clean.
• data: RW. Written; write-back or flush.
• stack: RW. Written, but with no disk counterpart.
• An “anonymous page” such as stack or heap does not map to a file on disk.
• Swap (save) such a page to the swap area for later use.
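The cases in the figure can be summarized in a few lines of C. This is only a sketch: the names page_kind, evict_action and choose_evict_action are made up for illustration and are not kernel identifiers, and it assumes that a page which was never written needs no I/O before its frame is reused.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only (not kernel identifiers). */
enum page_kind { PAGE_TEXT, PAGE_FILE_DATA, PAGE_ANON };
enum evict_action { EVICT_DISCARD, EVICT_WRITE_BACK, EVICT_SWAP_OUT };

/* Decide what must happen before a page frame can be reused. */
enum evict_action choose_evict_action(enum page_kind kind, bool dirty)
{
    switch (kind) {
    case PAGE_TEXT:
        /* Text is mapped X only, so it is always clean. */
        assert(!dirty);
        return EVICT_DISCARD;
    case PAGE_FILE_DATA:
        /* A file on disk backs this page; only written pages need I/O. */
        return dirty ? EVICT_WRITE_BACK : EVICT_DISCARD;
    case PAGE_ANON:
    default:
        /* Stack/heap: no disk counterpart, so written pages go to swap. */
        return dirty ? EVICT_SWAP_OUT : EVICT_DISCARD;
    }
}
```

For example, choose_evict_action(PAGE_ANON, true) yields EVICT_SWAP_OUT, while a written data page yields EVICT_WRITE_BACK.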

Page Frame Reclamation

• The kernel refills the free block list before all free memory is gone.
• PFRA (Page Frame Reclaiming Algorithm)
– selects a page frame and makes that page free
• Target pages can belong to
– user-mode processes, or
– kernel caches (slab layer)
• If the page is dirty
– write it back or swap it out
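A rough sketch of the reclaim step, under loud assumptions: struct frame, struct reclaim_stats and reclaim_frames are hypothetical names, victim selection is a plain array scan (the real PFRA works from LRU-style lists), and the disk I/O is only counted, not performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified frame descriptor (not the kernel's struct page). */
struct frame {
    bool in_use;
    bool dirty;
    bool anonymous;   /* true: stack/heap, no file behind it */
};

struct reclaim_stats { size_t freed, written, swapped; };

/*
 * Free in-use frames until target_free frames are free or the scan ends.
 * free_now is the number of frames that are already free.
 */
struct reclaim_stats reclaim_frames(struct frame *frames, size_t n,
                                    size_t free_now, size_t target_free)
{
    struct reclaim_stats st = {0, 0, 0};

    assert(frames != NULL || n == 0);
    for (size_t i = 0; i < n && free_now + st.freed < target_free; i++) {
        struct frame *f = &frames[i];
        if (!f->in_use)
            continue;
        if (f->dirty) {
            /* Dirty: save the contents first (swap out or write back). */
            if (f->anonymous)
                st.swapped++;
            else
                st.written++;
            f->dirty = false;
        }
        f->in_use = false;   /* the frame is now free */
        st.freed++;
    }
    return st;
}
```

The loop stops as soon as the target is met, which mirrors the idea of refilling the free list before memory runs out rather than reclaiming everything.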

Writing out Dirty Pages

Chapter 15, Love’s book

[Figure: main memory and disk (block device), with a.out's text (X), data (RW) and stack (RW). Written pages are written back, or flushed periodically.]

pdflush Daemon

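• Dirty page writeback occurs when
– free memory runs low (falls below a specified threshold)
– dirty data becomes too old (older than a specified threshold)
• Page Cache page states: clean, dirty, locked
• Settings for the pdflush daemon
– dirty_background_ratio: pdflush starts writing in the background once dirty pages exceed this percentage of memory
– dirty_expire_centisecs: dirty data older than this is written out the next time pdflush runs
– dirty_ratio: once dirty pages exceed this percentage of memory, the writing process itself must write out dirty data
– dirty_writeback_centisecs: how often pdflush wakes up (its cycle time)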

Swapping out Anonymous Pages

[Figure: the memory/disk diagram with a swap area on disk. Written data pages are written back or flushed; a written stack page is swapped (saved) to the swap area for later use.]

Linux swapping
• Pages like stack/heap cannot be discarded (they are used later).
• They have to be copied to a backing store, called the swap area.
• Strictly speaking, Linux does not swap, because
– 'swapping' means copying an entire process address space to disk
– 'paging' means copying out individual pages
• Linux actually implements paging (traditionally called swapping).
• Linux swapping is done as part of page frame reclaiming.

[Figure: pages move between memory and the swap area.]

Swap Area in Disk (p 179 Gorman)

• Multiple swap areas
– the system administrator can spread the load among several disks
– faster swap areas (faster disks) may have higher priority
– swapping may start from the faster swap area
– multiple swap areas may be read/written concurrently
• Each active swap area is a file or partition (max 32 swap areas).
• Each swap area is divided up into page-sized slots on disk.

[Figure: a swap area (file or partition) divided into slots, one page each.]
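The priority rule can be sketched as a simple selection over the active areas. Everything here is an assumption made for illustration: struct swap_area and pick_swap_area are hypothetical, and the kernel's real bookkeeping is more involved.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SWAP_AREAS 32   /* limit quoted in the text above */

/* Hypothetical, stripped-down descriptor of one swap area. */
struct swap_area {
    int active;             /* non-zero if the area is in use */
    int priority;           /* higher priority is tried first */
    size_t free_slots;      /* page-sized slots still free */
};

/* Return the index of the best area with a free slot, or -1 if none. */
int pick_swap_area(const struct swap_area *areas, size_t n)
{
    int best = -1;

    assert(n <= MAX_SWAP_AREAS);
    for (size_t i = 0; i < n; i++) {
        if (!areas[i].active || areas[i].free_slots == 0)
            continue;
        if (best < 0 || areas[i].priority > areas[best].priority)
            best = (int)i;
    }
    return best;
}
```

A full high-priority area is skipped, so swapping falls through to the next-fastest area that still has room.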

struct swap_info_struct {
    unsigned int flags;
    spinlock_t sdev_lock;
    struct file *swap_file;
    struct block_device *bdev;
};

[Figure: the swap_info[] array, indexed 0 to 31. Each entry is a struct swap_info_struct describing one swap area, which is divided into page-sized slots.]

Swapping Subsystem

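• The PTE keeps track of the position of the data in the swap area.

[Figure: page Pk was swapped out. Its PTE holds an index into swap_info[] (entries 0 to 31), which selects a swap_info_struct { flags; sdev_lock; *swap_file; *bdev; }, together with the index k of the slot inside that swap area.]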
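One way to picture the two indexes packed into a single word is the sketch below. The bit layout, the swp_entry type and the three helper functions are assumptions made for illustration; the kernel's actual encoding is architecture-specific and also has to keep the entry marked as not present.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SWAP_AREAS 32        /* entries in swap_info[] */
#define SWP_TYPE_BITS  5         /* 2^5 = 32 areas; layout is an assumption */

typedef uint32_t swp_entry;      /* hypothetical stand-in for a swapped-out PTE */

/* Pack (swap area index, slot index) into one word. */
swp_entry swp_make(unsigned type, uint32_t offset)
{
    assert(type < MAX_SWAP_AREAS);
    assert(offset < (UINT32_C(1) << (32 - SWP_TYPE_BITS)));
    return (offset << SWP_TYPE_BITS) | type;
}

/* Index into swap_info[]: which swap area holds the page. */
unsigned swp_type(swp_entry e)   { return e & (MAX_SWAP_AREAS - 1); }

/* Slot number inside that swap area. */
uint32_t swp_offset(swp_entry e) { return e >> SWP_TYPE_BITS; }
```

On a page fault, the handler would decode both fields to find the right swap area and the slot to read back.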

Swap in – Race Problem
• Swap in can cause a race condition.
• Example [2-process case]
• The 1st process accesses page X and page faults
– the kernel tries to swap in
– allocates a new page frame
– starts the I/O operation
• The 2nd process accesses page X and page faults
– the kernel tries to swap in
– allocates a new page frame
– starts the I/O operation

Swap out – Race Problem
• Swap out can cause a race condition.
• Example [N processes share a page]
– The 1st process swaps out page X
– Other processes access page X

[Figure: processes PA, PB, PC and PD, running on CPU1 to CPU4, each have a PTE that points to the shared page X. In the second stage, page X has been swapped out while the other processes' PTEs still refer to it.]

Linux Solution – Swap Cache

• If a page is shared, a special entry (swap entry) is allocated.
• “Swap out” just decrements the reference count in the swap entry.
• Only when the count reaches zero will the page be freed.
• Pages like this are considered to be in the swap cache.
• The swap cache is implemented by the page cache data structure.
• The swap cache is purely conceptual because it is simply a specialization of the page cache.

Swapping out pages

[Figure: PA and PB each have a PTE pointing to page X, which sits in the swap cache with count 2.]


[Figure, in three stages: with count 2, the PTEs of both PA and PB point to page X in the swap cache. With count 1, one PTE has been removed (marked X). With count 0, neither PTE points to page X; the page is freed and its contents remain in the swap area.]

Page Fault (Love’s book – Chapter 14)

/* This routine handles page faults. */
asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
    struct task_struct *tsk;
    struct mm_struct *mm;
    struct vm_area_struct *vma;
    unsigned long address;
    unsigned long page;
    int write = 0;

    /* get the faulting address from the CR2 register */
    __asm__("movl %%cr2,%0" : "=r" (address));

    tsk = current;
    mm = tsk->mm;

    down_read(&mm->mmap_sem);
    vma = find_vma(mm, address);    /* first VMA that ends above address */
    if (!vma)
        goto bad_area;
    if (vma->vm_start <= address)
        goto good_area;
    …..
good_area:
    switch (error_code & 3) {
    default:    /* 3: write, present */
        /* fall through */
    case 2:     /* write, not present */
        if (!(vma->vm_flags & VM_WRITE))
            goto bad_area;
        write++;
        break;
    case 1:     /* read, present */
        goto bad_area;
    case 0:     /* read, not present */
        if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
            goto bad_area;
    }
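The permission check in the switch can be tried out in user space with the sketch below. It is not kernel code: check_access, enum fault_result and the SK_VM_* flags are hypothetical stand-ins, and only the low two bits of error_code are modeled (bit 0: the page was present, bit 1: the access was a write).

```c
#include <assert.h>

/* Stand-ins for the VMA permission flags; names and values belong to this sketch. */
#define SK_VM_READ  0x1u
#define SK_VM_WRITE 0x2u
#define SK_VM_EXEC  0x4u

enum fault_result { FAULT_BAD_AREA, FAULT_OK_READ, FAULT_OK_WRITE };

/* Mirror the switch on (error_code & 3) from the fault handler excerpt. */
enum fault_result check_access(unsigned long error_code, unsigned vm_flags)
{
    /* Only the three flags defined above are understood here. */
    assert((vm_flags & ~(SK_VM_READ | SK_VM_WRITE | SK_VM_EXEC)) == 0);

    switch (error_code & 3) {
    default:  /* 3: write, present */
    case 2:   /* write, not present */
        return (vm_flags & SK_VM_WRITE) ? FAULT_OK_WRITE : FAULT_BAD_AREA;
    case 1:   /* read, present: a protection violation */
        return FAULT_BAD_AREA;
    case 0:   /* read, not present */
        return (vm_flags & (SK_VM_READ | SK_VM_EXEC)) ? FAULT_OK_READ
                                                       : FAULT_BAD_AREA;
    }
}
```

A write fault on a read-only area, for instance check_access(2, SK_VM_READ), comes back as FAULT_BAD_AREA, which corresponds to the goto bad_area path.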

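[Figure: task_struct (fields mm, tty, files, fs) points through its mm field to mm_struct. mm_struct holds mmap, the list of vm_area_struct entries (VMAs for text, data and stack), and pgd, the page directory that leads to the PTEs. Each vm_area_struct records start_address, end_address, permission, file and operations: page fault(), add_vma, remove_vma.]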

[Figure: overview of the kernel data structures. CPU SP, kernel stack and thread_info; task_struct with its mm, tty, fs and files fields; mm_struct with pgd (page directory and PTEs) and mmap (the list of VMAs, each with start, end, file and nopage()); the filp, dentry and inode caches; inode and address_space with its clean, dirty and locked pages; the LRU list of pages; and the space manager (slab, page units).]

Thank You
