CS 443 Advanced OS

David R. Choffnes, Spring 2005

Practical, transparent operating system support for superpages

Juan Navarro, Sitaram Iyer, Peter Druschel, Alan Cox

(Rice University)

Appears in: Fifth Symposium on Operating Systems Design and Implementation (OSDI 2002)

Presented by: David R. Choffnes

Outline

The superpage problem

Related Approaches

Design

Implementation

Evaluation

Conclusion

Introduction

TLB coverage
– Definition
– Effect on performance

Superpages
– Wasted memory
– Fragmentation

Contribution
– General, transparent superpages
– Deals with fragmentation
– Contiguity-aware page replacement algo
– Demotion/Eviction of dirty superpages

The Superpage Problem

Figure: TLB coverage trend (TLB coverage as a percentage of main memory). Factor of 1000 decrease in 15 years. TLB miss overhead: 5%, 5-10%, 30%.
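TLB coverage is the number of TLB entries multiplied by the page size. The following is a minimal sketch of that arithmetic, not anything from the paper; the function name is hypothetical, and the inputs (128 entries, 8 KB and 4 MB pages, 512 MB RAM) are borrowed from the evaluation setup later in this deck purely as an illustration.

```python
KB = 1024
MB = 1024 * KB

def tlb_coverage(entries: int, page_size: int) -> int:
    """Bytes of memory that can be addressed without taking a TLB miss."""
    return entries * page_size

ram = 512 * MB
base_coverage = tlb_coverage(128, 8 * KB)    # every entry maps a base page
super_coverage = tlb_coverage(128, 4 * MB)   # every entry maps the largest superpage

print(f"base pages:  {base_coverage // KB} KB ({base_coverage / ram:.2%} of RAM)")
print(f"superpages:  {super_coverage // MB} MB ({super_coverage / ram:.2%} of RAM)")
```

With base pages only, the TLB reaches a small fraction of one percent of memory; with the largest page size it could, in principle, reach all of it.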

The Superpage Problem

Increasing TLB coverage
– More TLB entries are expensive
– Larger page size leads to internal fragmentation and increased I/O
– Solution: use multiple page sizes

Superpage definition

Hardware-imposed constraints
– Finite set of page sizes (subset of powers of 2)
– Contiguity
– Alignment
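The three hardware constraints can be expressed as a simple check. This is an illustrative sketch only: the function name is hypothetical, and the size list uses the Alpha page sizes quoted in this deck. Contiguity is implicit here because a superpage is described by a single physical base address and a size.

```python
KB = 1024
# Alpha page sizes as listed in this deck: 8 KB, 64 KB, 512 KB, 4 MB.
ALPHA_PAGE_SIZES = (8 * KB, 64 * KB, 512 * KB, 4096 * KB)

def can_map_as_superpage(vaddr: int, paddr: int, size: int,
                         page_sizes=ALPHA_PAGE_SIZES) -> bool:
    """Check the hardware constraints for one TLB entry of the given size."""
    if size not in page_sizes:          # finite set of page sizes
        return False
    # Alignment: both addresses must be multiples of the superpage size.
    return vaddr % size == 0 and paddr % size == 0

print(can_map_as_superpage(0x10000, 0x30000, 64 * KB))  # both 64 KB aligned
print(can_map_as_superpage(0x10000, 0x32000, 64 * KB))  # physical side misaligned
print(can_map_as_superpage(0x10000, 0x30000, 16 * KB))  # size not supported
```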

A superpage TLB

Figure: a TLB translating virtual addresses to physical addresses, holding one base page entry (size=1) and one superpage entry (size=4).

Alpha: 8, 64, 512 KB; 4 MB

Itanium: 4, 8, 16, 64, 256 KB; 1, 4, 16, 64, 256 MB

Superpage Issues and Tradeoffs

Allocation
– Relocation
– Reservation

Issue 1: superpage allocation

Figure: pages A, B, C, D in virtual memory and in physical memory, with superpage boundaries marked.

How / when / what size to allocate?

Superpage Issues (Cont.)

Promotion
– Incremental
– Timing (not too soon, not too late)

Demotion and Eviction
– Hardware reference and dirty bit limitation

Issue 2: promotion

Promotion: create a superpage out of a set of smaller pages
– mark page table entry of each base page

When to promote?

Create small superpage? May waste overhead.

Wait for app to touch pages? May lose opportunity to increase TLB coverage.

Forcibly populate pages? May cause internal fragmentation.

Superpage Issues: Fragmentation

Fragmentation
– Memory becomes fragmented due to
  • use of multiple page sizes
  • persistence of file cache pages
  • scattered wired (non-pageable) pages
– Contiguity as contended resource

Related Approaches

HP-UX and IRIX Reservations
– Not transparent

Page Relocation
– Used exclusively, leads to lower performance due to increased TLB misses

Hardware Support
– Talluri and Hill: Remove contiguity requirement

This approach: Hybrid reservation and relocation system with page replacement that biases toward pages that contribute to contiguity

Design

Reservation-based superpage management

Multiple superpage sizes

Demotion of sparsely referenced superpages

Preservation of contiguity w/o compaction

Efficient disk I/O for partially modified SPs

Uses buddy allocator for contiguous regions
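To illustrate why a buddy allocator suits contiguous, aligned regions, here is a toy version operating on frame numbers. It is a sketch under simplifying assumptions (a single pool of 2**max_order frames, no reservations), not the FreeBSD implementation; the class and method names are hypothetical.

```python
class BuddyAllocator:
    """Toy buddy allocator over 2**max_order base frames."""

    def __init__(self, max_order: int):
        self.max_order = max_order
        # free[k] holds the start frames of free blocks of 2**k frames.
        self.free = {k: set() for k in range(max_order + 1)}
        self.free[max_order].add(0)

    def alloc(self, order: int):
        """Return the start frame of an aligned block of 2**order frames, or None."""
        k = order
        while k <= self.max_order and not self.free[k]:
            k += 1                      # look for a larger block to split
        if k > self.max_order:
            return None
        start = min(self.free[k])
        self.free[k].remove(start)
        while k > order:                # split, keeping the lower half
            k -= 1
            self.free[k].add(start + (1 << k))
        return start

    def release(self, start: int, order: int) -> None:
        """Free a block and coalesce with its buddy for as long as possible."""
        while order < self.max_order:
            buddy = start ^ (1 << order)
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free[order].add(start)


pool = BuddyAllocator(max_order=3)      # 8 frames
a = pool.alloc(0)                       # one frame
b = pool.alloc(1)                       # two aligned frames
pool.release(a, 0)
pool.release(b, 1)                      # coalesces back into one 8-frame block
print(pool.alloc(3))
```

Because every block is aligned to its own size, any block the allocator hands out already satisfies the superpage alignment constraint.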

Key observation

Once an application touches the first page of a memory object, it is likely that it will quickly touch every page of that object

Example: array initialization

Opportunistic policies
– superpages as large and as soon as possible
– as long as no penalty if wrong decision

Reservations

Set of frames initially reserved at page fault
– Fixed-size objects: largest aligned superpage that is not larger than the object
– Dynamic objects: same as fixed, but reservation is allowed to extend beyond the end of the object

Preemption
– If no available memory for allocation request, system will preempt the reservation whose most recent page allocation occurred least recently
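The sizing rule above can be sketched as follows. This is one reading of the slide, not the paper's code: it assumes the reservation must contain the faulting page and must not start before the object, works in units of base pages, and uses the Alpha sizes (8 KB base page) from this deck. The function name and parameters are hypothetical.

```python
# Superpage sizes in units of 8 KB base pages (8 KB, 64 KB, 512 KB, 4 MB), largest first.
SIZES_IN_BASE_PAGES = [512, 64, 8, 1]

def pick_reservation(fault_page, obj_start, obj_end, dynamic=False):
    """Return (start, size) of the reservation for a fault, in base pages.

    The object occupies pages [obj_start, obj_end). For dynamic objects the
    reservation may extend past obj_end (assumption: but not before obj_start).
    """
    if not obj_start <= fault_page < obj_end:
        raise ValueError("fault outside object")
    for size in SIZES_IN_BASE_PAGES:
        start = fault_page - fault_page % size   # aligned region containing the fault
        if start < obj_start:
            continue
        if dynamic or start + size <= obj_end:
            return start, size

print(pick_reservation(10, 0, 100))                 # fixed-size object
print(pick_reservation(70, 0, 100))                 # near the end: smaller reservation
print(pick_reservation(70, 0, 100, dynamic=True))   # may grow past the end
```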

Managing reservations

Figure: reservation lists (1, 2, 4), indexed by largest unused (and aligned) chunk.

Best candidate for preemption at front: reservation whose most recently populated frame was populated the least recently

Other Design Issues

Fragmentation control
– Coalescing
– Contiguity-aware page replacement

Incremental promotions
– Occurs as soon as a superpage region is fully populated

Speculative demotion
– Occurs on eviction (recursively)
– Occurs on first write to clean superpage
  • Overhead too high for hash digests
– Daemon periodically demotes pages speculatively
  • Necessary due to reference bit limitation
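Demotion on the first write matters because a superpage has a single dirty bit. The sketch below is a simplified model, not the paper's mechanism in detail: the class name and fields are hypothetical, and re-promotion is omitted. It compares how many base pages must be written back when one base page is modified.

```python
class Mapping:
    """Toy model of one superpage-sized region and its dirty state."""

    def __init__(self, n_base_pages: int, demote_on_write: bool = True):
        self.n = n_base_pages
        self.demote_on_write = demote_on_write
        self.is_superpage = True
        self.dirty = set()              # base pages that must be written back

    def write(self, page: int) -> None:
        if self.is_superpage:
            if self.demote_on_write:
                # Demote: dirty bits are now tracked per base page.
                self.is_superpage = False
            else:
                # One dirty bit for the whole superpage: every base page
                # must be assumed dirty.
                self.dirty = set(range(self.n))
                return
        self.dirty.add(page)

    def pages_to_flush(self) -> int:
        return len(self.dirty)


with_demotion = Mapping(512)
with_demotion.write(3)
without_demotion = Mapping(512, demote_on_write=False)
without_demotion.write(3)
print(with_demotion.pages_to_flush(), without_demotion.pages_to_flush())
```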

Incremental promotions

Promotion policy: opportunistic

Figure: incremental promotion of a region as it is populated (labels: 2, 4, 4+2, 8).
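The opportunistic policy can be sketched as: whenever an aligned region of some supported size is fully populated, map it with the largest such size. The toy sizes (2, 4, 8 base pages) are assumptions chosen to mirror the labels on this slide, not real hardware sizes, and the function name is hypothetical.

```python
def promoted_regions(populated, reservation_size=8, sizes=(8, 4, 2)):
    """Return (start, size) superpages that are fully populated, largest first."""
    populated = set(populated)
    regions = []
    covered = set()
    for size in sizes:                              # try the largest size first
        for start in range(0, reservation_size, size):
            pages = set(range(start, start + size))
            if pages <= populated and not (pages & covered):
                regions.append((start, size))
                covered |= pages
    return regions

# Populate an 8-page reservation sequentially and watch promotions happen.
for n in (2, 4, 6, 8):
    print(n, promoted_regions(range(n)))
```

After six pages are populated the region is mapped as one size-4 and one size-2 superpage, and it becomes a single size-8 superpage once fully populated.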

More Design Issues

Multi-list reservation scheme
– One list for each page size supported by hardware
– Reservations sorted by allocation recency
– Preemption removes from head of list
  • Reservation recursively broken into extents
  • Fully populated extents are not put in reservation lists
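A minimal sketch of the multi-list idea, assuming one recency-ordered list per page size and preemption from the head. Breaking a preempted reservation into smaller extents is omitted, and the class, method, and reservation identifiers are hypothetical.

```python
from collections import OrderedDict

class ReservationLists:
    """One recency-ordered list of reservations per page size."""

    def __init__(self, sizes):
        self.lists = {size: OrderedDict() for size in sizes}

    def touch(self, size, rid) -> None:
        """Record a page allocation in reservation rid: move it to the tail."""
        lst = self.lists[size]
        lst.pop(rid, None)
        lst[rid] = True

    def preempt(self, size):
        """Remove and return the reservation allocated from least recently."""
        lst = self.lists[size]
        if not lst:
            return None
        rid, _ = lst.popitem(last=False)    # head of the list
        return rid


lists = ReservationLists(sizes=[8, 64])
lists.touch(64, "a")
lists.touch(64, "b")
lists.touch(64, "a")          # "a" was allocated from again, so "b" is now oldest
print(lists.preempt(64))
```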

Population map
– Reserved frame lookup
– Overlap avoidance
– Promotion decisions
– Preemption assistance
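The first three uses of the population map can be illustrated with a flat list of reservations. This is a deliberate simplification for illustration; the data structure, names, and linear search are assumptions and say nothing about how the real population map is organized.

```python
class PopulationMap:
    """Toy population map: tracks which pages of each reservation are populated."""

    def __init__(self):
        self.reservations = []

    def reserve(self, vstart: int, size: int, pbase: int):
        """Create a reservation unless it overlaps an existing one."""
        for r in self.reservations:
            if vstart < r["vstart"] + r["size"] and r["vstart"] < vstart + size:
                return None                         # overlap avoidance
        r = {"vstart": vstart, "size": size, "pbase": pbase, "populated": set()}
        self.reservations.append(r)
        return r

    def find(self, vpage: int):
        for r in self.reservations:
            if r["vstart"] <= vpage < r["vstart"] + r["size"]:
                return r
        return None

    def fault(self, vpage: int):
        """Return (frame, fully_populated) for a fault, or None if unreserved."""
        r = self.find(vpage)
        if r is None:
            return None
        r["populated"].add(vpage)
        frame = r["pbase"] + (vpage - r["vstart"])  # reserved frame lookup
        return frame, len(r["populated"]) == r["size"]  # promotion decision


pm = PopulationMap()
pm.reserve(vstart=8, size=4, pbase=100)
print(pm.reserve(vstart=10, size=4, pbase=200))     # rejected: overlaps
for page in (9, 8, 10, 11):
    print(page, pm.fault(page))
```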

Implementation Notes

FreeBSD uses three lists of pages in A-LRU order: active, inactive, cache

Contiguity-aware page daemon
– Cache considered available for allocation
– Daemon activated when contiguity falls low
– Clean file-backed pages moved to inactive as soon as file is closed

Wired page clustering

Multiple mappings

Evaluation

Setup
– FreeBSD 4.3
– Alpha 21264, 500 MHz, 512 MB RAM
– 8 KB, 64 KB, 512 KB, 4 MB pages
– 128-entry DTLB, 128-entry ITLB
– Unmodified applications

Best-Case Results

TLB miss reduction usually above 95%

SPEC CPU2000 integer
– 11.2% improvement (0 to 38%)

SPEC CPU2000 floating point
– 11.0% improvement (-1.5% to 83%)

Other benchmarks
– FFT (200^3 matrix): 55%
– 1000x1000 matrix transpose: 655%

30%+ in 8 out of 35 benchmarks

Benefits of multiple page sizes

Figure: speedups and TLB miss reduction.

Sustained benefits

Use Web server to fragment memory, then use FFTW to see how quickly memory is reclaimed

FFTW reaches a speedup of almost 55%; Web server performance degrades only 1.6% on a successive run

Concurrent execution: only 3% degradation with modified page daemon

Fragmentation control

Figure: normalized contiguity of free memory (0 to 1) over time (0 to 10 min) for a web server run followed by four FFT runs, with and without fragmentation control. Annotations: no speedup, partial speedup, full speedup.

Adversary applications

Incremental promotion
– Slowdown of 8.9%, of which 7.2% is hardware-specific

Sequential access
– 0.1% degradation

Preemption
– 1.1% degradation

General overhead
– Use superpage supporting mechanisms, but don't promote: 1-2% performance degradation

Et Cetera

Dirty Superpages
– Performance penalty of not demoting is a factor of 20

Scalability
– Most operations O(1), O(S) or O(S*R)
– Daemon, promotion, demotion and dirty/reference bit emulation are linear
  • Promotion/Demotion is amortized to O(S) for programs that need to change page size only early in life
  • Dirty/Reference bits: Motivates the need for clustered page tables either in OS or HW

Conclusion

Effective, transparent and efficient support for superpages

Demonstrates effectiveness of multiple page sizes

Improved performance for nearly all applications

Minimal overhead

Scalable to large numbers of page sizes

Recommended