2012. 06. 13 Miseon Han Thomas W. Barr, Alan L. Cox, Scott Rixner Rice Computer Architecture Group, Rice University ISCA, June 2011

SpecTLB: A Mechanism for Speculative Address Translation


Motivation

http://compiler.korea.ac.kr

Virtual Memory: Still an increasing challenge

• Virtual memory
– Performance overhead of 5-14% for ‘typical’ applications [Bhargava08]
– 89% under virtualization [Bhargava08]
– Large pages are not always a good solution


Physical memory allocator – Large pages

• What page size to pick?
– 4KB, 2MB, 1GB on x86
• Can’t always use the largest size
– Wasted memory
– Increased I/O traffic
• Dynamic page size selection


Ideas

• SpecTLB (Speculative TLB)
– A hardware/software system
• Reservation-based physical memory allocator [Talluri94]
– Allocate small pages by default to maintain fine-grained control
• Predict small page translations in hardware
– Performance of large pages, control of small pages

Background


X86-64 Page Table format

• Four-level radix-tree page table

Virtual address: 0x5c8315cc2016

Bit fields:       [47:39] [38:30] [29:21] [20:12] [11:0]
Virtual address:  {0b9,   00c,    0ae,    0c2,    016}

Physical address: {123, 016}
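
The field split shown on this slide can be reproduced in a few lines. This is a minimal sketch: the function name `split_virtual_address` is illustrative and not from the paper. It only assumes the standard x86-64 layout of four 9-bit indices followed by a 12-bit page offset.

```python
def split_virtual_address(va: int) -> tuple[int, int, int, int, int]:
    """Split a 48-bit x86-64 virtual address into four 9-bit
    page-table indices and a 12-bit page offset."""
    offset = va & 0xFFF                       # bits [11:0]
    # One 9-bit index per page-table level, root first:
    # bits [47:39], [38:30], [29:21], [20:12].
    indices = [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return (*indices, offset)


fields = split_virtual_address(0x5C8315CC2016)
print([f"{f:03x}" for f in fields])  # ['0b9', '00c', '0ae', '0c2', '016']
```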


Large pages

• Page table levels describe physical address space at different granularity
– Region covered by one entry at each level, root to leaf: 512GB, 1GB, 2MB, 4KB
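
The four sizes follow from the address layout: each level up the tree adds 9 index bits on top of the 12-bit page offset. A small sketch of that arithmetic (the names `PAGE_SHIFT`, `INDEX_BITS` and `entry_coverage_bytes` are illustrative only):

```python
PAGE_SHIFT = 12   # 4KB base page
INDEX_BITS = 9    # 512 entries per page-table node


def entry_coverage_bytes(level: int) -> int:
    """Bytes of address space covered by one entry at the given level.
    Level 0 is the leaf level; level 3 is the root."""
    return 1 << (PAGE_SHIFT + INDEX_BITS * level)


# Root to leaf: 512GB, 1GB, 2MB, 4KB.
coverage = [entry_coverage_bytes(level) for level in (3, 2, 1, 0)]
print(coverage)
```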


Reservation-based memory allocation

• Reservation-based memory allocation [Talluri94]
– Always allocate small pages at first, tracked in a book-keeping entry
– Place these small pages in a large page ‘reservation’
  • If the handler decides that a reservation is needed
– Promote the reservation to a large page
  • When all small pages in the reservation are allocated
– Extended and implemented in FreeBSD [Navarro02]
  • Default memory allocator
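
The allocation policy can be illustrated with a toy model. This is a sketch under stated assumptions, not the FreeBSD implementation: the class and method names are hypothetical, every fault creates a reservation (the real handler decides whether one is needed), physical regions are handed out sequentially starting at an arbitrary frame 0x8000, and promotion is reduced to a flag.

```python
from dataclasses import dataclass, field

SMALL_PAGES_PER_LARGE = 512  # 2MB large page / 4KB small page


@dataclass
class Reservation:
    base_frame: int                      # first 4KB frame of the 2MB-aligned region
    allocated: set = field(default_factory=set)
    promoted: bool = False


class ReservationAllocator:
    """Toy model: hands out small pages inside large-page reservations."""

    def __init__(self, first_free_frame: int = 0x8000):
        self._next_base = first_free_frame   # assumed 2MB-aligned (multiple of 512)
        self.reservations = {}               # large virtual page number -> Reservation

    def handle_fault(self, vpn: int) -> int:
        """Allocate the small page for virtual page number `vpn`
        and return its physical frame number."""
        large_vpn, slot = divmod(vpn, SMALL_PAGES_PER_LARGE)
        res = self.reservations.get(large_vpn)
        if res is None:
            # Simplification: always reserve a whole 2MB physical region.
            res = Reservation(self._next_base)
            self._next_base += SMALL_PAGES_PER_LARGE
            self.reservations[large_vpn] = res
        res.allocated.add(slot)
        # Promote once every small page in the reservation is allocated.
        if len(res.allocated) == SMALL_PAGES_PER_LARGE:
            res.promoted = True
        # The small page sits at the same offset inside the reservation.
        return res.base_frame + slot


alloc = ReservationAllocator()
vpn = (0x0B9 << 27) | (0x00C << 18) | (0x0AE << 9) | 0x002
frame = alloc.handle_fault(vpn)
print(hex(frame))  # 0x8002
```

Because each small page is placed at its matching offset within the reserved region, promotion needs no copying: the 512 small mappings already describe one contiguous, aligned 2MB region.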



Reservation-based memory allocation
[Figure sequence; captions only]

• Handler reserves a 2MB region of physical space.
• Reservation is ‘promoted’ into a large page.
• Reservations may not be filled.

SpecTLB


SpecTLB

• TLB-like structure
– Tracks reservations, not actual mappings
– Detects reservations
– Predicts translations
– Verifies predictions


Detecting reservations

Virtual Address                  Physical Address
{0b9, 00c, 0ae, 002, 313}        {8002, 313}

Current Reservations:
{0b9, 00c, 0ae, 000, 000}        {8000, 000}
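
The reservation entry on this slide is the observed translation with the low 9 bits of the page number and frame number cleared. The sketch below derives such an entry. How the hardware decides that a translation belongs to a reservation is not stated on the slide; the alignment check used here (matching small-page slot in the virtual and physical page numbers) is an assumption, and the names are illustrative.

```python
SLOT_MASK = 0x1FF  # low 9 bits: small-page slot within a 2MB region


def detect_reservation(vpn: int, pfn: int):
    """Return (reservation base VPN, reservation base PFN) for an observed
    4KB translation, or None if the page is not at its matching offset.
    Assumption: a reservation is inferred from this alignment alone."""
    if (vpn & SLOT_MASK) != (pfn & SLOT_MASK):
        return None
    return vpn & ~SLOT_MASK, pfn & ~SLOT_MASK


# Slide example: {0b9, 00c, 0ae, 002} -> frame 8002
vpn = (0x0B9 << 27) | (0x00C << 18) | (0x0AE << 9) | 0x002
entry = detect_reservation(vpn, 0x8002)
print([hex(x) for x in entry])
```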


Predicting translations

Virtual Address                  Physical Address
{0b9, 00c, 0ae, 005, 313}        {8005, 313}?

Current Reservations:
{0b9, 00c, 0ae, 000, 000}        {8000, 000}
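
The prediction on this slide amounts to adding the page's slot within the 2MB region to the reservation's base frame. A minimal sketch, assuming the tracked reservations are held in a plain dictionary from base virtual page number to base frame number (a stand-in for the hardware structure; all names are illustrative):

```python
SLOT_MASK = 0x1FF  # low 9 bits: small-page slot within a 2MB region


def predict_translation(reservations: dict, vpn: int):
    """Return the predicted frame number for `vpn`, or None when no
    tracked reservation covers it. The prediction is unverified."""
    base_pfn = reservations.get(vpn & ~SLOT_MASK)
    if base_pfn is None:
        return None
    return base_pfn + (vpn & SLOT_MASK)


base_vpn = (0x0B9 << 27) | (0x00C << 18) | (0x0AE << 9)
reservations = {base_vpn: 0x8000}
predicted = predict_translation(reservations, base_vpn | 0x005)
print(hex(predicted))  # 0x8005
```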


SpecTLB

• Provides predicted translations for pages within tracked reservations
• Predictions may be incorrect
– Page table must still be walked
  • Page walk can occur in parallel
  • Latency hidden
– Speculative translation can be used concurrently
  • Microarchitecture cancels speculative work if the prediction turns out to be wrong
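
The predict-then-verify flow can be sketched as follows. This is a sequential software model only: in hardware the page walk overlaps with speculative execution, which the code cannot show. The function name, the callback parameters and the outcome labels are hypothetical.

```python
def translate_on_tlb_miss(vpn, predict, walk_page_table):
    """Model one TLB miss. `predict` returns a speculative frame number
    or None; `walk_page_table` returns the authoritative frame number.
    Returns (frame, outcome)."""
    predicted = predict(vpn)          # available immediately
    # In hardware, instructions would already run speculatively with
    # `predicted` while this walk is in flight.
    actual = walk_page_table(vpn)
    if predicted is None:
        return actual, "no_prediction"   # walk latency fully exposed
    if predicted == actual:
        return actual, "commit"          # speculative work kept, latency hidden
    return actual, "squash"              # speculative work cancelled


page_table = {0x005: 0x8005, 0x006: 0x1234}
result_ok = translate_on_tlb_miss(0x005, lambda v: 0x8000 + v, page_table.get)
result_bad = translate_on_tlb_miss(0x006, lambda v: 0x8000 + v, page_table.get)
print(result_ok, result_bad)
```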

Simulation & Result


SpecTLB Results

Full system simulator, unmodified FreeBSD kernel

Benchmark    TLB miss rate         Speculative prediction   Prediction   DRAM accesses
             (/1k DRAM accesses)   frequency                accuracy     overlapped
PostgreSQL   74.43                 0.762                    0.989        0.448
python       15.36                 0.760                    0.998        0.419
SPECjbb      20.04                 0.418                    0.971        0.310
bzip2         4.00                 0.293                    0.998        0.235
gcc           4.25                 0.852                    0.988        0.664
mcf          79.43                 0.992                    1.000        0.956
dc.B         42.29                 0.083                    0.353        0.073
ep.C         12.94                 0.014                    0.962        0.023


TLB Prefetcher Comparison

• SpecTLB and TLB prefetching both hide the latency of TLB misses
– SpecTLB: uses large-page reservations; targets the current TLB miss
– TLB prefetcher: uses access patterns; targets future TLB misses
• Speculative work
– SpecTLB: instructions execute in parallel with confirmation of the translation
– TLB prefetcher: prefetches page table entries


TLB Prefetcher Comparison

• The TLB prefetcher generally hides fewer walks than SpecTLB
– The prefetcher does well with high access regularity

Benchmark    TLB miss rate   SpecTLB   TLB Prefetcher
PostgreSQL   74.43           0.989     0.106
python       15.36           0.998     0.633
SPECjbb      20.04           0.971     0.151
bzip2         4.00           0.998     0.978
gcc           4.25           0.988     0.330
mcf          79.43           1.000     0.051
dc.B         42.29           0.353     0.190
ep.C         12.94           0.962     0.897


Conclusions

• SpecTLB hides latency of TLB misses
– Predictions allow the page walk to occur in parallel with speculative work
– >62% of TLB miss latencies hidden for the majority of benchmarks
