SpecTLB: A Mechanism for Speculative Address Translation

2012. 06. 13, Miseon Han

Thomas W. Barr, Alan L. Cox, Scott Rixner
Rice Computer Architecture Group, Rice University
ISCA, June 2011


Motivation


http://compiler.korea.ac.kr

Virtual Memory: Still an increasing challenge

• Virtual memory
 – Performance overhead 5-14% for ‘typical’ applications [Bhargava08]
 – 89% under virtualization [Bhargava08]
 – Large pages not always a good solution


Physical memory allocator – Large pages

• What page size to pick?
 – 4KB, 2MB, 1GB on x86
• Can’t always use largest size
 – Wasted memory
 – Increased I/O traffic
• Dynamic page size selection


Ideas

• SpecTLB (Speculative TLB)
 – A hardware/software system
• Reservation-based physical memory allocator [Talluri94]
 – Allocate small pages by default to maintain fine-grained control
• Predict small page translations in hardware
 – Performance of large pages, control of small pages


Background


X86-64 Page Table format

• Four-level radix-tree page table

Virtual address 0x5c8315cc2016, split by bit field:

[47:39] [38:30] [29:21] [20:12] [11:0] = {0b9, 00c, 0ae, 0c2, 016}

Physical address (frame, offset): {123, 016}
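The field split above can be reproduced with a few shifts and masks. The following Python sketch is illustrative only; the function name `split_virtual_address` is an assumption, not something from the slides. It decomposes the example address into the four 9-bit table indices and the 12-bit page offset.

```python
# Sketch: split a 48-bit x86-64 virtual address into its four 9-bit
# page-table indices and the 12-bit page offset.
INDEX_BITS = 9
OFFSET_BITS = 12
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_virtual_address(va):
    """Return (l4, l3, l2, l1, offset) for a 48-bit virtual address."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    l1 = (va >> 12) & INDEX_MASK  # bits [20:12]
    l2 = (va >> 21) & INDEX_MASK  # bits [29:21]
    l3 = (va >> 30) & INDEX_MASK  # bits [38:30]
    l4 = (va >> 39) & INDEX_MASK  # bits [47:39]
    return l4, l3, l2, l1, offset


fields = split_virtual_address(0x5C8315CC2016)
print([format(f, "03x") for f in fields])
# prints ['0b9', '00c', '0ae', '0c2', '016']
```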


Large pages

• Page table levels describe physical address space at different granularity

Granularity per level (root to leaf): 512GB, 1GB, 2MB, 4KB
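Each granularity follows from the 4KB base page and the 512 entries (9 index bits) per table. A minimal sketch, with `coverage_per_entry` as a hypothetical helper name, that derives the four sizes:

```python
# Sketch: bytes of address space described by one entry at each level
# of a four-level x86-64 page table.
PAGE_SIZE = 4 * 1024        # 4KB base page
ENTRIES_PER_TABLE = 512     # 9 index bits per level


def coverage_per_entry(level):
    """Bytes mapped by one entry at `level` (1 = leaf, 4 = root)."""
    return PAGE_SIZE * ENTRIES_PER_TABLE ** (level - 1)


for level in (4, 3, 2, 1):
    print(level, coverage_per_entry(level))
```

Levels 2 and 3 correspond to the 2MB and 1GB large page sizes available on x86.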


Reservation-based memory allocation

• Reservation-based memory allocation [Talluri94]
 – Always allocate small pages at first, tracked in a book-keeping entry
 – Place these small pages in a large page ‘reservation’
   • if the handler decides that a reservation is needed
 – Promote the reservation to a large page
   • when all small pages in the reservation are allocated
 – Extended and implemented in FreeBSD [Navarro02]
   • Default memory allocator
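The reservation scheme can be pictured with a small model. This is a sketch under simplifying assumptions, not the FreeBSD implementation: the `Reservation` class and its fields are hypothetical, a reservation covers 512 small pages (2MB / 4KB), and promotion happens once every slot has been allocated.

```python
# Sketch: book-keeping for one 2MB reservation made of 4KB pages.
SMALL_PAGES_PER_LARGE = 512  # 2MB / 4KB


class Reservation:
    """Hypothetical model of a 2MB-aligned physical region."""

    def __init__(self, base_frame):
        # The reserved region must be aligned to a large-page boundary.
        assert base_frame % SMALL_PAGES_PER_LARGE == 0
        self.base_frame = base_frame
        self.allocated = set()
        self.promoted = False

    def allocate(self, virtual_page):
        """Allocate the small page backing `virtual_page`; return its frame."""
        # The page keeps the same position inside the physical region
        # as it has inside its 2MB virtual region.
        slot = virtual_page % SMALL_PAGES_PER_LARGE
        self.allocated.add(slot)
        if len(self.allocated) == SMALL_PAGES_PER_LARGE:
            self.promoted = True  # every small page present: promote
        return self.base_frame + slot


r = Reservation(base_frame=0x8000)
print(hex(r.allocate(virtual_page=0x1002)))  # prints 0x8002
```

Keeping each small page at its matching slot is what lets a filled reservation be promoted to a large page without copying.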


Reservation-based memory allocation

• Handler reserves a 2MB region of physical space
• Reservation is ‘promoted’ into a large page
• Reservations may not be filled


SpecTLB


• TLB-like structure
 – Tracks reservations, not actual mappings
 – Detect reservations
 – Predict translations
 – Verify predictions


Detecting reservations

Virtual Address → Physical Address
{0b9, 00c, 0ae, 002, 313} → {8002, 313}

Current Reservations:
{0b9, 00c, 0ae, 000, 000} → {8000, 000}
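One way to read the example: a small-page translation hints at a reservation when the page sits at the same position inside its 2MB region in both address spaces. The sketch below assumes that rule; the slide shows only the example, and `detect_reservation` is a hypothetical name.

```python
# Sketch: infer a possible reservation from a single 4KB translation.
SMALL_PAGES_PER_LARGE = 512  # 4KB pages per 2MB region


def detect_reservation(vpn, pfn):
    """Return (virtual_base_page, physical_base_frame), or None.

    vpn and pfn are 4KB page numbers. A reservation is assumed possible
    only when both have the same slot inside their 2MB region.
    """
    slot = vpn % SMALL_PAGES_PER_LARGE
    if pfn % SMALL_PAGES_PER_LARGE != slot:
        return None
    return vpn - slot, pfn - slot


# Example from the slide: {0b9, 00c, 0ae, 002} maps to frame 8002.
vpn = (0x0B9 << 27) | (0x00C << 18) | (0x0AE << 9) | 0x002
virtual_base, physical_base = detect_reservation(vpn, 0x8002)
print(hex(physical_base))  # prints 0x8000
```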


Predicting translations

Virtual Address → Physical Address
{0b9, 00c, 0ae, 005, 313} → {8005, 313}?

Current Reservations:
{0b9, 00c, 0ae, 000, 000} → {8000, 000}
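The prediction in this example amounts to adding the address's offset within the 2MB region to the reservation's physical base. A minimal sketch, assuming reservations are kept in a dictionary keyed by virtual address bits [47:21]; `predict_translation` and `make_va` are hypothetical helpers:

```python
# Sketch: predict a physical address from a tracked reservation.
LARGE_PAGE_BITS = 21  # 2MB region


def make_va(l4, l3, l2, l1, offset):
    """Assemble a virtual address from its page-table index fields."""
    return (l4 << 39) | (l3 << 30) | (l2 << 21) | (l1 << 12) | offset


def predict_translation(reservations, va):
    """Return the predicted physical address, or None if no reservation."""
    physical_base = reservations.get(va >> LARGE_PAGE_BITS)
    if physical_base is None:
        return None
    # Reuse the low 21 bits (4KB page index plus page offset) unchanged.
    return physical_base | (va & ((1 << LARGE_PAGE_BITS) - 1))


# Reservation from the slide: {0b9, 00c, 0ae, 000, 000} -> {8000, 000}
reservation_va = make_va(0x0B9, 0x00C, 0x0AE, 0x000, 0x000)
reservations = {reservation_va >> LARGE_PAGE_BITS: 0x8000 << 12}

va = make_va(0x0B9, 0x00C, 0x0AE, 0x005, 0x313)
print(hex(predict_translation(reservations, va)))  # prints 0x8005313
```

The result corresponds to {8005, 313} in the slide's notation; it remains a guess until the page walk confirms it.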


SpecTLB

• Provides predicted translations for pages within tracked reservations
• Predictions may be incorrect
 – Page table must still be walked
   • Page walk can occur in parallel
   • Latency hidden
 – Speculative translation can be used concurrently
   • Microarchitecture cancels speculative work if the prediction is wrong
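The verify step can be sketched as a comparison between the predicted address and the result of the page walk. This model is sequential and purely illustrative: in hardware the walk overlaps with speculative execution, and `resolve_miss`, the outcome strings, and the dictionary page table are all assumptions.

```python
# Sketch: check a speculative translation against the page walk result.
PAGE_BITS = 12


def resolve_miss(va, predicted_pa, page_table):
    """Return (physical_address, outcome) for a TLB miss on `va`.

    page_table maps 4KB virtual page numbers to physical frame numbers
    and stands in for the real four-level walk.
    """
    frame = page_table[va >> PAGE_BITS]
    actual_pa = (frame << PAGE_BITS) | (va & ((1 << PAGE_BITS) - 1))
    if predicted_pa is None:
        return actual_pa, "no_prediction"  # nothing ran speculatively
    if predicted_pa == actual_pa:
        return actual_pa, "confirmed"      # speculative work is kept
    return actual_pa, "squashed"           # speculative work is cancelled


page_table = {0x10: 0x8005}
print(resolve_miss(0x10313, 0x8005313, page_table))
# prints (134238995, 'confirmed')
```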


Simulation & Result


SpecTLB Results

Full system simulator, unmodified FreeBSD kernel

Benchmark    TLB miss rate         Speculative prediction  Prediction  DRAM accesses
             (/1k DRAM accesses)   frequency               accuracy    overlapped
PostgreSQL   74.43                 0.762                   0.989       0.448
python       15.36                 0.760                   0.998       0.419
SPECjbb      20.04                 0.418                   0.971       0.310
bzip2         4.00                 0.293                   0.998       0.235
gcc           4.25                 0.852                   0.988       0.664
mcf          79.43                 0.992                   1.000       0.956
dc.B         42.29                 0.083                   0.353       0.073
ep.C         12.94                 0.014                   0.962       0.023


TLB Prefetcher Comparison

• Both SpecTLB and TLB prefetching hide the latency of TLB misses
 – SpecTLB: uses large-page reservations to hide the current TLB miss
 – TLB prefetcher: uses access patterns to avoid future TLB misses
• Speculative work
 – SpecTLB: instructions execute in parallel with confirmation of the translation
 – TLB prefetcher: prefetches page table entries


TLB Prefetcher Comparison

• TLB prefetcher generally hides fewer walks than SpecTLB
 – Prefetcher does well with high access regularity

Benchmark    TLB miss rate   SpecTLB   TLB Prefetcher
PostgreSQL   74.43           0.989     0.106
python       15.36           0.998     0.633
SPECjbb      20.04           0.971     0.151
bzip2         4.00           0.998     0.978
gcc           4.25           0.988     0.330
mcf          79.43           1.000     0.051
dc.B         42.29           0.353     0.190
ep.C         12.94           0.962     0.897


Conclusions

• SpecTLB hides latency of TLB misses
 – Predictions allow page walk to occur in parallel with speculative work
 – >62% of TLB miss latencies hidden for majority of benchmarks