Shared Last-Level TLBs for Chip Multiprocessors. Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi. HPCA 2011. Presented by: Apostolos Kotsiolis, CS 7123 – Research Seminar.

Page 1: Shared Last-Level TLBs for Chip Multiprocessors

Shared Last-Level TLBs for Chip Multiprocessors

Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi

HPCA 2011

Presented by: Apostolos Kotsiolis

CS 7123 – Research Seminar

Page 2: Shared Last-Level TLBs for Chip Multiprocessors

Translation Lookaside Buffer

Page 3: Shared Last-Level TLBs for Chip Multiprocessors

Contribution

SLL TLB design explored for the first time

Analyze SLL TLB benefits for parallel programs

Analyze SLL TLB benefits for multiprogrammed workloads consisting of sequential applications

Page 4: Shared Last-Level TLBs for Chip Multiprocessors

Previous and Related Work

Private Multilevel TLB Hierarchies
◦ Intel i7, AMD K7-K8-K10, SPARC64-III
◦ No sharing between cores
◦ Waste of resources

Inter-Core Cooperative Prefetching
◦ Two types of predictable misses:
◦ Inter-Core Shared (ICS): Leader-Follower Prefetching
◦ Inter-Core Predictable Stride (ICPS): Distance-Based Cross-Core Prefetching
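The leader-follower idea listed above can be illustrated with a minimal sketch. It assumes that the core taking a TLB miss (the leader) pushes the resolved translation into a per-core prefetch buffer on every other core (the followers). The `Core` class, the `access` function, and the unbounded dictionaries are hypothetical simplifications, not the design from the cited prior work; capacity limits and replacement are omitted.

```python
class Core:
    """One core with a private TLB and a prefetch buffer (both unbounded here)."""
    def __init__(self):
        self.tlb = {}              # virtual page number -> physical frame number
        self.prefetch_buffer = {}  # translations pushed by other cores
        self.walks = 0             # page walks this core had to perform
        self.pb_hits = 0           # TLB misses served by the prefetch buffer


def access(cores, core_id, vpn, page_table):
    """Translate vpn on one core; on a miss, act as leader for the others."""
    core = cores[core_id]
    if vpn in core.tlb:
        return core.tlb[vpn]
    if vpn in core.prefetch_buffer:
        # Follower case: another core already missed on this page.
        core.pb_hits += 1
        core.tlb[vpn] = core.prefetch_buffer.pop(vpn)
        return core.tlb[vpn]
    # Leader case: walk the page table, then push the entry to the followers.
    core.walks += 1
    pfn = page_table[vpn]
    core.tlb[vpn] = pfn
    for other_id, other in enumerate(cores):
        if other_id != core_id and vpn not in other.tlb:
            other.prefetch_buffer[vpn] = pfn
    return pfn


page_table = {vpn: vpn + 1000 for vpn in range(16)}  # toy mapping (assumed)
cores = [Core() for _ in range(4)]
access(cores, 0, 7, page_table)  # core 0 leads: page walk, pushes page 7
access(cores, 1, 7, page_table)  # core 1 follows: served from its prefetch buffer
print(cores[0].walks, cores[1].walks, cores[1].pb_hits)
```

In this toy run only core 0 pays for a page walk; core 1 avoids one because the translation was already pushed to it.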

Page 5: Shared Last-Level TLBs for Chip Multiprocessors

Shared Last-Level TLBs

Exploit inter-core sharing in parallel programs

Flexible regarding where entries can be placed

Both parallel and sequential workloads benefit

Greater hit rate

CPU performance boosted
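The lookup path implied by these points can be sketched as a small simulation: each core keeps a private L1 TLB, and all cores share one last-level TLB that is consulted on an L1 miss. This is a minimal sketch under stated assumptions, not the paper's design: the class names, the fully associative LRU organization, the sizes, and the single shared address space (no address-space tagging for multiprogrammed runs) are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUTLB:
    """Fully associative TLB with LRU replacement (a simplification)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> physical frame

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, pfn):
        self.entries[vpn] = pfn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class SharedLastLevelTLB:
    """Private per-core L1 TLBs backed by one TLB shared by all cores."""
    def __init__(self, num_cores, l1_entries, sll_entries, page_table):
        self.l1 = [LRUTLB(l1_entries) for _ in range(num_cores)]
        self.sll = LRUTLB(sll_entries)
        self.page_table = page_table
        self.stats = {"l1_hit": 0, "sll_hit": 0, "walk": 0}

    def translate(self, core, vpn):
        pfn = self.l1[core].lookup(vpn)
        if pfn is not None:
            self.stats["l1_hit"] += 1
            return pfn
        pfn = self.sll.lookup(vpn)
        if pfn is not None:
            self.stats["sll_hit"] += 1   # another core may have brought it in
        else:
            self.stats["walk"] += 1      # miss in both levels: page walk
            pfn = self.page_table[vpn]
            self.sll.insert(vpn, pfn)
        self.l1[core].insert(vpn, pfn)
        return pfn


page_table = {vpn: vpn + 1000 for vpn in range(16)}  # toy mapping (assumed)
tlb = SharedLastLevelTLB(num_cores=2, l1_entries=2, sll_entries=8,
                         page_table=page_table)
tlb.translate(0, 5)  # core 0: page walk, fills the shared TLB and its L1
tlb.translate(1, 5)  # core 1: L1 miss, shared-TLB hit thanks to core 0
tlb.translate(1, 5)  # core 1: L1 hit
print(tlb.stats)     # {'l1_hit': 1, 'sll_hit': 1, 'walk': 1}
```

The second access shows the inter-core sharing benefit: core 1 avoids a page walk because core 0 already placed the entry in the shared structure.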

Page 6: Shared Last-Level TLBs for Chip Multiprocessors

Shared Last-Level TLBs

Page 7: Shared Last-Level TLBs for Chip Multiprocessors

Shared Last-Level TLBs with simple Stride Prefetching
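As a rough illustration of stride prefetching at the shared level, the sketch below fills the demanded entry on a miss and also brings in translations for nearby virtual pages. The class name, the `strides=(1, 2)` default, the LRU organization, and the toy page table are assumptions chosen for the example; the slides do not specify which strides are used or how the prefetched entries are obtained.

```python
from collections import OrderedDict


class StridePrefetchingSLL:
    """Shared TLB that prefetches neighboring pages on each miss (sketch)."""
    def __init__(self, capacity, page_table, strides=(1, 2)):
        self.capacity = capacity
        self.page_table = page_table
        self.strides = strides        # assumed prefetch distances, in pages
        self.entries = OrderedDict()  # virtual page number -> physical frame
        self.hits = 0
        self.misses = 0

    def _insert(self, vpn):
        self.entries[vpn] = self.page_table[vpn]
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def access(self, vpn):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        self.misses += 1
        pfn = self.page_table[vpn]
        # Prefetch mapped neighbors first so the demanded entry ends up
        # as the most recently used one.
        for stride in self.strides:
            neighbor = vpn + stride
            if neighbor in self.page_table and neighbor not in self.entries:
                self._insert(neighbor)
        self._insert(vpn)
        return pfn


page_table = {vpn: vpn + 1000 for vpn in range(16)}  # toy mapping (assumed)
sll = StridePrefetchingSLL(capacity=8, page_table=page_table)
for vpn in range(9):          # sequential scan over pages 0..8
    sll.access(vpn)
print(sll.hits, sll.misses)   # 6 3
```

On this sequential scan only every third access misses, because each miss also brings in the next two pages.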

Page 8: Shared Last-Level TLBs for Chip Multiprocessors

Methodology

Two distinct evaluation sets
◦ Parallel applications
◦ Different sequential application on each core

Page 9: Shared Last-Level TLBs for Chip Multiprocessors

Methodology

Benchmarks

Page 10: Shared Last-Level TLBs for Chip Multiprocessors

SLL TLBs: Parallel Workload Results

SLL TLBs versus Private L2 TLBs

Page 11: Shared Last-Level TLBs for Chip Multiprocessors

SLL TLBs: Parallel Workload Results

SLL TLBs versus ICC Prefetching

Page 12: Shared Last-Level TLBs for Chip Multiprocessors

SLL TLBs: Parallel Workload Results

SLL TLBs versus ICC Prefetching

Page 13: Shared Last-Level TLBs for Chip Multiprocessors

SLL TLBs: Parallel Workload Results

SLL TLBs with Simple Stride Prefetching

Page 14: Shared Last-Level TLBs for Chip Multiprocessors

SLL TLBs: Parallel Workload Results

SLL TLBs at Higher Core Counts

Page 15: Shared Last-Level TLBs for Chip Multiprocessors

SLL TLBs: Parallel Workload Results

Performance Analysis

Page 16: Shared Last-Level TLBs for Chip Multiprocessors

SLL TLBs: Multiprogrammed Workload Results

Multiprogrammed Workloads with One Application Pinned per Core

Page 17: Shared Last-Level TLBs for Chip Multiprocessors

SLL TLBs: Multiprogrammed Workload Results

Performance Analysis

Page 18: Shared Last-Level TLBs for Chip Multiprocessors

Conclusion: Benefits

On parallel workloads:
◦ Eliminate 7-79% of L1 TLB misses by exploiting inter-core sharing in parallel programs
◦ Outperform conventional per-core private L2 TLBs by an average of 27%
◦ Improve CPI by up to 0.25

On multiprogrammed sequential workloads:
◦ Improve over private L2 TLBs by an average of 21%
◦ Improve CPI by up to 0.4

Page 19: Shared Last-Level TLBs for Chip Multiprocessors

Thank You!

Questions?