57
Akanksha Jain, Calvin Lin University of Texas at Austin Linearizing Irregular Memory Accesses for Improved Correlated Prefetching

Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

Akanksha Jain, Calvin LinUniversity of Texas at Austin

Linearizing Irregular Memory Accesses for Improved Correlated Prefetching

Page 2: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

2

The Problem : Irregular Prefetching

A XB C

D E

’96   ’97   ’98 ’99   ’00   ’01   ’02   ’03   ’04   ’05   ’06   ’07   ’08   ’09 ’10   ’11  Luk

Johnson

Joseph

Kumar

Roth

Carter

Chilimbi

Roth

Burger

Collins

Cooksey

Solihin

Wang

Chen

Nesbit

Nesbit

Wenisch

Chou

Ferdman

Wenisch

Diaz

Ebrahimi

Somogyi

Wenisch

Wenisch

Dimitrov

Somogyi

Y

Page 3: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

3

The Problem : Irregular Prefetching

A XB C

D E

’96   ’97   ’98 ’99   ’00   ’01   ’02   ’03   ’04   ’05   ’06   ’07   ’08   ’09 ’10   ’11  Luk

Johnson

Joseph

Kumar

Roth

Carter

Chilimbi

Roth

Burger

Collins

Cooksey

Solihin

Wang

Chen

Nesbit

Nesbit

Wenisch

Chou

Ferdman

Wenisch

Diaz

Ebrahimi

Somogyi

Wenisch

Wenisch

Dimitrov

Somogyi

Y

Pointer-based

Page 4: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

4

The Problem : Irregular Prefetching

A XB C

D E

’96   ’97   ’98 ’99   ’00   ’01   ’02   ’03   ’04   ’05   ’06   ’07   ’08   ’09 ’10   ’11  Luk

Johnson

Joseph

Kumar

Roth

Carter

Chilimbi

Roth

Burger

Collins

Cooksey

Solihin

Wang

Chen

Nesbit

Nesbit

Wenisch

Chou

Ferdman

Wenisch

Diaz

Ebrahimi

Somogyi

Wenisch

Wenisch

Dimitrov

Somogyi

Y

Spatial Correlation

Page 5: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

5

The Problem : Irregular Prefetching

A XB C

D E

’96   ’97   ’98 ’99   ’00   ’01   ’02   ’03   ’04   ’05   ’06   ’07   ’08   ’09 ’10   ’11  Luk

Johnson

Joseph

Kumar

Roth

Carter

Chilimbi

Roth

Burger

Collins

Cooksey

Solihin

Wang

Chen

Nesbit

Nesbit

Wenisch

Chou

Ferdman

Wenisch

Diaz

Ebrahimi

Somogyi

Wenisch

Wenisch

Dimitrov

Somogyi

Y

Temporal Correlation

Page 6: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

6

The Problem : Irregular Prefetching

A XB C

D E

’96   ’97   ’98 ’99   ’00   ’01   ’02   ’03   ’04   ’05   ’06   ’07   ’08   ’09 ’10   ’11  Luk

Johnson

Joseph

Kumar

Roth

Carter

Chilimbi

Roth

Burger

Collins

Cooksey

Solihin

Wang

Chen

Nesbit

Nesbit

Wenisch

Chou

Ferdman

Wenisch

Diaz

Ebrahimi

Somogyi

Wenisch

Wenisch

Dimitrov

Somogyi

Y

Spatio-Temporal Prefetching

Page 7: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

7

The Problem : Irregular Prefetching

A XB C

D E

’96   ’97   ’98 ’99   ’00   ’01   ’02   ’03   ’04   ’05   ’06   ’07   ’08   ’09 ’10   ’11  Luk

Johnson

Joseph

Kumar

Roth

Carter

Chilimbi

Roth

Burger

Collins

Cooksey

Solihin

Wang

Chen

Nesbit

Nesbit

Wenisch

Chou

Ferdman

Wenisch

Diaz

Ebrahimi

Somogyi

Wenisch

Wenisch

Dimitrov

Somogyi

Y

Focus of this talk : Temporal Prefetching

Page 8: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

8

Our Contribution

Significantly advance the state-of-the-art in temporal prefetching

How do we achieve this improvement? Address Correlation [Grunwald 97] PC-localization [Nesbit 04, Somogyi 06]

Previous Best

Our Solution

Speedup 8.3% 23.1%Accuracy 58.6% 93.7%

Page 9: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

9

Address Correlation

A and X are temporally correlated if the occurrence of A is very likely to be followed by X

Temporal streams: Sequences of correlated accesses Highly variable lengths from 2 to several hundreds

[Chilimbi 02, Wenisch 08]

A X B CD E

Page 10: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

10

Address Correlation is Expensive

Large storage requirements

Weaker forms of Correlation: Learn correlated deltasbetween consecutive memory addresses Correlate distance(A,X) with distance(X,B) Sacrifice performance for lower storage requirements

A X B CD E

Page 11: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

11

PC-Localization

PC-localization: Segregate the global stream by the load instruction’s PC

PC-localized streams are more predictable!

while ( ! end ){

read tree->next;if( condition)

read linked_list->next;}

F B a1 A a2 D C E a3 ….

F B A D C E ….

a1 a2 a3 ….

Global Stream

PC-Localized Streams

p

F

B

C

G

A

D E

a1 a2 a3

Page 12: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

12

Design Space for Temporal Prefetching

PC/DC [Nesbit 04]

G/DC[Nesbit 04]

STMS[Wenisch 09]

Weaker Correlation

Address Correlation

PC-Localized

Global

More predictable

Mor

epr

edic

tabl

e

Highest predictability

IrregularStreamBuffer

Page 13: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

13

Significantly advance the state-of-the-art How do we achieve this improvement? First to combine address correlation and PC-localization Two more benefits!

Why is it hard to combine address correlation and PC-localization? Prefetcher ImplementationGlobal History Buffer (GHB)

Our Contribution

Page 14: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

14

History BufferGlobal Access Stream:

Address Correlation with Global History Buffer

History Buffer

A  X  B  C  D  Y  E  …  A  X  B  C  … A

X

B

C

DYE

…AXBC

Page 15: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

15

History Buffer

Address Correlation with Global History Buffer

History Buffer

A

X

B

C

DYE

…AXBC

Input : A

Index Table

A

B

….

Index Table

A

A

A

Predict X

Global Access Stream:A  X  B  C  D  Y  E  …  A  X  B  C  …

Page 16: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

16

History Buffer

Address Correlation with Global History Buffer

History Buffer

Large history buffer• 10-100 MB [Wenisch 09]• Stored off-chip

A

X

B

C

DYE

…AXBC

Page 17: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

17

History Buffer

A

X

….

B

Y

….

C

D

….

Z

E

….

Index Table

PC1

PC2

….Search Index

Table

PC-localization with Global History Buffer

Predict B

Input : PC1, A

PC-localized access StreamsPC1  :  A,  B,  C,  D,  E….PC2 : X, Y, Z

PC1

Search GHB

E

D

C

B

AGlobal Access Stream : A, X, B, Y, C, D, Z, E..

Index Table History Buffer

Page 18: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

18

Limitation of GHB-based Solutions

PC/DC [Nesbit 04]

G/DC[Nesbit 04]

STMS[Wenisch 09]

Weaker Correlation

Address Correlation

PC-Localized

Global

More predictable

Mor

epr

edic

tabl

e

Prohibitively expensive

Page 19: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

19

Limitation of GHB-based Solutions

PC/DC [Nesbit 04]

G/DC[Nesbit 04]

STMS[Wenisch 09]

Weaker Correlation

Address Correlation

PC-Localized

Global

Storage

Com

plex

ity

Off-chip GHB + linked list

traversal

Page 20: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

20

Outline

MotivationGHB-based solutions are limited

Our Solution Replace the GHB with a better organization

Evaluation Future Work Conclusions

Page 21: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

21

Our Solution

X B CA D Y E

A X B CD YE

Indirection

Regular Prefetching

Structural Address Space

Physical Address Space

Page 22: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

22

PC-Localization

Indirection

Physical Address Space

A X B CD YE

Page 23: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

23

PC-Localization

XB CA D YE

A X B CD YE

Indirection

Regular Prefetching

Structural Address Space

Physical Address Space

Page 24: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

24

Irregular Stream Buffer (ISB)

Core

L1

L2

TLB

Irregular Stream Buffer (ISB)

Prefetch Candidates(Physical Addresses)

PC, Trigger Physical Address

Main Memory

(Structural Addresses)

Page 25: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

25

Creating Structural Addresses

Physical Address Space

PhysicalAddress

Structural Address

A

B

C

D

E

F

X

Y

Z

ABCDEF…

XYZ

PC-Localized Stream : A

A 1

Physical to Structural Mapping

Page 26: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

26

Creating Structural Addresses

PhysicalAddress

Structural Address

A 1

B

C

D

E

F

X

Y

Z

Correlated pair : (A, X)

PC-Localized Stream : A X

A 1

X 2

Physical Address Space

ABCDEF…

XYZ

Physical to Structural Mapping

Page 27: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

27

Creating Structural Addresses

Correlated pair : (X, F)

PC-Localized Stream : A X F

PhysicalAddress

Structural Address

A 1

B

C

D

E

F

X 2

Y

Z

F 3

X 2

Physical Address Space

ABCDEF…

XYZ

Physical to Structural Mapping

Page 28: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

28

PhysicalAddress

Structural Address

A 1

B

C

D

E

F 3

X 2

Y

Z

C 4

Creating Structural Addresses

Correlated pair : (F, C)

PC-Localized Stream : A X F C

F 3

Physical Address Space

ABCDEF…

XYZ

Physical to Structural Mapping

Page 29: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

29

PhysicalAddress

Structural Address

A 1

B

C 4

D

E

F 3

X 2

Y

Z

Creating Structural Addresses

Correlated pair : (C, Z)

PC-Localized Stream : A X F C Z

Z 5

C 4

Physical Address Space

ABCDEF…

XYZ

Physical to Structural Mapping

Page 30: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

30

Structural Address

PhyscialAddress

1 A

2 X

3 F

4 C

5 Z

6 E

PhysicalAddress

Structural Address

A 1

B

C 4

D

E

F 3

X 2

Y

Z 5

Prediction in the Structural Address Space

Trigger Address: X

X 2

Physical to Structural Mapping Structural to Physical Mapping

Page 31: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

32

Physical to Structural Address

Mapping Cache

Irregular Stream Buffer (ISB)

Structural to Physical Address Mapping Cache

Page 32: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

33

Physical to Structural Address

Mapping Cache

Irregular Stream Buffer (ISB)

Structural to Physical Address Mapping Cache

Page 33: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

34

Core

L1

L2

Physical to Structural Address

Mapping Cache

Structural to Physical Address Mapping Cache

TLB

Irregular Stream Buffer (ISB)

Trigger Physical Address

PrefetchCandidates

Stream Predictor

Trigger Structural Address

Predicted Structural Address

Stream Predictor: Predicts sequential streams in the structural address space

Page 34: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

35

Core

L1

L2

Physical to Structural Address

Mapping Cache

Structural to Physical Address Mapping Cache

Training Unit

TLB

Irregular Stream Buffer (ISB)

PC, Trigger Physical Address

PrefetchCandidates

Stream Predictor

Trigger Structural Address

Predicted Structural Address

Training Unit: Segregates global stream by PC and assigns structural addresses

Page 35: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

36

Significantly advance the state-of-the-art in temporal prefetching

How do we achieve this improvement? First to combine address correlation and PC-localization Two more benefits!

Revisiting Our Contributions

High coverage

High accuracy

Page 36: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

37

Meta-data Management

But the table is too big to be stored on-chip!

Physical to Structural Address

Mapping Cache

Structural to Physical Address Mapping Cache

Page 37: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

38

New Caching Scheme

We already know how to cache the virtual to physical mappings!

Physical to Structural Address

Mapping Cache

Structural to Physical Address Mapping Cache

Off-chip

TLB miss

Page 38: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

39

New Caching Scheme

Mapping cached on-chip only for TLB resident data Small on-chip budget

Movement of off-chip meta-data synchronized with TLB misses Hides latency of accessing off-chip meta-data hidden Reduces memory traffic

This caching scheme is not possible with the GHB

Page 39: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

41

Significantly advance the state-of-the-art in temporal prefetching

How do we achieve this improvement? First to combine address correlation and PC-localization Enables a novel TLB-synchronized caching scheme

Revisiting Our Contributions

Low traffic overhead

Low on-chip storage

High coverage

High accuracy

Page 40: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

42Main Memory

Irregular Stream Buffer

Core

L1

L2

TLB

Training on the L2 Access Stream

Prefetch Candidates

Trigger Physical AddressA B C D E F

A D F

Page 41: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

43

GHB-based temporal prefetchers

Core

L1

L2

TLB

Train on the L2 Access Stream

Prefetch Candidates

Main Memory

Trigger Physical Address

A D F

A B C D E F

Page 42: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

44

Significantly advance the state-of-the-art in temporal prefetching

How do we achieve this improvement? First to combine address correlation and PC-localization Enables a novel TLB-synchronized caching scheme Allows the ISB to train on the L2 access stream

Revisiting Our Contributions

High coverage

High accuracy

Low traffic overhead

Low on-chip storage

Page 43: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

45

Outline

MotivationGHB-based solutions are limited

Our Solution Replace the GHB with a better organization

Evaluation Future Work Conclusions

Page 44: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

46

Methodology

MARSSx86 simulator Cycle-accurate out-of-order code L1 Data Cache – 64 KB L2 Data Cache – 2 MB DTLB – 128 entries ( TLB miss latency modeled realistically) Main memory 140 cycles ( DRAM queue contention

accurately modeled)

Benchmarks – Irregular subset of Spec2006 20-25 SimPoints per benchmarks Each SimPoint simulates 250M instruction

Page 45: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

47

PC/DC [Nesbit 04]

G/DC[Nesbit 04]

STMS[Wenisch 09]

Comparison with Irregular Prefetchers

Spatial Memory Streaming, Somogyi et al, ISCA 2006 Learns a bitmap of the memory footprint in fixed size

regions

PC/DC [Nesbit 04]

G/DC[Nesbit 04]

ISB

STMS[Wenisch 09]

Weaker Correlation

Address Correlation

PC-Localized

Global

More predictable

Mor

epr

edic

tabl

e

Limit Study : A completely impractical GHB-based combination of PC-localization and address correlation

Page 46: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

48

Comparison with Irregular Prefetchers

Speedup due to irregular prefetchers over a baseline with no prefetching

Page 47: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

49

Comparison with Irregular Prefetchers

Speedup due to irregular prefetchers over a baseline with no prefetching

The combination of PC-localization and

address correlation is superior

27%

Page 48: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

50

Comparison with Irregular Prefetchers

Speedup due to irregular prefetchers over a baseline with no prefetching

ISB approaches the performance of an idealized PC/AC

23.1%

Page 49: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

51

Significantly advance the state-of-the-art in temporal prefetching

How do we achieve this improvement? First to combine address correlation and PC-localization Enables a novel TLB-synchronized caching scheme Train on the L2 access stream

Recall Our Contributions

High coverage

High accuracy

Low traffic overhead

Low on-chip storage

Page 50: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

52

Why does the ISB win?

0

5

10

15

20

25

30

PC/AC + access stream

Spee

dup

(%)

PC-Localization ✔Access Stream ✔

Page 51: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

53

Why does the ISB win?

0

5

10

15

20

25

30

PC/AC + access stream Access stream only

Spee

dup

(%)

Performance loss due to no PC-localization

PC-Localization ✔ ✗Access Stream ✔ ✔

Page 52: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

54

Why does the ISB win?

0

5

10

15

20

25

30

PC/AC + access stream Access stream only PC/AC only

Spee

dup

(%)

Performance loss due to training on the miss stream

PC-Localization ✔ ✗ ✓Access Stream ✔ ✔ ✗

Page 53: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

55

Accuracy comparison

Degree

ISB is over 90% accurate even at higher degrees

Page 54: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

56

Hybrid Prefetchers

40.8%

ISB outperforms other hybrids104%

213%

200%

Page 55: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

57

Hardware Cost

Address Mapping Caches for TLB-resident data 32 KB on-chip storage for TLB with 128 entries 8 KB sufficient in a hybrid setting

Memory traffic Overhead Average meta-data traffic: 8.4% Average traffic due to useless prefetch requests: 6.3%

mcf soplex omnetpp astar libq xalan gcc sphinx

Meta-data Traffic

5.7% 3.8% 5.7% 12% 1.6% 12.6% 11.3% 3.9%

Page 56: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

58

Future Work

Combining spatial and temporal prefetching STeMS - Spatial-temporal Prefetching [Somogyi 2009] Temporal component can be enhanced using the ISB

Evaluation on commercial workloads Bigger memory footprints don’t affect the ISB’s on-chip

storageOptimizations to improve TLB reach, such as

superpages, affect ISB’s on-chip storage

Page 57: Linearizing Irregular Memory Accesses for Improved ... · Address Correlation is Expensive Large storage requirements Weaker forms of Correlation: Learn correlated deltas between

59

Replace the GHB with a new organization

Significantly improves the state-of-the-art First to combine address correlation and PC-localization Enables a novel TLB-synchronized caching scheme Train on the L2 access stream

Conclusions

High coverage

High accuracy

Low traffic overhead

Low on-chip storage

Thank You !