27
Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan Chishti, Seth Pugsley *Intel Labs Variable Length Delta Prefetcher 1

Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Embed Size (px)

DESCRIPTION

Spatial Correlation Learn Access (Delta) Patterns Apply patterns when similar conditions re-occur. Eg: PC, physical address, delta patterns Variable Length Delta Prefetcher3 Delta Patterns Regular Delta Patterns. Eg: ( +1, +1, +1)…, (+2, +2, +2, +2)… Irregular Delta Patterns. Eg: ( +1, +2, +3 )…

Citation preview

Page 1: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 1

Efficiently Prefetching Complex Address

PatternsManjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian

University of UtahChris Wilkerson, Zeshan Chishti, Seth Pugsley

*Intel Labs

Page 2: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 2

Prefetchers

Confirmation Based Prefetchers• Issue predictions after a few deltas• High Accuracy• Short Streams Lose out

Immediate Prefetchers• Aggressive• Low Accuracy• Waste DRAM bandwidth and

cache capacity

Accurate Fast

Page 3: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 3

Spatial Correlation• Learn Access (Delta) Patterns• Apply patterns when similar conditions re-occur. • Eg: PC, physical address, delta patterns

Delta Patterns• Regular Delta Patterns. Eg: ( +1, +1, +1)…, (+2, +2, +2, +2)…• Irregular Delta Patterns. Eg: ( +1, +2, +3 )…

Page 4: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 4

Long Repeatable Streams of Irregular Deltas

Page Num: 479218 Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7……..

Delta patterns for milc

Page 5: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 5

Long Repeatable Streams of Irregular Deltas

Deltas : 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, -1, -5,…..Cache Line: A+1, A+10, A+2, A+3, A+11, A+12, A+4, A+5, A+6, A+13, A+12, A+7……

Stream 1 : A+1, A+2, A+ 3, A+4, A+5, A+6, A+7 Stream2: A+10, A+11, A+12, A+13

Confirmation Prefetches

Stride Prefetcher Coverage: 5/11

SandBox Prefetcher Coverage: 9/11

Neither are perfectly timely!

Page 6: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 6

Variable Length Delta Prefetcher

Page 7: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 7

Core 1

Last

Lev

el $

$

$ Access

$ AccessCore 8

Delta Prediction TablesPer Page

Delta History Tables

Per Page Delta History

Tables

PredictedDelta/OffsetOffset Prediction

Tables

Delta Prediction Tables

Offset Prediction Tables

Structure of VLDP

PredictedDelta/Offset

Page 8: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 8

Delta History Table Tracks delta within a page

for (i=0;i<BIGNUM; i++){

a[i]=b[i]+c[i];}

a, b, c can each belong to different pages So Deltas between pages is meaningless

Delta = Last Address- Current Address

Page 9: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 9

Delta History Table

Page Num.

Last Add.

Last 4 Deltas

Last Predictor

Num. Times Used

Last Four Prefetched Offsets

Page 10: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 10

Delta Prediction Tables

Delta(1) Pred. Accuracy

8 b 8 b 2 b

Deltas (3) Pred. Accuracy

8b 8b 8b 8b 2b

Match?

Predicted Delta

64 Rows per Table

Highest Priority (t=3)Lowest Priority (t=1)

MUX

Match?

Page 11: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 11

Offset Prediction TableFirst Page

OffsetPred.Offset

Accuracy

7 b 7 b 2 b

OPT is used only to predict the second access to a page

Page 12: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 12

Need for Multiple TablesRepeating Delta Pattern- (1, 2, 3, 5, 2, 4)…

Delta Pred.1 22 33 55 2

Delta Pred.1,2 32,3 53,5 25,2 4

Table 1 Table 2

50% Accuracy

Search for Delta pattern match starts from right most table

Page 13: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 13

Looking farther than one Delta aheadRepeating Delta Pattern- (1, 2, 3), (1, 2, 3)…….

Delta Pred.1 22 33 1- -

Delta Pred.1,2 32,3 13,1 2-,- -

Degree 1 Prediction

Current Delta

Page 14: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 14

Looking farther than one Delta aheadRepeating Delta Pattern- 1, 2, 3, 1, 2, 3…….

Delta Pred.1 22 33 1- -

Delta Pred.1,2 32,3 13,1 2-,- -

Degree 1 Prediction

Degree 2 Prediction

Use Recursive lookup to look farther than one Delta

Current Delta Deg 1 Prediction

Page 15: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 15

Case Study: Streaming WorkloadsRepeating Delta Pattern- 1, 1, 1, 1, 1…

Delta Pred.1 1- -- -- -

Delta Pred.-,- --,- --,- --,- -

Table 1 Table 2

Patterns learned from one page is applied to another

Page 16: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 16

Updating the Delta History TablesEvict Not Recently Used

If Page present, add

Delta

If Page not present, replace

Page Num.

Last Add.

Last 4 Deltas

Last Predictor

Num. Used

Last 4 Prefetches

Page Num.

Last Add.

Last 4 Deltas

Last Predictor

Num. Used

Last 4 Prefetches

LLC Access

Page 17: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 17

Updating the Prediction TablesPage Num.

Last Add.

Last 3 Deltas

B, C, D

Delta Pred.B,C,D E?

- -- -- -

Table 3

ELatest Delta If Prediction is Correct

Increment AccuracyIf Prediction of Wrong Decrement Accuracy If Accuracy==0 Update + Promote PredictionIf Prediction is Missing Seed T1 with prediction

Delta Pred.C,D E?

- -- -- -

Delta Pred.D F?- -- -- -

Table 2Table 1

Can the current state predict Latest Delta?

Last Predictor

Page 18: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 18

Populating the Prediction Tables

Delta Pred.1 A- -- -- -

Delta Pred.1,1 B-,- --,- --,- -

Delta Pred.1,1,1 C

- -- -- -

Table 1 Table 2 Table 3Table 1Wrong

Table 2Wrong

NRU NRUNRU

If mis-predict, a longer Delta history might be needed

Pattern Missing

Page 19: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 19

Evaluation Methodology• Simics + USIMM• 8 RISC cores, UltraSPARC III ISA• 3.2 GHz, 4-wide OoO, 128-entry RoB• 32 KB I&D L1 caches, 4 cycles• 8 MB shared (1MB per core) L2 cache, 10 cycles

• DRAM Specifications• 2Channels, 2 Ranks per Channel, 8 Banks per Rank• 800MHz DDR3 DRAM

• SPEC 2006, NPB, and Cloudsuite• Mix1- milc, astar, lbm, libq; Mix2- xalancbmk, lbm, zeusmp,

milc;

Page 20: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 20

VLDP Configuration• Per-Core VLDP• 1 Offset Prediction Table, 64 entry• 3 Delta Prediction Tables, 64 entries each• 16 entry Delta History Table• Only Delta Prediction Tables 2,3 contribute to multi degree prefetch

Offset Prediction Table 128 B

Delta History Table 222 B

Delta Prediction Table 648 B

Total 998 B/Core

Page 21: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 21

Performance Improvement (Vs No PC)

VLDP is 6% better than AMPM 9% better than SBP17% better than FDP

CG IS LU MG SPClassi

fCloud

Astar

Lbm Lib

qMcf

Milc

Omnet

Soplex

XalancZeus

Mix1 Mix2 GM0.81.01.21.41.61.82.0 FDP SBP AMPM VLDP

Spee

dup

Page 22: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 22

Performance Improvement (Vs PC)

VLDP is7.1% better than GHB7.6% better than SMS

CG IS LU MG SPClassi

c

Cloud9Asta

rLb

m Libq

McfMilc

Omnet

Soplex

XalaZeus

Mix1 Mix2 GM0.81.01.21.41.61.82.0 SMS GHB_PC_DC VLDP

Spee

dup

Page 23: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 23

Coverage

FDP 16%SMS 55%SBP 40%

GHB 33%AMPM 49%VLDP 61%

NPB CloudSuite Spec2006 Spec2006-Mix

GM0%20%40%60%80%

100%120%

FDP SMS SBP GHB_PC_DC AMPM VLDP

Cove

rage

Page 24: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 24

Sensitivity to table size

32Page_8T

32Page_16T

32Page_32T

32Page_64T

16Page_8T

16Page_16T

16Page_32T

16Page_64T

8Page_8T

8Page_16T

8Page_32T

8Page_64T0.980.991.001.011.021.03

Spee

dup

2% increase in performance when DPT size is increased

Page 25: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 25

Sensitivity number of Delta Prediction Tables

3DPT improves efficiency despite a modest 1% performance improvement by reducing DRAM requests by 3%

1DPT_NoOPT 1DPT+OPT 2DPT+OPT 3DPT+OPT 4DPT+OPT1

1.1

1.2

1.3

1.4

1.5Speedup DRAM Accesses

Page 26: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 26

Conclusions•OPT Issues predictions without confirmation•DPT recognizes Irregular Delta Patterns• Long delta patterns provide high accuracy• Less than 1KB per core overhead• 6% better performance

Page 27: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan

Variable Length Delta Prefetcher 27

Thank You