Static Identification of Delinquent Loads V.M. Panait A. Sasturkar W.-F. Fong


Page 1: Static Identification of Delinquent Loads

Static Identification of Delinquent Loads

V.M. Panait, A. Sasturkar, W.-F. Fong

Page 2: Static Identification of Delinquent Loads

Agenda
Introduction
Related Work
Delinquent Loads
Framework
  Address Patterns, Decision Criteria
  The heuristic: types of classes, computing the weights, final classes
Results

Page 3: Static Identification of Delinquent Loads

Introduction
Cache: one of the major current bottlenecks in performance
One approach: prefetch; but prefetch what? Can’t prefetch everything…
Few loads are really “bad”: the “delinquent loads”
This paper: classification of address patterns in the load instructions

Page 4: Static Identification of Delinquent Loads

Introduction
Done after code generation, but before runtime
Singled out 10% of all loads causing over 90% of the misses in 18 SPEC benchmarks
Gets even better combined with basic block profiling: 1.3% of loads covering over 80% of the misses

Page 5: Static Identification of Delinquent Loads

Related Work
BDH method: classify loads based on the following criteria:
  Region of memory accessed by the load: S (stack), H (heap) or G (global).
  Kind of reference: loading a scalar (S), an element of an array (A) or a field of a structure (F).
  Type of reference: (P)ointer or (N)ot.

Page 6: Static Identification of Delinquent Loads

Related Work
Some BDH classes account for most misses: GAN, HSN, HFN, HAN, HFP, HAP.
The OKN method: 3 simple heuristics
  Use of a pointer dereference
  Use of a strided reference
  None of the above
This paper is much more precise than both of the above methods
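The BDH labels described on the Related Work slides can be illustrated with a minimal Python sketch. It assumes the label is simply the concatenation of the three criteria letters, with F standing for a structure field as the class names HFN and HFP imply. The function name `bdh_class` and the string keys are hypothetical, chosen only for this illustration.

```python
# Classes the slides single out as accounting for most misses.
HIGH_MISS_CLASSES = {"GAN", "HSN", "HFN", "HAN", "HFP", "HAP"}

def bdh_class(region: str, kind: str, is_pointer: bool) -> str:
    """Build a three-letter BDH-style label (assumed: region + kind + type)."""
    regions = {"stack": "S", "heap": "H", "global": "G"}
    kinds = {"scalar": "S", "array": "A", "field": "F"}
    return regions[region] + kinds[kind] + ("P" if is_pointer else "N")

# A pointer-typed field of a heap-allocated structure.
label = bdh_class("heap", "field", True)
print(label, label in HIGH_MISS_CLASSES)
```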

Page 7: Static Identification of Delinquent Loads

Delinquent Loads
Why not stores too? Write buffers are apparently good enough
Why not do it in hardware? They do, but:
  Need additional specialized hardware
  Complex decisions (fast) <-> complex hardware
Memory profiling: not always practical

Page 8: Static Identification of Delinquent Loads

Delinquent Loads & Profiling

Page 9: Static Identification of Delinquent Loads

Framework
Assembly code -> address patterns for each load instruction -> placement of the load instruction in a class
Classes + weights -> heuristic function
If the value of the heuristic is greater than a delinquency threshold, the instruction is classified as possibly delinquent

Page 10: Static Identification of Delinquent Loads

Address Patterns
Address pattern = summary of how the source address of the load instruction is computed
Uses CFG and DF analysis (reaching definitions), giving one address pattern for each control path reaching the load
Only uses basic registers (BR): gp, sp, regparam, regret
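To make the idea of an address pattern concrete, here is a small Python sketch for a single straight-line control path. It is not the authors' implementation: the tuple-based expression format, the operator names (`deref`, `add`, `shl`) and the temporaries `t0`..`t2` are assumptions made for illustration. On a straight-line path the reaching definition of a register is just its most recent assignment, so the pattern is obtained by substituting definitions until only basic registers and constants remain.

```python
BASIC_REGISTERS = {"gp", "sp", "regparam", "regret"}

def substitute(expr, env):
    """Replace every non-basic register in expr by its reaching definition."""
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "reg":
        name = expr[1]
        if name in BASIC_REGISTERS:
            return expr
        return env.get(name, expr)  # registers with no known definition stay symbolic
    # Operator node: (op, child, child, ...)
    return (kind,) + tuple(substitute(child, env) for child in expr[1:])

def address_pattern(code, address):
    """code: list of (destination register, expression) along one control path."""
    env = {}
    for dst, expr in code:
        env[dst] = substitute(expr, env)
    return substitute(address, env)

code = [
    ("t0", ("deref", ("add", ("reg", "sp"), ("const", 16)))),  # t0 = mem[sp + 16]
    ("t1", ("shl", ("reg", "regparam"), ("const", 2))),        # t1 = regparam << 2
    ("t2", ("add", ("reg", "t0"), ("reg", "t1"))),             # t2 = t0 + t1
]
pattern = address_pattern(code, ("reg", "t2"))  # address used by the load
print(pattern)
```

With several control paths reaching the load, the same substitution would be repeated once per path, yielding one pattern each.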

Page 11: Static Identification of Delinquent Loads

The Decision Criteria
Classes are derived from these criteria
H1: Register usage in an address pattern (usage of BRs)
H2: Type of operations used in address computation (arithmetic, logic)
H3: Maximum level of dereferencing
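Criteria H1 to H3 can be read as simple features computed over an address pattern. The Python sketch below assumes a hypothetical tuple encoding of patterns (`("reg", name)`, `("const", value)`, or `(operator, children...)`) and hypothetical operator names; how the original tool counts dereference levels is not stated on the slide, so counting nested `deref` nodes is an assumption.

```python
from collections import Counter

BASIC_REGISTERS = ("gp", "sp", "regparam", "regret")

def walk(pattern):
    """Yield every node of the pattern tree."""
    yield pattern
    if pattern[0] not in ("reg", "const"):
        for child in pattern[1:]:
            yield from walk(child)

def h1_register_usage(pattern):
    """H1: number of occurrences of each basic register."""
    counts = Counter(n[1] for n in walk(pattern)
                     if n[0] == "reg" and n[1] in BASIC_REGISTERS)
    return {reg: counts[reg] for reg in BASIC_REGISTERS}

def h2_uses_mul_or_shift(pattern):
    """H2: does the address computation involve a multiplication or a shift?"""
    return any(n[0] in ("mul", "shl", "shr") for n in walk(pattern))

def h3_max_deref_level(pattern):
    """H3: deepest nesting of memory dereferences inside the pattern."""
    if pattern[0] in ("reg", "const"):
        return 0
    deepest = max((h3_max_deref_level(c) for c in pattern[1:]), default=0)
    return deepest + 1 if pattern[0] == "deref" else deepest

# Pattern for the address mem[sp + 16] + (regparam << 2).
example = ("add",
           ("deref", ("add", ("reg", "sp"), ("const", 16))),
           ("shl", ("reg", "regparam"), ("const", 2)))
print(h1_register_usage(example), h2_uses_mul_or_shift(example),
      h3_max_deref_level(example))
```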

Page 12: Static Identification of Delinquent Loads

The Decision Criteria
H4: Recurrence (iterative walk through memory)
H5: Execution frequency, based on BB profiling; classifies loads as:
  Rarely executed (used here as negative)
  Seldom executed (idem)
  Fairly often executed (not used here)
  In a program hotspot

Page 13: Static Identification of Delinquent Loads

Decision Criteria and Classes

Each criterion results in a set of classes
Class = set of address patterns with a certain property
Too many classes can result; only some are considered, and some of those are aggregated into one class

Page 14: Static Identification of Delinquent Loads

Decision Criteria and Classes

H1-based classes: enumerations of the number of occurrences of each of the 4 BRs in an address pattern
H2-based classes: address patterns with multiplications and shift operations
H3-based classes: as many as there are levels of dereferencing in the address patterns

Page 15: Static Identification of Delinquent Loads

Decision Criteria and Classes

H4-based classes: two classes (address pattern involves recurrence or not)
H5-based classes: three classes: rarely, seldom and program hotspot

Page 16: Static Identification of Delinquent Loads

Experimental Setup
SimpleScalar toolkit: cache simulator (for cache hits & misses), compiler, objdump
Procedure: Fortran -> C code (via f2c) -> MIPS executable (via C2MIPS compiler) -> disassembled code (via objdump)
Reconstruction of CFG and DF analysis

Page 17: Static Identification of Delinquent Loads

Experimental Setup
2 stages: learning/training and experimental (actual)
Stage 1: get full memory profiling data on a subset of SPEC benchmarks, use it to compute weights for each class
Stage 2: use the heuristic thus obtained on a new subset of benchmarks

Page 18: Static Identification of Delinquent Loads

The Heuristic: Types of Classes

Three types of classes:
  Positive (loads in it are likely delinquent)
  Negative (loads in it are likely not delinquent)
  Neutral
Positive classes have positive weights, negative ones have negative weights, neutral classes have a weight of zero

Page 19: Static Identification of Delinquent Loads

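The Heuristic: Terminology

The miss probability of class F in benchmark j:

$$m_j(F, C) = \frac{M_j(F, C)}{\sum_{i \in F} E_j(i)}$$

The amount of misses accounted for by members of class F in benchmark j:

$$n_j(F, C) = \frac{M_j(F, C)}{M(P_j(I), C)}$$

Here M_j(F, C) is the number of misses caused by the loads of class F under cache configuration C, E_j(i) is the number of times load i is executed, and M(P_j(I), C) is the total number of misses of benchmark program P_j on input I.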

Page 20: Static Identification of Delinquent Loads

The Heuristic: Terminology
m_j(F, C) = likelihood of an instruction of class F in benchmark j to be a cache miss
  However, if that instruction is only executed once, it won’t be a delinquent load
n_j(F, C) = proportion of the total number of misses that members of F account for
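A short numeric sketch may help separate the two indices. It assumes m_j is computed as the class's misses divided by the class's executions, and n_j as the class's misses divided by the benchmark's total misses, which matches the verbal definitions above; the function names and the sample counts are made up for illustration.

```python
def miss_probability(loads):
    """m_j: misses of the class divided by executions of the class.

    loads: list of (executions, misses) pairs, one per load in class F.
    """
    executions = sum(e for e, _ in loads)
    misses = sum(m for _, m in loads)
    return misses / executions if executions else 0.0

def miss_share(loads, total_misses):
    """n_j: share of the benchmark's total misses caused by the class."""
    misses = sum(m for _, m in loads)
    return misses / total_misses if total_misses else 0.0

# Two loads in class F; the whole benchmark incurs 2000 misses (made-up numbers).
class_f = [(1000, 400), (500, 100)]
m_j = miss_probability(class_f)
n_j = miss_share(class_f, 2000)
print(round(m_j, 3), n_j)
```

A load executed once and missing once would give a miss probability of 1 while contributing almost nothing to the total, which is why both indices are tracked.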

Page 21: Static Identification of Delinquent Loads

The Heuristic: Terminology
Strength index: r = m_j / n_j
A benchmark j is irrelevant to a class F if both indices m_j and n_j are below certain thresholds. Otherwise it is relevant.
Positive class: r > 5% for all benchmarks.
Negative class: n_j < 0.5% for all benchmarks.
Neutral class: r < 5% for one or more benchmarks.
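The three rules above can be sketched as a small Python function. The slide does not say which rule takes precedence when more than one applies, so checking the negative rule first is an assumption of this sketch, as are the function name `class_type` and the sample values.

```python
def class_type(stats, r_min=0.05, n_max=0.005):
    """Classify a class from its per-benchmark (m_j, n_j) pairs.

    Assumed precedence: negative rule first, then positive, else neutral.
    """
    ratios = [m / n if n > 0 else 0.0 for m, n in stats]  # strength index r
    if all(n < n_max for _, n in stats):
        return "negative"
    if all(r > r_min for r in ratios):
        return "positive"
    return "neutral"

print(class_type([(0.30, 0.25), (0.10, 0.40)]))
print(class_type([(0.001, 0.002), (0.0005, 0.001)]))
print(class_type([(0.30, 0.25), (0.001, 0.5)]))
```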

Page 22: Static Identification of Delinquent Loads

Computing the Weights
Form classes according to the five decision criteria
Compute m_j, n_j for each class
Weight of class F_k:

$$W(F_k) = \frac{1}{|R_{F_k}|} \sum_{j \in R_{F_k}} \frac{m_j(F_k, C)}{n_j(F_k, C)}$$

Page 23: Static Identification of Delinquent Loads

Computing the Weights
This is the formula for positive classes only
Only relevant benchmarks are included in the formula
|.| is the cardinality of that set, i.e. the number of benchmarks relevant to that class
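As a rough illustration of the weight of a positive class, the Python sketch below averages the strength index m_j / n_j over the relevant benchmarks. The relevance thresholds are not given on the slides, so `m_min` and `n_min` are hypothetical parameters, and the benchmark names and numbers are invented.

```python
def class_weight(stats, m_min, n_min):
    """Mean of m_j / n_j over the benchmarks relevant to the class.

    stats: dict mapping benchmark name -> (m_j, n_j).
    A benchmark is treated as irrelevant when both indices fall below
    their thresholds (threshold values are assumptions of this sketch).
    """
    relevant = [(m, n) for m, n in stats.values()
                if not (m < m_min and n < n_min) and n > 0]
    if not relevant:
        return 0.0
    return sum(m / n for m, n in relevant) / len(relevant)

stats = {
    "bench1": (0.30, 0.25),
    "bench2": (0.10, 0.40),
    "bench3": (0.0001, 0.0002),  # both indices tiny: irrelevant
}
w = class_weight(stats, m_min=0.01, n_min=0.01)
print(round(w, 3))
```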

Page 24: Static Identification of Delinquent Loads

Aggregate Classes
AG1: both gp and sp are used 1+ times each (comes from H1)
AG2: only sp used 2+ times (H1)
AG3: either * or shifts are used (H2)
AG4: one level of dereferencing (H3)
AG5: two levels of dereferencing (H3)
AG6: three levels of dereferencing (H3)

Page 25: Static Identification of Delinquent Loads

Aggregate Classes
AG7: address patterns containing a recurrence (H4)
AG8: loads that are seldom executed (100 < f < 1000 times) (H5)
AG9: loads that are rarely executed (f < 100 times) (H5)
Weight formula for negative classes: negated mean of positive weights
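The nine aggregate classes can be read as membership tests on the features of an address pattern. The following Python sketch takes the definitions literally (including the strict inequalities, so a frequency of exactly 100 falls in neither AG8 nor AG9); the function and parameter names are hypothetical, and "only sp" in AG2 is interpreted as no other basic register appearing.

```python
def aggregate_classes(gp, sp, regparam, regret, mul_or_shift,
                      deref_level, recurrence, frequency):
    """Return the set of AG indices (1..9) that a pattern belongs to."""
    classes = set()
    if gp >= 1 and sp >= 1:
        classes.add(1)
    if sp >= 2 and gp == 0 and regparam == 0 and regret == 0:
        classes.add(2)
    if mul_or_shift:
        classes.add(3)
    if deref_level in (1, 2, 3):
        classes.add(3 + deref_level)   # AG4, AG5, AG6
    if recurrence:
        classes.add(7)
    if 100 < frequency < 1000:
        classes.add(8)
    if frequency < 100:
        classes.add(9)
    return classes

def negative_weight(positive_weights):
    """Weight for negative classes: negated mean of the positive weights."""
    return -sum(positive_weights) / len(positive_weights)

print(sorted(aggregate_classes(1, 1, 0, 0, True, 2, False, 50)))
print(negative_weight([0.5, 1.0, 1.5]))
```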

Page 26: Static Identification of Delinquent Loads

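The Heuristic Function

$$\phi(i) = \max_{j} \sum_{k=1}^{9} W(AG_k)\, d(k, j)$$

$$d(k, j) = \begin{cases} 1 & \text{if } j \in AG_k \\ 0 & \text{otherwise} \end{cases}$$

where j ranges over the address patterns of load i. The load is delinquent if $\phi(i) > \delta$, where $\delta$ is the delinquency threshold.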
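The heuristic function can be sketched in a few lines of Python: each address pattern of a load scores the sum of the weights of the aggregate classes it belongs to, the load takes the maximum over its patterns, and it is flagged when that score exceeds the threshold. The weight values and the threshold below are invented placeholders, not the trained weights from the study.

```python
def heuristic_score(patterns, weights):
    """patterns: one set of AG indices per control path reaching the load."""
    return max((sum(weights[k] for k in classes) for classes in patterns),
               default=0.0)

def is_delinquent(patterns, weights, threshold):
    return heuristic_score(patterns, weights) > threshold

# Hypothetical weights: AG1..AG7 positive, AG8 and AG9 negative.
weights = {1: 0.5, 2: 0.25, 3: 0.5, 4: 0.25, 5: 0.75,
           6: 1.0, 7: 1.0, 8: -0.5, 9: -0.5}

# A load reached along two control paths.
patterns = [{1, 3, 5}, {9}]
print(heuristic_score(patterns, weights), is_delinquent(patterns, weights, 1.0))
```

Taking the maximum rather than the mean means a single suspicious control path is enough to flag the load.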

Page 27: Static Identification of Delinquent Loads

Precision and Coverage
Precision of a heuristic scheme H: the share of all loads that scheme H identifies as delinquent (the lower, i.e. the closer to the real share of delinquent loads, the better)
Coverage of a heuristic scheme H: the share of all cache misses caused by the loads that scheme H identifies as delinquent (the closer to 100%, the better)
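Both measures are easy to compute once per-load miss counts are available. The Python sketch below treats precision as the fraction of all loads that are flagged and coverage as the fraction of all misses caused by flagged loads, in line with the percentages quoted elsewhere in the slides; the load names and miss counts are invented.

```python
def precision(flagged, misses_per_load):
    """Fraction of all loads that the scheme flags as delinquent."""
    return len(flagged) / len(misses_per_load)

def coverage(flagged, misses_per_load):
    """Fraction of all cache misses caused by the flagged loads."""
    total = sum(misses_per_load.values())
    covered = sum(misses_per_load[load] for load in flagged)
    return covered / total if total else 0.0

# Ten loads; one of them causes most of the misses (made-up numbers).
misses_per_load = {"ld1": 900, "ld2": 40, "ld3": 30, "ld4": 20, "ld5": 10,
                   "ld6": 0, "ld7": 0, "ld8": 0, "ld9": 0, "ld10": 0}
flagged = {"ld1"}
print(precision(flagged, misses_per_load), coverage(flagged, misses_per_load))
```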

Page 28: Static Identification of Delinquent Loads

Results on different inputs

Page 29: Static Identification of Delinquent Loads

Results when varying cache associativity

Page 30: Static Identification of Delinquent Loads

Results when varying cache size

Page 31: Static Identification of Delinquent Loads

Performance on new benchmarks

Page 32: Static Identification of Delinquent Loads

Performance summary

Page 33: Static Identification of Delinquent Loads

Performance of OKN & BDH

Page 34: Static Identification of Delinquent Loads

Performance with various

Page 35: Static Identification of Delinquent Loads

Combination with BB profiling

Use the heuristic to sharpen the set returned by BB profiling
Also add loads that are not in the hotspots: a chosen percentage of the highest-scoring loads detected by our method but not by profiling are also considered delinquent
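One possible reading of this combination is sketched below in Python. It is an interpretation, not the authors' procedure: hotspot loads are kept only if the heuristic also flags them, and a fraction of the highest-scoring flagged loads outside the hotspots is added. The names `combine_with_profiling` and `top_fraction`, the rounding up, and all scores are assumptions.

```python
import math

def combine_with_profiling(scores, hotspot_loads, threshold, top_fraction):
    """scores: load -> heuristic value; hotspot_loads: loads found by BB profiling."""
    # Sharpen: keep only hotspot loads that the heuristic also flags.
    sharpened = {l for l in hotspot_loads if scores.get(l, 0.0) > threshold}
    # Flagged loads outside the hotspots, highest score first.
    outside = sorted((l for l, s in scores.items()
                      if l not in hotspot_loads and s > threshold),
                     key=lambda l: scores[l], reverse=True)
    keep = math.ceil(len(outside) * top_fraction)  # rounding up is an assumption
    return sharpened | set(outside[:keep])

scores = {"a": 2.0, "b": 0.5, "c": 1.8, "d": 1.5, "e": 1.2, "f": 0.1}
result = combine_with_profiling(scores, {"a", "b"}, threshold=1.0, top_fraction=0.5)
print(sorted(result))
```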

Page 36: Static Identification of Delinquent Loads

Combination with BB profiling

Page 37: Static Identification of Delinquent Loads

Conclusions
The static scheme for identifying delinquent loads has a precision of 10% and coverage of over 90% over 18 benchmarks
More precise than related work, with similar coverage
Immune to variation of framework parameters (e.g. cache size, associativity, input)