
Pruning Strategies in Adaptive Off-line Auto-tuning

Lu Li, Usman Dastgeer, Christoph Kessler

Linköping University

1 / 40


Agenda

1 Motivation

2 Approach

3 Results

4 Conclusion

2 / 40


Problem Background

Applications need tuning for optimal performance

Performance tuning is challenging

Heterogeneous processors

Configuration diversity

Auto-tuning needed

Performance Portability

Approach: Implementation selection for a multi-variant component

3 / 40


Component Interface and Implementations

Implementation variant

Platforms (CPU, accelerator cores)

Algorithms

Tunable parameter settings

Compiler transformations

...

Meta-data

Dependencies

Resource requirements

Deployment descriptors

Performance prediction models

...

4 / 40
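To make the component model concrete, here is a minimal sketch in Python (hypothetical names, not the actual framework API) of an interface with several implementation variants and attached meta-data:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ImplementationVariant:
    """One callable variant of a component, plus the meta-data a composer
    could use (platform, algorithm, deployment requirements, ...)."""
    name: str
    platform: str                       # e.g. "CPU", "CUDA GPU"
    run: Callable[..., None]            # the actual implementation
    meta: Dict[str, str] = field(default_factory=dict)

@dataclass
class ComponentInterface:
    """A multi-variant component: one interface, many variants to select
    among at composition time."""
    name: str
    variants: List[ImplementationVariant] = field(default_factory=list)

# Example: a matrix-multiplication interface with two illustrative variants.
matmul = ComponentInterface("matmul", [
    ImplementationVariant("seq", "CPU", run=lambda A, B: None,
                          meta={"algorithm": "ijk loop"}),
    ImplementationVariant("cuda", "GPU", run=lambda A, B: None,
                          meta={"deployment": "requires CUDA device"}),
])
```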


Staged Composition

5 / 40


Guiding Composition: Empirical Models

Analytical model

Empirical model

No or little understanding of the target architecture required

Off-line Empirical Models: Feed sampling data (training examples) into a prediction model, e.g. SVM, C4.5

6 / 40
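As a hedged illustration of the off-line empirical-model idea, the sketch below feeds made-up training samples (run-time context features mapped to the measured best variant) into an off-the-shelf classifier. scikit-learn's DecisionTreeClassifier (a CART tree) stands in for C4.5 or SVM here; the feature choice and data points are illustrative only:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training examples: matrix sizes -> winning variant.
X = [[30, 30, 30], [300, 300, 300], [30, 300, 30], [300, 30, 300]]
y = ["seq", "cuda", "seq", "cuda"]

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[120, 120, 120]]))   # predicted best variant
```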


Off-line Empirical Models: Pros and Cons

+ Avoid "cold start"

+ Controllable tuning process

- Tuning overhead

"Cold start" Effect

Example from Dastgeer et al. (ParCo'2011) on SkePU/StarPU integration,

Coulombic potential grid execution, with 3 successive executions

7 / 40


Zoom into the Cons

A closer look at off-line overhead

Exhaustive execution is not feasible

Previous work: Stargazer (random sampling)

Training examples (sampling data) are vital for performance prediction models

8 / 40


Attack the Cons: Observations

In the context of performance tuning: a concrete example

(Matrix-matrix multiplication)

Smart sampling by heuristics

9 / 40


Adaptive Sampling

Sample only vertices of a space

Recursive decomposition (cutting) for open spaces, controlled by maximum depth, etc.

10 / 40


Formalization: Runtime context (e.g. Input Size)

Context Property Value Space (PVS) is an n-dimensional finite space

A run-time context instance consists of n context property values and maps to a point in the PVS

The PVS has 2^n vertices (corner points)

Closed and open subspaces

Closed subspace: a subspace where one variant wins on all vertices; it heuristically approximates "uninteresting" subspaces

Open subspace: otherwise

Recursive space decomposition

11 / 40
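A minimal sketch of the recursive space decomposition described above, assuming hypothetical variant objects that expose a run_and_time(point) method; the real tuner's data structures differ:

```python
import itertools

def measure_winner(point, variants):
    """Time every candidate variant at this context point and return the
    fastest one. (Placeholder: run_and_time is a hypothetical method that
    executes the variant for the given problem size and returns seconds.)"""
    return min(variants, key=lambda v: v.run_and_time(point))

def decompose(lo, hi, variants, depth=0, max_depth=4):
    """Recursively decompose the property value space [lo, hi] (n-tuples).
    Only the 2^n vertices of each subspace are sampled; a subspace whose
    vertices all agree on the winner becomes a closed leaf (convexity
    assumption), otherwise it is cut at the midpoint of every dimension
    and the parts are recursed into, until max_depth is reached."""
    corners = list(itertools.product(*zip(lo, hi)))
    # A real tuner would cache these measurements, since neighbouring
    # subspaces share vertices.
    winners = {c: measure_winner(c, variants) for c in corners}

    if len(set(winners.values())) == 1:
        return {"closed": True, "lo": lo, "hi": hi,
                "winner": next(iter(winners.values()))}
    if depth == max_depth:
        # Still open at the depth limit: keep the vertex samples for the
        # on-line Euclidean-based predictor.
        return {"closed": False, "lo": lo, "hi": hi, "winners": winners}

    mid = tuple((l + h) // 2 for l, h in zip(lo, hi))
    children = []
    for halves in itertools.product(*[[(l, m), (m, h)]
                                      for l, m, h in zip(lo, mid, hi)]):
        c_lo = tuple(p[0] for p in halves)
        c_hi = tuple(p[1] for p in halves)
        children.append(decompose(c_lo, c_hi, variants, depth + 1, max_depth))
    return {"closed": False, "lo": lo, "hi": hi,
            "winners": winners, "children": children}
```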


Convexity Assumption from the Observation

"In real life, the world is smoothly changing, instances close by most of the time have the same labels, and we need not worry about all possible labelings."

Ethem Alpaydin in "Introduction to Machine Learning, second edition"

Our Convexity Assumption

If all vertices of a subspace share the same winner, then that winner wins in all points of the subspace, statistically

12 / 40


Adaptive Sampling in the Big Picture

Off-line: adaptive sampling, tree construction

On-line (Run-time)

Load the tree

Prediction

Closed space: look up winner on one vertex

Open space: Euclidean-based predictor

13 / 40
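A matching sketch of the on-line side, assuming the tree-node layout produced by the decompose() sketch above:

```python
def predict(node, point):
    """On-line prediction (sketch): descend the decomposition tree to the
    subspace containing the run-time context point."""
    if node["closed"]:
        # Closed subspace: one stored winner represents the whole box.
        return node["winner"]
    if "children" in node:
        for child in node["children"]:
            if all(l <= x <= h for x, l, h
                   in zip(point, child["lo"], child["hi"])):
                return predict(child, point)
    # Open leaf: Euclidean-based predictor, i.e. return the winner
    # measured at the nearest sampled vertex.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(node["winners"], key=lambda v: sq_dist(v, point))
    return node["winners"][nearest]
```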


Techniques for Adaptive Sampling

Light oversampling

Sample one extra point in the middle

Detect holes

Small increase in overhead

May increase prediction accuracy

14 / 40
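A sketch of the light-oversampling check under the same assumptions as the decomposition sketch (hypothetical measure_winner() helper):

```python
def closed_with_oversampling(lo, hi, corner_winners, variants):
    """Light oversampling (sketch): besides the 2^n corner winners, sample
    one extra point in the middle of the subspace. If the centre point's
    winner disagrees with the corners, a 'hole' has been detected and the
    subspace is treated as open."""
    if len(set(corner_winners.values())) > 1:
        return False                    # already open by the corners alone
    centre = tuple((l + h) // 2 for l, h in zip(lo, hi))
    return measure_winner(centre, variants) in set(corner_winners.values())
```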


Further Techniques Based on Adaptive Sampling

Thresholding

Relative threshold: abs(v_i − v_min) / v_min ≤ θ

Stop splitting early

Reduce training overhead

May decrease prediction accuracy

15 / 40
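One possible reading of the thresholding rule as code; the stopping policy below is an assumption, only the relative-threshold formula comes from the slide:

```python
def near_optimal(times, theta):
    """Variants within the relative threshold of the fastest one at a
    single vertex: abs(v_i - v_min) / v_min <= theta."""
    v_min = min(times.values())
    return {v for v, t in times.items() if (t - v_min) / v_min <= theta}

def closed_under_threshold(per_vertex_times, theta=0.05):
    """Thresholding (sketch): stop splitting a subspace early if some
    variant is near-optimal on every vertex, even when the strict winners
    differ. per_vertex_times maps each vertex to {variant: runtime};
    theta is an illustrative value."""
    common = None
    for times in per_vertex_times.values():
        cand = near_optimal(times, theta)
        common = cand if common is None else common & cand
    return bool(common)
```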


Further Techniques Based on Adaptive Sampling

Implementation Pruning

Only the winners at the vertices are involved in the future sampling of the subspace

Reduces overhead remarkably if many implementation variants are available for an interface

May lead to loss of optimization potential

Example (from the figures): try 10 variants on each vertex initially; after pruning, try only the 2 winner variants on each untested vertex

16 / 40
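A sketch of implementation pruning under the same assumptions as the earlier decomposition sketch (hypothetical measure_winner() helper):

```python
def prune_and_sample(corners, all_variants, parent_winners):
    """Implementation pruning (sketch): when sampling the vertices of a
    new subspace, only the variants that already won on previously tested
    vertices are tried again; the rest are pruned. parent_winners maps
    already-measured vertices to their winning variant."""
    surviving = set(parent_winners.values()) or set(all_variants)
    winners = dict(parent_winners)
    for corner in corners:
        if corner not in winners:
            winners[corner] = measure_winner(corner, list(surviving))
    return winners
```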


Platform

Cora
  CPU: 16 Intel(R) Xeon(R) CPU X5550 @ 2.67 GHz
  GPU: 3 GPUs: two nVidia Tesla C2050 and one Tesla C1060
  OS: RHEL 5.6
  Compiler: gcc 4.1.2 and nvcc V0.2.1221

Fermi
  CPU: 8 Intel(R) Xeon(R) CPU E5520 @ 2.27 GHz
  GPU: two Tesla M2050 GPUs
  OS: 3.2.1-2-ARCH
  Compiler: gcc 4.6.2 and nvcc V0.2.1221

17 / 40


Benchmarks

Matrix-matrix multiplication (MM)
  Feature modeling: row size and column size of the first matrix, column size of the second matrix
  Range: (30, 30, 30) to (300, 300, 300)
  Implementation variants: sequential implementation and a variant by loop rearrangement, CUDA impl., BLAS impl., Pthread impl. and five of its variants from loop rearrangement

Sorting (ST)
  Feature modeling: array size; discretization of array value distribution (sampled number of inversions)
  Range: (1, 0) to (10k, 10)
  Implementation variants: bubble sort, insertion sort, merge sort, quick sort

Path finder (PF)
  Feature modeling: rows; columns
  Range: (1, 1) to (10k, 20k)
  Implementation variants: OpenMP implementation, CUDA implementation

Backpropagation (BP)
  Feature modeling: array size
  Range: (1k) to (100k)
  Implementation variants: OpenMP implementation, CUDA implementation

18 / 40


Prediction Accuracy (%) on Cora: Baseline adaptive off-line sampling

19 / 40


Prediction Accuracy (%) on Fermi: Baseline adaptive off-line sampling

20 / 40


Other Metrics

Absolute runtime overhead: 4 - 23 µs

Relative runtime overhead: 0.2%

Average sampling rate: 0.053%

21 / 40


Test against Convexity Assumption

[Figure panels: BP, MM, PF, ST]

Accuracy (%) on closed space. Percentage (%) on closed space.

22 / 40


Thresholding Effect: Training Time

[Figure panels: BP (depth 4), MM (depth 4), PF (depth 1), ST (depth 4)]

25 / 40

Thresholding Effect: Prediction Accuracy

[Figure panels: BP (depth 4), MM (depth 4), PF (depth 1), ST (depth 4)]

26 / 40


Oversampling Effect: Prediction Accuracy

[Figure panels: BP, MM, PF, ST; adaptive sampling vs. adaptive sampling with light oversampling]

29 / 40


Oversampling Effect: Training Time

[Figure panels: BP, MM, PF, ST; adaptive sampling vs. adaptive sampling with light oversampling]

32 / 40


Implementation Pruning Effect: Training Time

[Figure panels: BP, MM, PF, ST; adaptive sampling vs. adaptive sampling with implementation pruning]

33 / 40


Implementation Pruning Effect: Prediction Accuracy

[Figure panels: BP, MM, PF, ST; adaptive sampling vs. adaptive sampling with implementation pruning]

34 / 40


Combo Effect: Training Time

[Figure panels: BP, MM, PF, ST; adaptive sampling vs. adaptive sampling with the combination of oversampling and implementation pruning]

35 / 40


Combo Effect: Prediction Accuracy

[Figure panels: BP, MM, PF, ST; adaptive sampling vs. adaptive sampling with the combination of oversampling and implementation pruning]

36 / 40


Adaptive Sampling vs Random Sampling

[Figure panels: BP, MM, PF, ST; random sampling vs. adaptive sampling vs. adaptive sampling with the combination of oversampling and implementation pruning]

37 / 40


Conclusion

Convexity Assumption holds for all benchmarks used; more remain to be tested

Three techniques help to decrease training time or increase prediction accuracy

Thresholding

Light Oversampling

Implementation pruning

The right combination can combine their advantages

Adaptive sampling shows benefits over random sampling

For more questions, please contact me: [email protected], +46704759692

40 / 40