
Page 1: Code Optimisations and Performance Models for MATLAB

Code Optimisations and Performance Models for MATLAB

Patryk Kiepas (1,2), Claude Tadonki (1), Corinne Ancourt (1)

Jarosław Koźlak (2)

(1) MINES ParisTech / PSL University

(2) AGH University of Science and Technology, Poland

January 30, 2019

1 / 28

Page 2: Code Optimisations and Performance Models for MATLAB

Outline

Motivation – Why MATLAB?

Three approaches to speeding up MATLAB

Code transformations
  - Loop coalescing
  - Loop interchange
  - Loop unrolling
  - Strength reduction (power)

Problem with vectorization

MATLAB is JIT compiling

Building an optimisation heuristic

Conclusions


Page 3: Code Optimisations and Performance Models for MATLAB

MATLAB is popular

Figure: TIOBE Index for December 2018. https://www.tiobe.com/tiobe-index/


Page 4: Code Optimisations and Performance Models for MATLAB

Motivation

MATLAB

+ Dynamic language with simple and intuitive syntax

+ Great for fast prototyping
  - Built-ins: 2940 (R2018b)
  - MATLAB toolboxes: 66 (e.g. phased array, aerospace)

− Vendor lock-in, closed source

− Lack of formal semantics

− Performance is lagging behind other solutions


Page 5: Code Optimisations and Performance Models for MATLAB

Performance comparison

[Figure: benchmark times (log scale, 10^0–10^4) for C, Julia, LuaJIT, Rust, Go, Fortran, Java, JavaScript, Matlab, Mathematica, Python, R, and Octave on the micro-benchmarks iteration_pi_sum, matrix_multiply, matrix_statistics, parse_integers, print_to_file, recursion_fibonacci, recursion_quicksort, and userfunc_mandelbrot]

Figure: Julia Micro-Benchmarks. https://julialang.org/benchmarks/

Page 6: Code Optimisations and Performance Models for MATLAB

Three approaches to speeding up MATLAB

[Diagram: three ways to obtain faster MATLAB]

- Translation: MATLAB code → C, C++, Fortran
- New interpretation: MATLAB code → third-party interpreter
- Transformation: MATLAB code → MATLAB code (optimized)

Page 7: Code Optimisations and Performance Models for MATLAB

Existing solutions

- New interpretation
  - Scilab [1]
  - Octave [2]
  - MaJIC [Almasi and Padua, 2001]
  - McVM [Chevalier-Boisvert, 2009]
- Translation
  - MATLAB Coder (C) – MathWorks' official compiler
  - SILKAN eVariX (C) [3]
  - Menhir (C) [Chauveau and Bodin, 1999]
  - Mc2For (Fortran) [Chen et al., 2017]
  - FALCON (Fortran) [DeRose et al., 1995]
- Transformation
  - Mc2Mc [Chen et al., 2017] – performs vectorization

[1] https://www.scilab.org/
[2] https://www.gnu.org/software/octave/
[3] http://www.silkan.com/products/evarix/

Page 8: Code Optimisations and Performance Models for MATLAB

Loop coalescing

Before:

    for k = 1:N
        for l = 1:M
            a(l, k) = a(l, k) + c;
        end
    end

After:

    for T = 1:(N .* M)
        a(T) = a(T) + c;
    end

[Figure: total cycles vs. iterations (0–900) for the original and coalesced loops on MATLAB R2013a, R2015b, and R2018b]

Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results with confidence intervals over 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.

Example: Bacon, D. F., Graham, S. L., & Sharp, O. J. (1994). Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4), 345–420.
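The coalesced form is equivalent here because MATLAB stores arrays in column-major order, so a single linear index T visits a(l, k) in exactly the order of the original nested loops. A small NumPy sketch of that equivalence (an illustration, not the presented MATLAB code; order='F' mimics MATLAB's layout):

```python
import numpy as np

M, N = 3, 4          # rows, columns
c = 2.0
a = np.arange(M * N, dtype=float).reshape(M, N, order="F")

# Original nested loops: outer over columns k, inner over rows l.
nested = a.copy(order="F")
for k in range(N):
    for l in range(M):
        nested[l, k] += c

# Coalesced loop: one linear (column-major) index over all M*N elements.
coalesced = a.copy(order="F")
flat = coalesced.ravel(order="F")   # a view onto the same storage
for T in range(M * N):
    flat[T] += c

assert np.array_equal(nested, coalesced)
```

For this uniform update the traversal order does not even matter; the column-major view only becomes essential when the iteration order is observable.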

Page 9: Code Optimisations and Performance Models for MATLAB

Loop interchange

Before:

    for k = 1:N
        for l = 1:M
            total(k) = total(k) + a(k, l);
        end
    end

After:

    for l = 1:M
        for k = 1:N
            total(k) = total(k) + a(k, l);
        end
    end

[Figure: total cycles vs. iterations (0–900) for the original and interchanged loops on MATLAB R2013a, R2015b, and R2018b]

Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results with confidence intervals over 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.

Example: Bacon, D. F., Graham, S. L., & Sharp, O. J. (1994). Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4), 345–420.
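The interchange is legal because each total(k) accumulates the same set of a(k, l) values either way; what changes is that the inner loop now walks down a column, matching MATLAB's column-major storage. A NumPy sketch of the equivalence (illustration only, not the presented code):

```python
import numpy as np

N, M = 5, 4
rng = np.random.default_rng(0)
a = rng.random((N, M))

# Original: k outer, l inner -- the inner loop strides across a row.
total_orig = np.zeros(N)
for k in range(N):
    for l in range(M):
        total_orig[k] += a[k, l]

# Interchanged: l outer, k inner -- the inner loop walks down a column.
total_inter = np.zeros(N)
for l in range(M):
    for k in range(N):
        total_inter[k] += a[k, l]

assert np.allclose(total_orig, total_inter)
```

Each total_inter[k] still sees a[k, 0], a[k, 1], … in the same order, so even the floating-point accumulation order per element is unchanged.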

Page 10: Code Optimisations and Performance Models for MATLAB

Loop unrolling

Before:

    for k = 2:(N - 1)
        a(k) = a(k) + a(k-1) .* a(k+1);
    end

After:

    for k = 2:2:(N - 2)
        a(k) = a(k) + a(k-1) .* a(k+1);
        a(k+1) = a(k+1) + a(k) .* a(k+2);
    end
    if mod((N-2), 2) == 1
        a(N-1) = a(N-1) + a(N-2) .* a(N);
    end

[Figure: total cycles vs. iterations (0–200000) for the original and unrolled loops on MATLAB R2013a, R2015b, and R2018b]

Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results with confidence intervals over 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.

Example: Bacon, D. F., Graham, S. L., & Sharp, O. J. (1994). Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4), 345–420.
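Because the loop carries a dependency (each a(k) uses the just-updated left neighbour), the unrolled body must preserve statement order, and the trailing if handles an odd trip count. A plain-Python check of the equivalence (a 0-based translation of the slide's loops; illustration only):

```python
def original(a):
    a = a.copy()
    N = len(a)
    for k in range(1, N - 1):          # MATLAB k = 2:(N-1), 0-based
        a[k] += a[k - 1] * a[k + 1]
    return a

def unrolled(a):
    a = a.copy()
    N = len(a)
    for k in range(1, N - 2, 2):       # MATLAB k = 2:2:(N-2)
        a[k] += a[k - 1] * a[k + 1]
        a[k + 1] += a[k] * a[k + 2]
    if (N - 2) % 2 == 1:               # leftover iteration for odd trip counts
        a[N - 2] += a[N - 3] * a[N - 1]
    return a

for N in (4, 5, 9, 10):
    a = list(range(1, N + 1))
    assert original(a) == unrolled(a)
```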

Page 11: Code Optimisations and Performance Models for MATLAB

Strength reduction (power)

Before:

    for k = 1:N
        a(k) = a(k) + c.^k;
    end

After:

    T = c;
    for k = 1:N
        a(k) = a(k) + T;
        T = T .* c;
    end

[Figure: total cycles vs. iterations (0–200000) for the original and simplified (strength-reduced) loops on MATLAB R2013a, R2015b, and R2018b]

Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results with confidence intervals over 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.

Example: Bacon, D. F., Graham, S. L., & Sharp, O. J. (1994). Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4), 345–420.
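The transformation replaces a power computation in every iteration with a single multiply, maintaining the invariant T = c^k at the top of iteration k. A plain-Python check (integers, so the comparison is exact; with floats the two versions may differ in the last bits, since pow and repeated multiplication round differently):

```python
def with_power(a, c):
    a = a.copy()
    for k in range(1, len(a) + 1):     # MATLAB k = 1:N
        a[k - 1] += c ** k
    return a

def strength_reduced(a, c):
    a = a.copy()
    T = c                              # invariant: T == c^k at iteration k
    for k in range(len(a)):
        a[k] += T
        T *= c
    return a

a = [1, 2, 3, 4, 5]
assert with_power(a, 3) == strength_reduced(a, 3)
```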

Page 12: Code Optimisations and Performance Models for MATLAB

Vectorization in MATLAB

% scalar form
for i = 1:N
    c(i) = a(i) * b(i);
end

% vector form
c(1:N) = a(1:N) .* b(1:N);

% after simplification
c = a .* b;

For many years vectorization was the prevalent optimisation, usually applied systematically.

+ Performs more floating-point operations simultaneously

− Sometimes decreases performance in comparison to JIT-compiled loops (Chen et al. 2017; Kiepas et al. 2018)
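The same scalar-vs-vector contrast exists in NumPy, which is a convenient way to see what vectorization buys: one elementwise kernel instead of an interpreted loop (illustration only):

```python
import numpy as np

N = 1000
rng = np.random.default_rng(0)
a = rng.random(N)
b = rng.random(N)

# Scalar form: an explicit interpreted loop over elements.
c_loop = np.empty(N)
for i in range(N):
    c_loop[i] = a[i] * b[i]

# Vector form: a single elementwise operation.
c_vec = a * b

assert np.allclose(c_loop, c_vec)
```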


Page 13: Code Optimisations and Performance Models for MATLAB

Reproduction of [Chen et al., 2017]

- Benchmarks from the Ostrich suite [4]
- Vectorized with Mc2Mc
- Executed on MATLAB R2015b

Benchmark   Dwarf                   Chen et al.   Us
backprop    unstructured grid       0.71          0.81
bs          –                       15.0          8.33
capr        dense linear algebra    0.79          0.85
crni        structured grid         0.83          0.81
fft         spectral method         0.59          0.64
nw          dynamic programming     0.96          1.00
pagerank    Monte Carlo/MapReduce   0.94          0.94
mc          Monte Carlo/MapReduce   2.02          2.22
spmv        sparse linear algebra   0.013         0.02

Table: Kiepas, P., Kozlak, J., Tadonki, C., & Ancourt, C. (2018). Profile-based vectorization for MATLAB. ARRAY 2018 (pp. 18–23).

[4] https://github.com/Sable/Ostrich2

Page 14: Code Optimisations and Performance Models for MATLAB

Is vectorization still relevant?

[Figure: speedup (0.4–1.4) of the vectorized code over the loop version as a function of iterations (data size, 0–2000) for loops crni1 and backprop1, relative to a baseline of 1.0]

Figure: Kiepas, P., Kozlak, J., Tadonki, C., & Ancourt, C. (2018). Profile-based vectorization for MATLAB. ARRAY 2018 (pp. 18–23).

Page 15: Code Optimisations and Performance Models for MATLAB

Improving Mc2Mc code generation

Range inlining

% From

k = 1:N;

B = A(k) + 2;

% To

B = A(1:N) + 2;

Range conversion

% From

B = A(2*(1:N) -1);

% To

B = A(1:2:(2*N-1));

Removing explicit index-all

% From

B(:) = A(1:end);

% To

B = A;
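The range-conversion rewrite can be checked mechanically: the computed index vector 2*(1:N)-1 enumerates exactly the odd positions 1, 3, …, 2N-1, which is the strided range 1:2:(2N-1). A 0-based NumPy sketch (illustration only; the slice avoids materialising the index vector):

```python
import numpy as np

N = 6
A = np.arange(2 * N)

# Gather with a computed index vector (0-based analogue of A(2*(1:N)-1)).
gathered = A[2 * np.arange(1, N + 1) - 2]

# Equivalent strided slice (analogue of A(1:2:(2*N-1))): no index vector built.
sliced = A[0:2 * N - 1:2]

assert np.array_equal(gathered, sliced)
```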


Page 16: Code Optimisations and Performance Models for MATLAB

Profitable vectorization point (PV)

Loop        Benchmark iterations   PV iterations   Improved PV iterations
backprop1   {17, 2850001}          ∅               ≥ 255
backprop2   2                      ≥ 4033          ≥ 257
backprop3   {17, 2850001}          ∅               ≥ 385
backprop4   2                      ∅               ≥ 257
capr1       8                      ≥ 20            ≥ 17
capr2       20                     ≥ 3329          ≥ 385
capr3       49                     ≥ 5953          ≥ 321
crni1       2300                   ≥ 161           ≥ 193
crni2       2300                   ∅               ≥ 289
crni3       2300                   ∅               ≥ 1217
fft1        256                    ∅               ≥ 417
fft2        2, 4, 8, …, 256        ∅               ≥ 129
nw1         4097                   ∅               ≥ 65
nw2         4097                   ≥ 1665          ≥ 257
nw3         4097                   ≥ 7681          ≥ 193
pagerank1   1000                   ∅               ≥ 273
spmv1       {2, 3}                 ≥ 6337          ≥ 321

Table: Kiepas, P., Kozlak, J., Tadonki, C., & Ancourt, C. (2018). Profile-based vectorization for MATLAB. ARRAY 2018 (pp. 18–23).

Page 17: Code Optimisations and Performance Models for MATLAB

Profile-guided vectorization

[Figure: speedup (0.0–2.0) over the baseline for the systematic and selective (optimized) vectorization strategies on the backprop, crni, fft, nw, and pagerank benchmarks]

Figure: Kiepas, P., Kozlak, J., Tadonki, C., & Ancourt, C. (2018). Profile-based vectorization for MATLAB. ARRAY 2018 (pp. 18–23).

Page 18: Code Optimisations and Performance Models for MATLAB

A bit of history of MATLAB

- Starts as an interpreter (1984)

- Introduces a JIT compiler alongside the interpreter around version 6.5 (2002)

- Combines the JIT with the interpreter in R2015b

- Introduces PGO (profile-driven optimisations) around R2018b


Page 19: Code Optimisations and Performance Models for MATLAB

Warmup phase

Warmup is an observable effect of a JIT compilation policy acting on code. A policy is a set of rules for if, when, and how to compile the code [Kulkarni 2011].

[Kulkarni 2011]: Kulkarni, P. A. (2011). JIT compilation policy for modern machines. ACM SIGPLAN Notices, 46(10), 773.


Page 20: Code Optimisations and Performance Models for MATLAB

Warmup phase patterns

[Figure: per-iteration execution time over 300 in-process iterations for three example processes: backprop on R2018b (warmup), nqueens on R2015b (warmup), and bubble on R2013a (slowdown)]

The patterns come in different flavours [Barrett et al. 2017]:

- Warmup
- Slowdown
- Flat
- Inconsistent

[Barrett et al. 2017]: Barrett, E., Bolz-Tereick, C. F., Killick, R., Mount, S., & Tratt, L. (2017). Virtual machine warmup blows hot and cold. Proceedings of the ACM on Programming Languages, vol. 1 (Issue OOPSLA), 1–27.


Page 21: Code Optimisations and Performance Models for MATLAB

About our heuristic

Our heuristic is a binary choice (optimise – positive / do nothing – negative) that takes into account the code, the trip count, and/or the machine's properties.

Design goal

Prefer being conservative (false negatives FN are acceptable) over optimising wrongly (false positives FP > 0):

    precision = TP / (TP + FP) → 1    (1)

However, too many FN means we optimise only a little!
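The design goal can be made concrete with a toy threshold heuristic over a single counter ratio; the names, thresholds, and numbers below are hypothetical, and only the precision bookkeeping mirrors the slide:

```python
def precision(tp, fp):
    # Fraction of positive decisions that were correct.
    return tp / (tp + fp) if (tp + fp) else 1.0

# Hypothetical heuristic: optimise only when the predicted ratio of change
# in some counter after vectorization stays below a threshold.
def decide(ratio, threshold=1.0):
    return ratio < threshold  # True = optimise (positive), False = do nothing

# Toy evaluation against ground truth (1 = vectorization is profitable).
ratios = [0.4, 0.9, 1.3, 0.7, 2.0]
truth  = [1,   1,   0,   0,   0]
preds  = [decide(r) for r in ratios]
tp = sum(p and t for p, t in zip(preds, truth))
fp = sum(p and not t for p, t in zip(preds, truth))
assert precision(tp, fp) == 2 / 3
```

Lowering the threshold trades false positives for false negatives, driving precision toward 1 at the cost of optimising fewer loops.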


Page 22: Code Optimisations and Performance Models for MATLAB

1. Handcrafted optimisation heuristic

We pose a question: What does vectorization change?

[Figure: ratio of change after vectorization (roughly 1–5) as a function of iterations (0–400) for four counters: load instructions (PAPI_LD_INS), store instructions (PAPI_SR_INS), conditional branches (PAPI_BR_CN), and cycles with no instruction finished (PAPI_STL_CCY); TSVC/s1115, MATLAB R2013a]

Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results from 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.

Page 23: Code Optimisations and Performance Models for MATLAB

Precision

[Figure: precision (60%–100%) of the handcrafted heuristics as a function of the threshold (0.0–2.0) on the ratio of change in loads, stores, branches, and stalls; TSVC Benchmark Suite, R2013a]

Page 24: Code Optimisations and Performance Models for MATLAB

2. Automatic dynamic model

Following the work of [Cavazos et al., 2007], we built a model using machine learning and a dynamic set of features (performance counters).

Methodology

1. Collecting performance counters (TSVC Benchmark Suite)

2. Normalising (by PAPI_TOT_INS, hybrid)

3. Oversampling to deal with class imbalance

4. Training on TSVC, testing on LCPC16 [Chen et al., 2017]

5. Only out-of-the-box components, no fine-tuning (meta-learning, hyperparameter optimisation)

[Cavazos et al. 2007]: Cavazos, J., Fursin, G., Agakov, F., Bonilla, E., O’Boyle, M. F. P., & Temam, O. (2007).Rapidly Selecting Good Compiler Optimizations using Performance Counters. CGO’07 (pp. 185–197).
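Steps 2 and 3 of the methodology can be sketched in a few lines; the feature names and the naive random-duplication oversampler below are illustrative assumptions, not the exact pipeline used:

```python
import random

# Step 2: normalise raw counter values by total instructions (PAPI_TOT_INS).
def normalise(sample):
    tot = sample["PAPI_TOT_INS"]
    return {k: v / tot for k, v in sample.items() if k != "PAPI_TOT_INS"}

# Step 3: naive random oversampling of the minority class.
def oversample(X, y, seed=0):
    rng = random.Random(seed)
    pos = [i for i, lbl in enumerate(y) if lbl == 1]
    neg = [i for i, lbl in enumerate(y) if lbl == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

X = [{"PAPI_TOT_INS": 100.0, "PAPI_LD_INS": 30.0}] * 6
y = [1, 1, 1, 1, 1, 0]
Xb, yb = oversample([normalise(s) for s in X], y)
assert yb.count(0) == yb.count(1) == 5
```

A balanced training set keeps the classifier from trivially predicting the majority class, which matters for the precision-first goal above.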


24 / 28

Page 25: Code Optimisations and Performance Models for MATLAB

Evaluation

Test                              Metric      AdaBoost   Decision Tree (CART)
TSVC (10-fold cross-validation)   Precision   96.63 %    97.02 %
                                  Accuracy    94.38 %    93.95 %
LCPC16 test set                   Precision   99.51 %    99.36 %
                                  Accuracy    92.85 %    72.26 %

Page 26: Code Optimisations and Performance Models for MATLAB

Decision tree

[Figure: the learned decision tree (CART, 1652 training samples, classes NOTHING / VECTORIZE). The root splits on the trip count: for N > 361 the tree predicts VECTORIZE for 1225 of 1258 samples, while for N ≤ 361 further splits use normalised counters such as PAPI_L2_ICM, PAPI_BR_MSP, PAPI_BR_CN, PAPI_STL_ICY, PAPI_STL_CCY, PAPI_L1_STM, PAPI_TLB_DM, PAPI_PRF_DM, PAPI_RES_STL, PAPI_LD_INS, PAPI_SR_INS, and FP_ARITH:SCALAR_DOUBLE]

Page 27: Code Optimisations and Performance Models for MATLAB

3. Automatic static model

Image: Cummins, C., Petoumenos, P., Wang, Z., and Leather, H. (2017). End-to-End Deep Learning of Optimization Heuristics. In 2017 26th IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT). [Cummins et al., 2017]

- Sequences of code are the input

- Auxiliary input: the number of iterations

- No dynamic features

- To force learning from sequences, we shorten the sequences (less padding)

- Low precision – more data needed? Around 1652 data points, but only 118 code sequences.


Page 28: Code Optimisations and Performance Models for MATLAB

Conclusions

- A working optimisation heuristic without opening MATLAB's black box (which might be infeasible)

- A deeper understanding of how to measure MATLAB's performance

- Perspective: fine-tuning the models and extending the evaluation to other machines and versions of MATLAB

Thank you!


Page 29: Code Optimisations and Performance Models for MATLAB

Almasi, G. and Padua, D. (2001). MaJIC: A MATLAB Just-In-Time Compiler. In Lecture Notes in Computer Science, volume 2017, pages 68–81.

Cavazos, J., Fursin, G., Agakov, F., Bonilla, E., O'Boyle, M. F. P., and Temam, O. (2007). Rapidly Selecting Good Compiler Optimizations using Performance Counters. In International Symposium on Code Generation and Optimization (CGO'07), pages 185–197. IEEE.

Chauveau, S. and Bodin, F. (1999). Menhir: An Environment for High Performance Matlab. Scientific Programming, 7(3-4):303–312.

Chen, H., Krolik, A., Lavoie, E., and Hendren, L. (2017). Automatic Vectorization for MATLAB. In Ding, C., Criswell, J., and Wu, P., editors, Lecture Notes in Computer Science, volume 10136, pages 171–187. Springer International Publishing, Cham.

Chevalier-Boisvert, M. (2009). MCVM: An Optimizing Virtual Machine for the MATLAB Programming Language.

Cummins, C., Petoumenos, P., Wang, Z., and Leather, H. (2017). End-to-End Deep Learning of Optimization Heuristics. In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 219–232. IEEE.

DeRose, L., Gallivan, K., Gallopoulos, E., Marsolf, B. A., and Padua, D. (1995). FALCON: An Environment for the Development of Scientific Libraries and Applications. In Proc. First International Workshop on Knowledge-Based Systems for the (re)Use of Program Libraries.