Code optimization

What do we want to optimize?

Code A exhibits 90% of theoretical peak performance (FLOPS)

Code B exhibits 90% strong-scaling efficiency

Code C exhibits 10% of the theoretical peak performance and is poorly balanced, but solves the problem 2 times faster than code A or B
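
A rough sketch of why code C can still win, assuming all three codes run on the same machine with peak rate $P$ and that runtime is simply the operation count divided by the sustained rate (the symbols $W$, $T$ and $P$ are introduced here for illustration only):

```latex
T_A = \frac{W_A}{0.9\,P}, \qquad
T_C = \frac{W_C}{0.1\,P}, \qquad
T_C = \frac{T_A}{2}
\;\Longrightarrow\;
\frac{W_C}{W_A} = \frac{0.1}{0.9}\cdot\frac{1}{2} = \frac{1}{18}
```

Under these assumptions code C performs about 18 times fewer floating-point operations than code A, so a low fraction of peak does not by itself mean a slow code.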

Code optimization

What do we want to optimize?

Code A uses a slow algorithm but has great weak and strong scaling

Code B uses a fast algorithm but scales poorly

How to profile a code?

1. Make profiling the code as simple as you can.

2. Always expect surprises ... do not assume anything regarding your code performance.

3. Look for bottlenecks and for the most time-consuming parts of the code.

4. Keep the reference version of the code.

5. Document every modification you make!

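Code profiling: example

int main(int argc, char **argv){
    double t0, t1, t2, t3;

    MPI_Init(&argc, &argv);   /* MPI_Init takes pointers to argc and argv */

    /* sync() stands for a barrier across all ranks (e.g. MPI_Barrier),
       time() for a wall-clock timer (e.g. MPI_Wtime) */
    sync(); t0 = time(); functionA(); sync(); t1 = time();

    sync(); t2 = time(); functionB(); sync(); t3 = time();

    MPI_Finalize();
    return 0;
}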

# cores   time   t1-t0 (functionA)   t3-t2 (functionB)
16        Min    1                   15
          Max    10                  16
32        Min    0.8                 10
          Max    5.4                 11
64        Min    0.6                 7
          Max    3.2                 8.5

Which function should we optimize first?

Code optimization through code profiling

Difference: 40 sec 25 sec

Performance analysis tools

pgprof

Vampir

CrayPAT

 100.0% | 100.0% | 512 | Total
|------------------------------------
|  59.8% |  59.8% | 306 | stepfx_
|  17.6% |  77.3% |  90 | getrusage
|   8.0% |  85.4% |  41 | stepfy_
|   6.2% |  91.6% |  32 | integr_
|   2.0% |  93.6% |  10 | gradco_
|   1.0% |  94.5% |   5 | __write
|   0.8% |  95.3% |   4 | filerx_

IPM

CrayPAT

http://www.nersc.gov/nusers/systems/franklin/tools.php

http://docs.cray.com/cgi-bin/craydoc.cgi?this_sort=title;mode=Search;sq=%20product%3D%22CrayPat%22

%module load xt-craypat

%man pat_build

%man pat_report

EXAMPLE:

pat_build -f -g blas,io,blacs -D trace-max=1600 -u a.out

Debugging parallel code

DDT

Totalview

Totalview

• http://www.totalviewtech.com/index.html
• https://computing.llnl.gov/tutorials/totalview
• https://computing.llnl.gov/tutorials/totalview/exercise.html