Controlling parallelization in the IBM XL Fortran and C/C++ parallelizing compilers

Priya Unnikrishnan, IBM Toronto Lab

CASCON 2005

IBM Software Group, Compilation Technology. © 2005 IBM Corporation




© 2005 IBM Corporation, October 2005

Overview

Parallelization in IBM XL compilers

Outlining

Automatic parallelization

Cost analysis

Controlled parallelization

Future work


Parallelization

IBM XL compilers support Fortran 77/90/95, C and C++

They implement both OpenMP and auto-parallelization

Both target SMP (shared memory parallel) machines

Non-threadsafe code generated by default

– Use the _r invocation (xlf_r, xlc_r … ) to generate threadsafe code


Parallelization options

-qsmp=noopt : Parallelizes code with minimal optimization, to allow better debugging of OpenMP applications

-qsmp=omp : Parallelizes code containing OpenMP directives

-qsmp=auto : Automatically parallelizes loops

-qsmp=noauto : No auto-parallelization; processes IBM and OpenMP parallel directives
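To make the difference between the OpenMP and automatic modes concrete, here is a minimal sketch of the two kinds of loop these options act on. The function names `fill_omp` and `fill_plain` are illustrative and not from the slides. The compile lines in the comments only combine the `_r` invocation and the `-qsmp` suboptions listed above, and are untested assumptions.

```c
/* Assumed compile lines, built only from the options listed on this slide:
 *   xlc_r -qsmp=omp  fill.c    honor the OpenMP directive in fill_omp
 *   xlc_r -qsmp=auto fill.c    let the compiler consider fill_plain as well
 */

/* User-parallel loop: the directive asks for parallel execution.
 * A compiler without OpenMP support ignores the pragma and runs it serially. */
void fill_omp(double *a, int n, double value)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = value;
}

/* Plain loop with no directive: a candidate for automatic parallelization,
 * since its iterations are independent of each other. */
void fill_plain(double *a, int n, double value)
{
    for (int i = 0; i < n; i++)
        a[i] = value;
}
```

Both functions compute the same result; only the way parallelism is requested differs.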


Outlining

Parallelization transformation


Outlining

Source:

    int main(){
      #pragma omp parallel for
      for(int i=0; i<n; i++) {
        a[i] = const;
        ……
      }
    }

Transformed code: a runtime call plus an outlined routine.

Runtime call:

    long main(){
      @_xlsmpEntry0 = _xlsmpInitializeRTE();
      if (n > 0) then
        _xlsmpParallelDoSetup_TPO(2208, &main@OL@1,0,n,5,0,@_xlsmpEntry0,0,0,0,0,0,0)
      endif
      return main;
    }

Outlined routine:

    Subroutine void main@OL@1(unsigned @LB, unsigned @UB){
      @CIV1 = 0;
      do{
        a[]0[(long)@LB + @CIV1] = const;
        ……
        @CIV1 = @CIV1 + 1;
      }while((unsigned)@CIV1 < (@UB-@LB));
      return;
    }


SMP parallel runtime

_xlsmpParallelDoSetup_TPO(&main@OL@1,0,n ..)

main@OL@1(0,9)   main@OL@1(10,19)   main@OL@1(20,29)   main@OL@1(30,39)

The outlined function is parameterized – can be invoked for different ranges in the iteration space
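The idea that one outlined function can serve every sub-range can be sketched in plain C. Everything below is a hand-written stand-in. `main_ol_1` plays the role of the compiler-generated `main@OL@1`. `parallel_do_setup` is a hypothetical substitute for `_xlsmpParallelDoSetup_TPO`, whose real argument list the slides show only in part, and it calls the chunks one after another where a real runtime would hand them to threads. The sketch also assumes half-open bounds (`lb` inclusive, `ub` exclusive), which may differ from the convention in the slide.

```c
#define N 40
double a[N];

/* Signature shared by all outlined loop bodies in this sketch. */
typedef void (*outlined_fn)(unsigned lb, unsigned ub);

/* Stand-in for the outlined routine: runs iterations lb .. ub-1. */
void main_ol_1(unsigned lb, unsigned ub)
{
    for (unsigned i = lb; i < ub; i++)
        a[i] = 1.0;
}

/* Hypothetical runtime entry point: splits [lb, ub) into nchunks nearly
 * equal ranges and invokes the outlined body once per non-empty range.
 * Returns the number of invocations made. Assumes ub >= lb. */
unsigned parallel_do_setup(outlined_fn body, unsigned lb, unsigned ub,
                           unsigned nchunks)
{
    unsigned total = ub - lb;
    unsigned calls = 0;
    if (nchunks == 0 || total == 0)
        return 0;

    unsigned base = total / nchunks;   /* minimum chunk length */
    unsigned extra = total % nchunks;  /* first 'extra' chunks get one more */
    unsigned start = lb;

    for (unsigned c = 0; c < nchunks; c++) {
        unsigned len = base + (c < extra ? 1u : 0u);
        if (len == 0)
            continue;
        body(start, start + len);      /* a real runtime would use a thread */
        start += len;
        calls++;
    }
    return calls;
}
```

Calling `parallel_do_setup(main_ol_1, 0, N, 4)` invokes the body on four ranges of ten iterations each, mirroring the four calls shown on the slide.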


Auto-parallelization

Integrated framework for OpenMP and auto-parallelization

Auto-parallelization is restricted to loops.

Auto-parallelization is done in the link step when possible.

This allows various interprocedural analyses and optimizations to run before automatic parallelization


Auto-parallelization transformation

Source:

    int main(){
      for(int i=0; i<n; i++) {
        a[i] = const;
        ……
      }
    }

The loop is marked as an auto-parallel loop, then goes through outlining:

    int main(){
      #auto-parallel-loop
      for(int i=0; i<n; i++) {
        a[i] = const;
        ……
      }
    }


We can auto-parallelize OpenMP applications, skipping user-parallel code. A good thing!

Source:

    int main(){
      for(int i=0; i<n; i++){
        a[i] = const;
        ……
      }
      #pragma omp parallel for
      for (int j=0; j<n; j++){
        b[j] = a[j];
      }
    }

Only the serial loop is marked as an auto-parallel loop; both loops then go through outlining:

    int main(){
      #auto-parallel-loop
      for(int i=0; i<n; i++){
        a[i] = const;
        ……
      }
      #pragma omp parallel for
      for (int j=0; j<n; j++){
        b[j] = a[j];
      }
    }


Pre-parallelization phase

Loop Normalization (normalize countable loops)

Scalar privatization

Array privatization

Reduction variable analysis

Loop interchange (that helps parallelization)
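As a small illustration of what scalar privatization and reduction analysis look for, the loop below has both properties. The auto-parallelizer would have to discover them on its own; here they are written as OpenMP clauses only to make them visible. The function name `sum_of_squares` is illustrative and not from the slides.

```c
#include <stddef.h>

/* Sum of squares of x[0..n-1].
 * t   : written before it is read in every iteration, so each thread can
 *       keep its own copy (scalar privatization).
 * sum : only updated with "+=", so per-thread partial sums can be combined
 *       at the end (reduction). */
double sum_of_squares(const double *x, int n)
{
    double sum = 0.0;
    double t;

    #pragma omp parallel for private(t) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        t = x[i] * x[i];
        sum += t;
    }
    return sum;
}
```

Without privatizing `t` and treating `sum` as a reduction, both scalars would carry a dependence across iterations and the loop could not run in parallel.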


Cost Analysis

Automatic parallelization tests

– Dependence analysis: is it safe to parallelize?

– Cost analysis: is it worthwhile to parallelize?

Cost analysis estimates the total workload of the loop

LoopCost = ( IterationCount * ExecTimeOfLoopBody )

When the cost is known at compile time, the decision is trivial

Runtime cost analysis is more complex
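The cost formula above can be written down directly. In this minimal sketch the unit of `body_cost` and the value of `threshold` are assumptions, since the slides do not say how the compiler measures the execution time of the loop body or what threshold it uses.

```c
#include <stdbool.h>

/* LoopCost = IterationCount * ExecTimeOfLoopBody.
 * body_cost is an abstract per-iteration estimate (unit assumed). */
long loop_cost(long iteration_count, long body_cost)
{
    return iteration_count * body_cost;
}

/* A loop is considered worth parallelizing when its estimated total
 * workload exceeds a threshold (the threshold value is hypothetical). */
bool worth_parallelizing(long iteration_count, long body_cost, long threshold)
{
    return loop_cost(iteration_count, body_cost) > threshold;
}
```

With a per-iteration cost of 12 and a threshold of 5000, a loop of 1000 iterations (cost 12000) would be parallelized, while a loop of 10 iterations (cost 120) would stay serial.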


Conditional Parallelization

Source:

    int main(){
      for(int i=0; i<n; i++) {
        a[i] = const;
        ……
      }
    }

Transformed code, with the runtime check:

    long main(){
      @_xlsmpEntry0 = _xlsmpInitializeRTE();
      if (n > 0) then
        if(loop_cost > threshold){
          _xlsmpParallelDoSetup_TPO(2208, &main@OL@1,0,n,5,0,@_xlsmpEntry0,0,0,0,0,0,0)
        } else
          main@OL@1(0,0,(unsigned)n,0)
        endif
      endif
      return main;
    }

    Subroutine void main@OL@1( ……
        @CIV1 = @CIV1 + 1;
      }while((unsigned)@CIV1 < (@UB-@LB));
      return;
    }
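The shape of the conditional dispatch can be sketched in plain C as follows. All names are stand-ins. `outlined_body` plays the role of `main@OL@1`. `parallel_do` is a hypothetical, sequential substitute for the SMP runtime call. The per-iteration cost and the threshold are passed in as parameters because the slides do not give their values.

```c
#include <stdbool.h>

#define MAX_N 1024
double a[MAX_N];

/* Stand-in for the outlined loop body: iterations lb .. ub-1. */
void outlined_body(unsigned lb, unsigned ub)
{
    for (unsigned i = lb; i < ub; i++)
        a[i] = 1.0;
}

/* Hypothetical runtime: splits the range in two and runs the halves in
 * sequence. A real runtime would hand them to different threads. */
void parallel_do(void (*body)(unsigned, unsigned), unsigned lb, unsigned ub)
{
    unsigned mid = lb + (ub - lb) / 2;
    body(lb, mid);
    body(mid, ub);
}

/* Runs the loop for n <= MAX_N iterations and reports which path was
 * taken: true for the parallel path, false for the serial path. */
bool run_loop(unsigned n, unsigned long body_cost, unsigned long threshold)
{
    if (n == 0)
        return false;                    /* the "if (n > 0)" guard */

    unsigned long loop_cost = (unsigned long)n * body_cost;
    if (loop_cost > threshold) {
        parallel_do(outlined_body, 0, n);
        return true;
    }
    outlined_body(0, n);                 /* serial call of the same routine */
    return false;
}
```

Note that the serial path reuses the outlined routine over the whole range, so no second copy of the loop body is needed.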


Runtime cost analysis challenges

Runtime checks should be

– Lightweight: should not introduce large overhead in applications that are mostly serial

– Safe from overflow: an overflow leads to an incorrect decision, which is costly

loopcost = ((( c1*n1 ) + (c2*n2) + const)*n3)* …

– Restricted to integer operations

– Accurate

All of the above factors have to be balanced
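One way to keep an integer-only cost expression from overflowing is saturating arithmetic: a result that would exceed the integer range is clamped to the maximum value, so a huge loop still looks huge instead of wrapping around to a small number. This is an illustrative technique and not necessarily what the XL compilers emit; the function names are hypothetical.

```c
#include <stdint.h>

/* Multiply, clamping to UINT64_MAX instead of wrapping around. */
uint64_t sat_mul(uint64_t x, uint64_t y)
{
    if (x != 0 && y > UINT64_MAX / x)
        return UINT64_MAX;
    return x * y;
}

/* Add, clamping to UINT64_MAX instead of wrapping around. */
uint64_t sat_add(uint64_t x, uint64_t y)
{
    if (y > UINT64_MAX - x)
        return UINT64_MAX;
    return x + y;
}

/* loopcost = ((c1*n1) + (c2*n2) + k) * n3, the first terms of the nested
 * expression on the slide, using only integer operations. */
uint64_t nested_loop_cost(uint64_t c1, uint64_t n1,
                          uint64_t c2, uint64_t n2,
                          uint64_t k, uint64_t n3)
{
    uint64_t inner = sat_add(sat_add(sat_mul(c1, n1), sat_mul(c2, n2)), k);
    return sat_mul(inner, n3);
}
```

Each saturating operation costs one extra comparison (plus a division in `sat_mul`), which is the kind of overhead that has to be weighed against the lightweight requirement.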


Runtime dependence test

Source:

    int main(){
      for(int i=0; i<n; i++) {
        a[i] = const;
        ……
      }
    }

Transformed code, with the runtime dependence test:

    long main(){
      @_xlsmpEntry0 = _xlsmpInitializeRTE();
      if (n > 0) then
        if(<deptest> && loop_cost > threshold){
          _xlsmpParallelDoSetup_TPO(2208, &main@OL@1,0,n,5,0,@_xlsmpEntry0,0,0,0,0,0,0)
        } else
          main@OL@1(0,0,(unsigned)n,0)
        endif
      endif
      return main;
    }

    Subroutine void main@OL@1( ……
        @CIV1 = @CIV1 + 1;
      }while((unsigned)@CIV1 < (@UB-@LB));
      return;
    }

Work by Peng Zhao
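The slides do not say what `<deptest>` expands to. One common form of runtime dependence test, assumed here purely for illustration, checks that the memory written by the loop and the memory read by it do not overlap. The function names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the byte ranges [p, p+p_bytes) and [q, q+q_bytes) do not
 * overlap. Addresses are compared as integers. */
bool ranges_disjoint(const void *p, size_t p_bytes,
                     const void *q, size_t q_bytes)
{
    uintptr_t ps = (uintptr_t)p;
    uintptr_t qs = (uintptr_t)q;
    return ps + p_bytes <= qs || qs + q_bytes <= ps;
}

/* Combined guard in the shape "<deptest> && loop_cost > threshold" for a
 * loop that writes dst[0..n-1] and reads src[0..n-1]. */
bool can_run_parallel(const double *dst, const double *src, size_t n,
                      unsigned long loop_cost, unsigned long threshold)
{
    return ranges_disjoint(dst, n * sizeof *dst, src, n * sizeof *src)
        && loop_cost > threshold;
}
```

If the two arrays overlap, or the loop is too small, the guard fails and the loop would run through the serial call instead.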


[Figure: SPEC2000FP auto-par performance. Y-axis: % improvement (-O5 -qsmp), from -20 to 50. Benchmarks: swim, wupwise, mgrid, applu, lucas, mesa, art, equake, ammp, apsi, facerec, fma3d, sixtrack.]

1 Proc: -0.5%

2 Proc: 8%


Controlled parallelization

Cost analysis selects big loops

Controlled parallelization

– Selection is not enough

– Parallel performance depends on both the amount of work and the number of processors used

– Using a large number of processors for a small loop causes huge degradations!


[Figure: galgel (SPECOMPM 2001) performance, measured on a 64-way Power5 processor. Y-axis: execution time (sec), from 0 to 250. X-axis: processors (8, 16, 32, 48, 64).]

Small is good!


Controlled parallelization

Introduce another runtime parameter IPT (minimum iterations per thread)

The IPT is passed to the SMP runtime

SMP runtime limits the number of threads working on the parallel loop based on IPT

IPT = function( loop_cost, mem access info .. )


Controlled Parallelization

Source:

    int main(){
      for(int i=0; i<n; i++) {
        a[i] = const;
        ……
      }
    }

Transformed code, with the IPT runtime parameter passed as the last argument:

    long main(){
      @_xlsmpEntry0 = _xlsmpInitializeRTE();
      if (n > 0) then
        if(loop_cost > threshold){
          IPT = func(loop_cost)
          _xlsmpParallelDoSetup_TPO(2208, &main@OL@1,0,n,5,0,@_xlsmpEntry0,0,0,0,0,0,IPT)
        } else
          main@OL@1(0,0,(unsigned)n,0)
        endif
      endif
      return main;
    }

    Subroutine void main@OL@1( ……
        @CIV1 = @CIV1 + 1;
      }while((unsigned)@CIV1 < (@UB-@LB));
      return;
    }


SMP parallel runtime

    _xlsmpParallelDoSetup_TPO(&main@OL@1,0,n ..IPT)
    {
      threadsUsed = IterCount/IPT
      if (threadsUsed > threadsAvailable)
        threadsUsed = threadsAvailable
      …..
      …..
    }
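The thread-limiting rule above can be sketched as runnable C. Two details are assumptions added for the sketch and do not come from the slides. The result is floored at one thread, so a loop shorter than IPT still runs. The helper `ipt_from_cost` uses a made-up heuristic (give each thread at least `min_work_per_thread` units of work), because the slides describe the real function only as depending on the loop cost and memory access information.

```c
/* Number of threads the runtime lets work on a loop.
 * threadsUsed = IterCount / IPT, clamped to the available threads.
 * Assumes threads_available >= 1. */
unsigned threads_used(unsigned long iter_count, unsigned long ipt,
                      unsigned threads_available)
{
    if (ipt == 0)
        ipt = 1;                          /* guard against division by zero */

    unsigned long t = iter_count / ipt;
    if (t < 1)
        t = 1;                            /* assumption: at least one thread */
    if (t > threads_available)
        t = threads_available;
    return (unsigned)t;
}

/* Hypothetical IPT heuristic: the smallest iteration count per thread
 * that reaches min_work_per_thread, rounded up. */
unsigned long ipt_from_cost(unsigned long body_cost,
                            unsigned long min_work_per_thread)
{
    if (body_cost == 0)
        body_cost = 1;
    return min_work_per_thread / body_cost
         + (min_work_per_thread % body_cost != 0 ? 1UL : 0UL);
}
```

For example, with 1000 iterations and an IPT of 100, only 10 threads are used even if 64 are available, which is the effect controlled parallelization aims for on small loops.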


Controlled parallelization for OpenMP

Improves performance and scalability

Allows fine grained control at loop level granularity

Can be applied to OpenMP loops as well

The number of threads is adjusted when the environment variable OMP_DYNAMIC is turned on

Issues with threadprivate data

Encouraging results in galgel


[Figure: galgel (SPECOMPM 2001) performance, measured on a 64-way Power5 processor. Y-axis: execution time (sec), from 0 to 250. X-axis: processors (8, 16, 32, 48, 64). Series: no controlled par, controlled par.]


Future work

Improve cost analysis algorithm and fine tune heuristics

Implement interprocedural cost analysis.

Extend cost analysis and controlled parallelization to non-loop constructs in user-parallel code, for scalability

Implement interprocedural dependence analysis