C O M P U T E | S T O R E | A N A L Y Z E
OpenMP and OpenACC: A Comparison
James Beyer, Cray Inc.
4/15/2014 Cray Property 1
Outline
● Related talks here at GTC
● Background
  ● OpenMP
  ● OpenACC
● Important differences (today)
  ● Parallelism
  ● Present_or_*
  ● Scalars
  ● Loops
  ● Unstructured data
  ● Calls (separate compilation units)
  ● Nested parallelism
● What is next
  ● OpenMP
  ● OpenACC
Related talks here at GTC
● S4438 - What's new in OpenACC 2.0 and OpenMP 4.0
● S4474 - Scaling OpenACC Across Multiple GPUs
● S4472 - Performance Analysis and Optimization of OpenACC Applications
● S4514 - Panel on Compiler Directives for Accelerated Computing
● Hangout: OpenACC
● There are more; just search for OpenACC
Background -- OpenMP
● FORTRAN version 1.0 (October 1997)
● Accelerator additions
  ● Proposal submitted Dec 2009
  ● Subcommittee formed Aug 2009
● Cray OpenMP for Accelerators nears release
● Fall 2010: several members form OpenACC working group
● TR1 - Technical Report on Directives for Attached Accelerators (November 2012)
● OpenMP 4.0 (July 2013)
Background -- OpenACC
● PGI releases accelerator directives
● CAPS releases HMPP
● Fall 2010: several members form OpenACC working group
● OpenACC 1.0 (Nov 2011)
● OpenACC 2.0 (June 2013)
Important differences
● Parallelism
● Present_or_*
● Scalars
● Loops
● Calls (separate compilation units)
Parallelism
● OpenACC
  ● “Off-load” and parallel startup tied together
    ● acc parallel
    ● acc kernels
● OpenMP
  ● “Off-load” and parallel startup disconnected
    ● omp target
    ● omp parallel
    ● omp teams
Parallel startup example (Fortran)
OpenACC
!$acc parallel
  …
!$acc end parallel
Or
!$acc kernels
  …
!$acc end kernels
OpenMP
!$omp target
!$omp teams/parallel
  …
!$omp end teams/parallel
!$omp end target
Parallel startup example (C/C++)
OpenACC
#pragma acc parallel
{
  …
}
Or
#pragma acc kernels
{
  …
}
OpenMP
#pragma omp target
#pragma omp teams/parallel
{
  …
}
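The startup patterns above can be sketched as a small runnable C example. This is a minimal sketch, not taken from the talk: the function names `scale_acc` and `scale_omp`, the array-scaling workload, and the data clauses are assumptions chosen for illustration. A work-sharing directive (`loop`, `distribute`) is added inside each region because a bare `parallel` or `teams` region would otherwise execute the loop redundantly in every gang or team.

```c
#include <assert.h>

/* OpenACC: one directive both offloads the region and starts parallelism. */
void scale_acc(float *x, int n, float a)
{
    assert(n > 0);
    #pragma acc parallel copy(x[0:n])
    {
        #pragma acc loop
        for (int i = 0; i < n; ++i)
            x[i] *= a;
    }
}

/* OpenMP 4.0: "target" offloads; "teams" starts parallelism separately. */
void scale_omp(float *x, int n, float a)
{
    assert(n > 0);
    #pragma omp target map(tofrom: x[0:n])
    #pragma omp teams
    #pragma omp distribute
    for (int i = 0; i < n; ++i)
        x[i] *= a;
}
```

A compiler without OpenACC or OpenMP support ignores the pragmas and runs both loops serially on the host, so the results are the same either way.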
OpenMP teams vs parallel
● Why two different “parallel” mechanisms?
  ● Teams
    ● Independent collision domains
    ● Same behavior as OpenACC gangs
    ● Only select directives allowed
  ● Parallel
    ● A single collision domain
    ● Default if neither is present
    ● All non-accelerator OpenMP directives allowed
Present_or_*
● OpenACC
  ● present_or_* programmer visible
    ● copy, copyin, copyout, create
    ● copy* without present allowed
      ● Error prone
      ● Hard to debug
      ● Little actual savings
● OpenMP
  ● present_or_* not programmer visible
  ● map always implies present test
    ● to, from, tofrom, alloc (in, out, inout, allocate)
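The difference can be sketched in C with an outer data region and two compute regions that name the same array. This is a minimal sketch under assumptions: the function names `bump_twice_acc` and `bump_twice_omp` and the workload are hypothetical. In the OpenACC version the programmer writes `present_or_copy` explicitly; in the OpenMP version a plain `map` clause performs the same presence test, so the inner regions should not copy the array again.

```c
#include <assert.h>

/* OpenACC: the present test is spelled out by the programmer. */
void bump_twice_acc(float *a, int n)
{
    assert(n > 0);
    #pragma acc data present_or_copy(a[0:n])
    {
        /* a is already present here, so no second transfer is expected. */
        #pragma acc parallel loop present_or_copy(a[0:n])
        for (int i = 0; i < n; ++i)
            a[i] += 1.0f;

        #pragma acc parallel loop present_or_copy(a[0:n])
        for (int i = 0; i < n; ++i)
            a[i] += 1.0f;
    }
}

/* OpenMP 4.0: map always implies the present test. */
void bump_twice_omp(float *a, int n)
{
    assert(n > 0);
    #pragma omp target data map(tofrom: a[0:n])
    {
        #pragma omp target map(tofrom: a[0:n])
        #pragma omp teams distribute parallel for
        for (int i = 0; i < n; ++i)
            a[i] += 1.0f;

        #pragma omp target map(tofrom: a[0:n])
        #pragma omp teams distribute parallel for
        for (int i = 0; i < n; ++i)
            a[i] += 1.0f;
    }
}
```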
Scalars
● OpenACC
  ● Firstprivate by default
  ● User can override
    ● Error prone
  ● Allows implementation to make these kernel arguments
  ● Pointers are “special”
● OpenMP
  ● No such restriction
  ● Pointers are scalars
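A short C sketch of how scalars behave in practice; the function names `count_positive_acc` and `count_positive_omp` and the counting workload are assumptions for illustration. In the OpenACC version the scalar `n` needs no clause because it is firstprivate by default, while `count` must be named in a `reduction` clause for its value to come back. In the OpenMP version `count` is mapped explicitly; OpenMP 4.0 would map it `tofrom` implicitly, but later OpenMP versions changed that default, so the explicit clause is the safer assumption.

```c
#include <assert.h>

/* OpenACC: n is firstprivate by default; count is returned via reduction. */
int count_positive_acc(const int *v, int n)
{
    assert(n > 0);
    int count = 0;
    #pragma acc parallel loop copyin(v[0:n]) reduction(+:count)
    for (int i = 0; i < n; ++i)
        if (v[i] > 0)
            count++;
    return count;
}

/* OpenMP: the scalar is mapped like any other variable. */
int count_positive_omp(const int *v, int n)
{
    assert(n > 0);
    int count = 0;
    #pragma omp target map(to: v[0:n]) map(tofrom: count)
    #pragma omp teams distribute parallel for reduction(+:count)
    for (int i = 0; i < n; ++i)
        if (v[i] > 0)
            count++;
    return count;
}
```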
Loops
● OpenACC
  ● One construct: “loop”
  ● Multiple parallelism types
  ● “Nested” parallelism implicit
  ● Three levels available
    ● Gang
    ● Worker
    ● Vector
● OpenMP
  ● Three constructs
    ● Distribute
    ● Do/for
    ● Simd
  ● Nested parallelism explicit
Loop examples
OpenACC
!$acc loop
do i=1,n
  …
enddo
OpenMP
!$omp do
or
!$omp distribute
do i=1,n
  …
enddo
!$omp end do
or
!$omp end distribute
Loop examples
OpenACC
!$acc loop
do i=1,n
  !$acc loop
  do j=1,m
    …
  enddo
enddo
OpenMP
!$omp distribute
do i=1,n
  !$omp parallel do
  do j = 1,m
    …
  enddo
  !$omp end parallel do
enddo
!$omp end distribute
Loop examples
OpenACC
!$acc loop gang worker vector
do i=1,n
  …
enddo
OpenMP
!$omp distribute parallel do simd
do i=1,n
  …
enddo
!$omp end distribute parallel do simd
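The nested-loop case from the Fortran slides can be written in C as the following minimal sketch. The function names `fill_acc` and `fill_omp`, the row-major matrix layout, and the choice of `gang` for the outer loop and `vector` for the inner loop are assumptions for illustration; the OpenACC slide leaves the levels to the compiler. The OpenMP version makes the nesting explicit with `distribute` outside and `parallel for` inside.

```c
#include <assert.h>

/* OpenACC: one "loop" construct, with the parallelism level given per loop. */
void fill_acc(int *m, int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    #pragma acc parallel copyout(m[0:rows*cols])
    {
        #pragma acc loop gang
        for (int i = 0; i < rows; ++i) {
            #pragma acc loop vector
            for (int j = 0; j < cols; ++j)
                m[i * cols + j] = i + j;
        }
    }
}

/* OpenMP 4.0: a different construct per level, nested explicitly. */
void fill_omp(int *m, int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    #pragma omp target map(from: m[0:rows*cols])
    #pragma omp teams
    #pragma omp distribute
    for (int i = 0; i < rows; ++i) {
        #pragma omp parallel for
        for (int j = 0; j < cols; ++j)
            m[i * cols + j] = i + j;
    }
}
```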
Unstructured data
● Separate the move-to and move-from parts of data constructs
  ● Enter data
    ● Constructors
  ● Exit data
    ● Destructors
● OpenACC
  ● Added support in 2.0
● OpenMP
  ● Nearing completion of feature
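The constructor/destructor pairing can be sketched with the OpenACC 2.0 `enter data` and `exit data` directives. This is a minimal sketch, not the talk's code: the `vec` type and the functions `vec_create`, `vec_fill`, and `vec_destroy` are hypothetical names introduced for illustration. The device copy is created when the object is constructed and released when it is destroyed, with no enclosing structured data region.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container whose device copy lives as long as the object. */
typedef struct {
    float *data;
    int n;
} vec;

/* "Constructor": allocate on the host, then create the device copy. */
vec *vec_create(int n)
{
    assert(n > 0);
    vec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    float *d = calloc((size_t)n, sizeof *d);
    if (d == NULL) {
        free(v);
        return NULL;
    }
    v->data = d;
    v->n = n;
    #pragma acc enter data create(d[0:n])
    return v;
}

/* Compute on the existing device copy, then refresh the host copy. */
void vec_fill(vec *v, float value)
{
    float *d = v->data;
    int n = v->n;
    #pragma acc parallel loop present(d[0:n])
    for (int i = 0; i < n; ++i)
        d[i] = value;
    #pragma acc update host(d[0:n])
}

/* "Destructor": release the device copy before freeing host memory. */
void vec_destroy(vec *v)
{
    if (v == NULL)
        return;
    float *d = v->data;
    #pragma acc exit data delete(d[0:v->n])
    free(d);
    free(v);
}
```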
Calls
● OpenACC
  ● Routine
    ● Only one type of parallelism allowed
      ● Gang
      ● Worker
      ● Vector
      ● Seq
    ● Hard on user
    ● Easy for implementer
● OpenMP
  ● Declare target
    ● Type of parallelism ignored
    ● Easy on user
    ● Hard for implementer
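A minimal C sketch of calling a function from offloaded code under each model; the function names (`square_acc`, `square_omp`, `square_all_acc`, `square_all_omp`) are assumptions for illustration. The OpenACC `routine` directive must state the level of parallelism the function uses (here `seq`), whereas the OpenMP `declare target` region only marks the function as callable on the device.

```c
#include <assert.h>

/* OpenACC: the routine directive names exactly one level of parallelism. */
#pragma acc routine seq
float square_acc(float x)
{
    return x * x;
}

/* OpenMP 4.0: declare target says nothing about the parallelism used. */
#pragma omp declare target
float square_omp(float x)
{
    return x * x;
}
#pragma omp end declare target

void square_all_acc(float *x, int n)
{
    assert(n > 0);
    #pragma acc parallel loop copy(x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] = square_acc(x[i]);
}

void square_all_omp(float *x, int n)
{
    assert(n > 0);
    #pragma omp target map(tofrom: x[0:n])
    #pragma omp teams distribute parallel for
    for (int i = 0; i < n; ++i)
        x[i] = square_omp(x[i]);
}
```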
Nested parallelism
● OpenACC
  ● Added in 2.0
  ● Currently no full implementations
    ● Why?
● OpenMP
  ● Parallel inside of teams is allowed
  ● Teams inside of teams is not allowed
What is next
● OpenACC
  ● Tools interfaces
  ● Better user-defined type support
  ● …
● OpenMP
  ● Unstructured data
  ● Declare target deferred_map
  ● Interoperability with accelerated libraries
  ● Multiple devices
  ● User-defined type support