PGI Visual Fortran Release Notes - pgroup.com · Fortran 2003 semantics in both host and GPU device code generation. This change may affect performance in some cases where a Fortran

PGI VISUAL FORTRAN RELEASE NOTES

Version 2018

PGI Visual Fortran Release Notes Version 2018 | ii

TABLE OF CONTENTS

Chapter 1. PVF Release Overview............................................................................11.1. Product Overview......................................................................................... 11.2. Microsoft Build Tools..................................................................................... 21.3. Terms and Definitions....................................................................................2

Chapter 2. What's New in PGI 2018......................................................................... 32.1. What's New in 18.10..................................................................................... 32.2. What's New in 18.7.......................................................................................42.3. What's New in 18.5.......................................................................................62.4. What's New in 18.4.......................................................................................62.5. What's New in 18.3.......................................................................................62.6. What's New in 18.1.......................................................................................72.7. New and Modified Compiler Options.................................................................102.8. OpenACC.................................................................................................. 10

2.8.1. OpenACC Error Handling.......................................................................... 112.8.2. Performance Impact of Fortran 2003 Allocatables........................................... 14

2.9. CUDA Toolkit Versions.................................................................................. 142.10. Compute Capability....................................................................................172.11. OpenMP.................................................................................................. 172.12. Runtime Library Routines.............................................................................17

Chapter 3. Selecting an Alternate Compiler..............................................................183.1. For a Single Project.....................................................................................183.2. For All Projects.......................................................................................... 18

Chapter 4. Distribution and Deployment.................................................................. 204.1. Application Deployment and Redistributables...................................................... 20

4.1.1. PGI Redistributables............................................................................... 204.1.2. Microsoft Redistributables........................................................................ 20

Chapter 5. Troubleshooting Tips and Known Limitations.............................................. 225.1. PVF IDE Limitations..................................................................................... 225.2. PVF Debugging Limitations............................................................................ 225.3. PGI Compiler Limitations.............................................................................. 235.4. OpenACC Issues.......................................................................................... 23

Chapter 6. Contact Information............................................................................. 24

PGI Visual Fortran Release Notes Version 2018 | iii

LIST OF TABLES

Table 1 Third-Party Software Security Updates for PGI version 18.10 ................................. 10

PGI Visual Fortran Release Notes Version 2018 | iv

PGI Visual Fortran Release Notes Version 2018 | 1

Chapter 1.PVF RELEASE OVERVIEW

PGI command-level compilers for Windows will continue to be enhanced andsupported, but PGI Visual Fortran (PVF) is being deprecated. Existing PVF licenseescan continue to use it indefinitely, but no new licenses will be issued. PVFtechnical support and new features will be discontinued at the end of the 2019calendar year.

Important The PGI 2018 Release includes updated FlexNet license managementsoftware to address a security vulnerability. Users of any previous PGI release mustupdate their FlexNet license daemons to enable PGI 18.1 and subsequent releases.See Third-Party Software Security Updates in What's New in PGI 2018 and our FlexNetUpdate FAQ for more information.

This chapter provides an overview of Release 2018 of PGI Visual Fortran®, a set ofFortran compilers and development tools for Windows integrated with Microsoft®

Visual Studio.

1.1. Product OverviewPVF is integrated with Microsoft Visual Studio 2015. Throughout this document, "PGIVisual Fortran" refers to PVF integrated with VS 2015. Similarly, "Microsoft VisualStudio" refers to Visual Studio 2015. When it is necessary to distinguish further, thedocument does so.

Single-user node-locked and multi-user network floating license options are availablefor both products. When a node-locked license is used, one user at a time can use PVFon the single system where it is installed. When a network floating license is used, asystem is selected as the server and it controls the licensing, and users from any of theclient machines connected to the license server can use PVF. Thus multiple users cansimultaneously use PVF, up to the maximum number of users allowed by the license.

PVF provides a complete Fortran development environment fully integrated withMicrosoft Visual Studio. It includes a custom Fortran Build Engine that automaticallyderives build dependencies, Fortran extensions to the Visual Studio editor, a custom PGI

https://nvd.nist.gov/vuln/detail/CVE-2016-10395

http://www.pgroup.com/support/flex.htm


PVF Release Overview


Debug Engine integrated with the Visual Studio debugger, PGI Fortran compilers, andPVF-specific property pages to control the configuration of all of these.

Release 2018 of PGI Visual Fortran includes the following components:

‣ PGFORTRAN OpenMP and auto-parallelizing Fortran 2003 compiler.‣ PGF77 OpenMP and auto-parallelizing FORTRAN 77 compiler.‣ PVF Visual Studio integration components.‣ OpenACC and CUDA Fortran tools and libraries necessary to build executables for

Accelerator GPUs, when the user’s license supports these optional features.‣ PVF documentation.

1.2. Microsoft Build ToolsPVF on all Windows systems includes Microsoft Open Tools. These files are required inaddition to the files Microsoft provides in the Windows SDK.

1.3. Terms and DefinitionsThis document contains a number of terms and definitions with which you may or maynot be familiar. If you encounter an unfamiliar term in these notes, please refer to thePGI online glossary located at pgicompilers.com/definitions.

These two terms are used throughout the documentation to reflect groups of processors:Intel 64

64-bit Intel x86-64 CPUs including Intel Core processors, Intel Xeon Nehalem, SandyBridge, Ivy Bridge, Haswell, Broadwell and Skylake processors, and Intel Xeon PhiKnights Landing.

AMD6464-bit AMD™ x86-64 CPUs including Bulldozer, Piledriver and EPYC processors.

https://www.pgroup.com/support/definitions.htm

https://www.pgroup.com/definitions


Chapter 2.WHAT'S NEW IN PGI 2018

PGI command-level compilers for Windows will continue to be enhanced andsupported, but PGI Visual Fortran (PVF) is being deprecated. Existing PVF licenseescan continue to use it indefinitely, but no new licenses will be issued. PVFtechnical support and new features will be discontinued at the end of the 2019calendar year.

Important The PGI 2018 Release includes updated FlexNet license managementsoftware to address a security vulnerability. Users of any previous PGI release mustupdate their FlexNet license daemons to enable PGI 18.1 and subsequent releases.See Third-Party Software Security Updates below and our FlexNet Update FAQ formore information.

Welcome to Release 2018 of PGI Visual Fortran!

If you read only one thing about this PGI release, make it this chapter. It covers all thenew, changed, deprecated, or removed features in PGI products released this year. It iswritten with you, the user, in mind.

Every PGI release contains user-requested fixes and updates. We keep a complete list ofthese fixed Technical Problem Reports online for your reference.

2.1. What's New in 18.10

All Compilers

Enhanced output format and readability of the autocompare feature of PGI CompilerAssisted Software Testing (PCAST).

Added support for patch-and-continue to autocompare. When an incorrect value isdetected, autocompare will emit a warning, replace the bad value with the known goodvalue, and continue execution.



https://www.pgroup.com/support/release_tprs.htm

What's New in PGI 2018


Fortran

Added support for Fortran 2018 DO CONCURRENT. The body of the DO CONCURRENTblock is executed serially.

Improved Fortran MATMUL performance for small matrices.

OpenACC and CUDA Fortran

Added support for the CUDA Toolkit version 10.0. Read more about support for CUDAToolkit versions in CUDA Toolkit Versions.

Disabled the nollvm sub-option for -ta=tesla and -Mcuda switches on Windows.The Windows compilers will ignore nollvm and emit a warning to that effect.

OpenMP

Changed the default value of OMP_MAX_ACTIVE_LEVELS to INT_MAX.


All Compilers

The LLVM-based code generator for Linux/x86-64 and Linux/OpenPOWER platformsis now based on LLVM 6.0. On Linux/x86-64 targets where it remains optional (usingthe -Mllvm compiler flag), performance of generated executables using the LLVM-based code generator average 15% faster than the default PGI code generator on severalimportant benchmarks. The LLVM-based code generator will become default on allx86-64 targets in a future PGI release.

The PGI compilers are now interoperable with GNU versions up to and includingGCC 8.1. This includes interoperability between the PGI C++ compiler and g++ 8.1,and the ability for all of the PGI compilers to interoperate with the GCC 8.1 toolchaincomponents, header files and libraries.

Fortran

Implemented Fortran 2008 SUBMODULE support. A submodule is a program unit thatextends a module or another submodule.

The default behavior for assignments to allocatable variables has been changed to matchFortran 2003 semantics in both host and GPU device code generation. This change mayaffect performance in some cases where a Fortran allocatable array assignment is madewithin a kernels directive. For more information and a suggested workaround, refer toPerformance Impact of Fortran 2003 Allocatables. This change can be reverted to thepre-18.7 behavior of Fortran 1995 semantics on a per-compilation basis by adding the-Mallocatable=95 option.



When using the LLVM-based code generator, the Fortran compiler now generates LLVMdebug metadata for module variables, and the quality of debug metadata is generallyimproved.

Free-form source lines can now be up to 1000 characters.

All Fortran CHARACTER entities are now represented with a 64-bit integer length.


The compilers now set the default CUDA version to match the CUDA Driver installedon the system used for compilation. CUDA 9.1 and CUDA 9.2 toolkits are bundled withthe PGI 18.7 release. Older CUDA toolchains including CUDA 8.0 and CUDA 9.0 havebeen removed from the PGI installation packages to minimize their size, but are stillsupported using a new co-installation feature. If you do not specify a cudaX.Y sub-option to -ta=tesla or -Mcuda, you should read more about how this change affects youin CUDA Toolkit Versions.

Changed the default NVIDIA GPU compute capability list. The compilers now constructthe default compute capability list to be the set matching that of the GPUs installed onthe system on which you are compiling. If you do not specify a compute capability sub-option to -ta=tesla or -Mcuda, you should read more about how this change affects youin Compute Capability.

Changed the default size for PGI_ACC_POOL_ALLOC_MINSIZE to 128B from16B. This environment variable is used by the accelerator compilers' CUDAUnified Memory pool allocator. A user can revert to pre-18.7 behavior by settingPGI_ACC_POOL_ALLOC_MINSIZE to 16B.

Added an example of how to use the OpenACC error handling callback routineto intercept errors triggered during execution on a GPU. Refer to OpenACC ErrorHandling for explanation and code samples.

Added support for assert function call within OpenACC accelerator regions on allplatforms.

OpenMP

Improved the efficiency of code generated for OpenMP 4.5 combined "distributeparallel" loop constructs that include a collapse clause, resulting in improvedperformance on a number of applications and benchmarks.

When using the LLVM-based code generator, the Fortran compiler now generatesDWARF debug information for OpenMP thread private variables.



Utilities

Changed the output of the tool pgaccelinfo. The label of the last line of pgaccelinfo'soutput is now "PGI Default Target" whereas prior to the 18.7 release the label "PGICompiler Option" was used.

Changed the output of the tool pgcpuid. The label of the last line of pgcpuid's output isnow "default target" whereas prior to the 18.7 release the label "type" was used.

Deprecations and Eliminations

Dropped support for NVIDIA Fermi GPUs; compilation for compute capability 2.x is nolonger supported.

PGI 18.7 is the last release for Windows that includes bundled Microsoft toolchaincomponents. Future releases will require users to have the Microsoft toolchaincomponents pre-installed on their systems.

2.3. What's New in 18.5The PGI 18.5 release contains all features and fixes found in PGI 18.4 and a few keyupdates for important user-reported problems.

Added full support for CUDA 9.2; use the cuda9.2 sub-option with the -ta=teslaor -Mcuda compiler options to compile and link with the integrated CUDA 9.2 toolkitcomponents.

Added support for Xcode 9.3 on macOS.

Improved function offset information in runtime tracebacks in optimized (non-debug)modes.

2.4. What's New in 18.4The PGI 18.4 release contains all features and fixes found in PGI 18.3 and a few keyupdates for important user-reported problems.

Added support for internal procedures, assigned procedure pointers and internalprocedures passed as actual arguments to procedure dummy arguments.

Added support for intrinsics SCALE, MAXVAL, MINVAL, MAXLOC and MINLOC ininitializers.

2.5. What's New in 18.3The PGI 18.3 release contains all the new features found in PGI 18.1 and a few keyupdates for important user-reported problems.




Key Features

Added support for Intel Skylake and AMD Zen processors, including support for theAVX-512 instruction set on the latest Intel Xeon processors.

Added full support for OpenACC 2.6.

Added support for the CUDA 9.1 toolkit, including on the latest NVIDIA Volta V100GPUs.


Changed the default CUDA Toolkit used by the compilers to CUDA Toolkit 8.0.

Changed the default compute capability chosen by the compilers to cc35,cc50,cc60 .

Added support for CUDA Toolkit 9.1.

Added full support for the OpenACC 2.6 specification including:

‣ serial construct‣ if and if_present clauses on host_data construct‣ no_create clause on the compute and data constructs‣ attach clause on compute, data, and enter data directives‣ detach clause on exit data directives‣ Fortran optional arguments‣ acc_get_property, acc_attach, and acc_detach routines‣ profiler interface

Added support for asterisk ('*') syntax to CUDA Fortran launch configuration. Providingan asterisk as the first execution configuration parameter leaves the compiler free tocalculate the number of thread blocks in the launch configuration.

Added two new CUDA Fortran interfaces,cudaOccupancyMaxActiveBlocksPerMultiprocessor andcudaOccupancyMaxActiveBlocksPerMultprocessorWithFlags. These provide hooksinto the CUDA Runtime for manually obtaining the maximum number of thread blockswhich can be used in grid-synchronous launches, same as provided by the asterisksyntax above.

OpenMP

Changed the default initial value of OMP_MAX_ACTIVE_LEVELS from 1 to 16.

Added support for the taskloop construct's firstprivate and lastprivate clauses.



Added support for the OpenMP Performance Tools (OMPT) interface.

Fortran

Changed how the PGI compiler runtime handles Fortran array descriptor initialization;this change means any program using Fortran 2003 should be recompiled with PGI 18.1.

Libraries

Reorganized the Fortran cuBLAS and cuSolver modules to allow use of the two togetherin any Fortran program unit. As a result of this reorganization, any codes which usecuBLAS or cuSolver modules must be recompiled to be compatible with this release.

Added a new PGI math library, libpgm. Moved math routines from libpgc, libpgftnrtl,and libpgf90rtl to libpgm. This change should be transparent unless you have beenexplicitly adding libpgc, libpgftnrtl, or libpgf90rtl to your link line.

Added new fastmath routines for single precision scalar/vector sin/cos/tan for AVX2 andAVX512F processors.

Added support for C99 scalar complex intrinsic functions.

Added support for vector complex intrinsic functions.

Added environment variables to control runtime behavior of intrinsic functions:MTH_I_ARCH={em64t,sse4,avx,avxfma4,avx2,avx512knl,avx512}

Override the architecture/platform determined at runtime.MTH_I_STATS=1

Provide basic runtime statistics (number of calls, number of elements, percentage oftotal) of elemental functions.

MTH_I_STATS=2Provide detailed call count by element size (single/double-precision scalar, single/double-precision vector size).

MTH_I_FAST={relaxed,precise}Override compile time selection of fast intrinsics (the default) and replace with eitherthe relaxed or precise versions.

MTH_I_RELAXED={fast,precise}Override compile time selection of relaxed intrinsics (the default with-Mfprelaxed=intrinsic) and replace with either the fast or precise versions.

MTH_I_PRECISE={fast,relaxed}Override compile time selection of precise intrinsics (the default with -Kieee) andreplace with either the fast or relaxed versions.

Profiler

Improved the CPU Details View to include the breakdown of time spent per thread.

Added an option to let one select the PC sampling frequency.



Enhanced the NVLink topology to include the NVLink version.

Enhanced profiling data to include correlation ID when exporting in CSV format.

Operating Systems and Processors

Added support for the AMD Zen (EPYC, Ryzen) processor architecture. Use the-tp=zen compiler option to target AMD Zen explicitly.

Added support for the Intel Skylake processor architecture. Use the -tp=skylakecompiler option to target Intel Skylake explicitly.

Added support for the Intel Knights Landing processor architecture. Use the -tp=knlcompiler option to target Intel Knights Landing explicitly.

License Management

Updated FlexNet Publisher license management software to v11.14.1.3. This updateaddresses several issues including:

‣ A security vulnerability on Windows. See Third-Party Software Security Updatesbelow and the FlexNet Update FAQ for more information.

‣ Seat-count stability improvements on network floating license servers whenborrowing licenses (lmborrow) for off-line use. For early return of borrowed seats,users should invoke the new "-bv" option for lmborrow. See our license borrowingFAQ for more information.

Important Users with PGI 2017 (17.x) or older need to update their license daemonsto support 18.1 or newer. The new license daemons are backward-compatible witholder PGI releases.

Deprecations and Eliminations

Dropped support for Microsoft Visual Studio 2013. Installing PVF 18.10 will cause anyversion of PVF integrated with VS 2013 to stop working. PVF continues to support VS2015.

Stopped including components from CUDA Toolkit version 7.5 in the PGI packages.CUDA 7.5 can still be targeted if one directs the compiler to a valid installation locationof CUDA 7.5 using CUDA_HOME.

Deprecated legacy PGI accelerator directives. When the compiler detects a deprecatedPGI accelerator directive, it will print a warning. This warning will include theOpenACC directive corresponding to the deprecated directive if one exists. Warningsabout deprecated directives can be suppressed using the new legacy sub-optionto the -acc compiler option. The following library routines have been deprecated:acc_set_device, acc_get_device, and acc_async_wait; they have been replaced byacc_set_device_type, acc_get_device_type, and acc_wait, respectively. The following

https://www.pgroup.com/support/flex.htm

https://www.pgroup.com/support/faq.htm?tab=lic#borrow

https://www.pgroup.com/support/faq.htm?tab=lic#borrow



environment variables have been deprecated: ACC_NOTIFY and ACC_DEVICE; theyhave been replaced by PGI_ACC_NOTIFY and PGI_ACC_DEVICE_TYPE, respectively.Support for legacy PGI accelerator directives may be removed in a future release.

Dropped support for CUDA x86. The -Mcudax86 compiler option is no longersupported.

Dropped support for CUDA Fortran emulation mode. The -Mcuda=emu compileroption is no longer supported.

Third-Party Software Security Updates

Table 1 Third-Party Software Security Updates for PGI version 18.10

CVE ID Description

CVE-2016-10395 Updated FlexNet Publisher to v11.14.1.3

to address a vulnerability on Windows. We

recommend all users update their license

daemons —see the FlexNet Update FAQ. For more

information, see the Flexera website.

2.7. New and Modified Compiler OptionsRelease 2018 supports new and updated command line options and keywordsuboptions.

Added the following options:

‣ -cpp is now an alias for -Mpreprocess.‣ [dis]allows variadic macros.

Changed the following -Minline sub-options:

‣ Added totalsize:n to limit inlining to the total size of n.‣ maxsize:n replaces size:n to prevent inlining of functions bigger than n. The

compilers silently convert the previous size:n to maxsize:n.‣ Removed levels:n which limited inlining to n levels of functions. The compilers

silently ignore levels:n.

2.8. OpenACCThe following sections contain details about updates to PGI compiler support forOpenACC.



http://www.flexera.com



2.8.1. OpenACC Error HandlingThe OpenACC specification provides a mechanism to allow you to intercept errorstriggered during execution on a GPU and execute a specific routine in response beforethe program exits. For example, if an MPI process fails while allocating memory on theGPU, the application may want to call MPI_Abort to shut down all the other processesbefore the program exits. This section explains how to take advantage of this feature.

To intercept errors the application must give a callback routine to the OpenACCruntime. To provide the callback, the application calls acc_set_error_routine witha pointer to the callback routine.

The interface is the following, where err_msg contains a description of the error:typedef void (*exitroutinetype)(char *err_msg);extern void acc_set_error_routine(exitroutinetype callback_routine);

When the OpenACC runtime detects a runtime error, it will invoke thecallback_routine.

This feature is not the same as error recovery. If the callback routine returns to theapplication, the behavior is decidedly undefined.

Let's look at this feature in more depth using an example.

Take the MPI program below and run it with two processes. Process 0 tries to allocate alarge array on the GPU, then sends a message to the second process to acknowledge thesuccess of the operation. Process 1 waits for the acknowledgment and terminates uponreceiving it.

#include <stdio.h>#include <stdlib.h>#include "mpi.h"

#define N 2147483648

int main(int argc, char **argv){ int rank, size;

MPI_Init(&argc, &argv);

MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size);

int ack; if(rank == 0) { float *a = (float*) malloc(sizeof(float) * N);

#pragma acc enter data create(a[0:N])#pragma acc parallel loop independent for(int i = 0; i < N; i++) { a[i] = i *0.5; }#pragma acc exit data copyout(a[0:N]) printf("I am process %d, I have initialized a vector of size %ld bytes on the GPU. Sending acknowledgment to process 1.", rank, N); ack = 1; MPI_Send(&ack, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);



} else if(rank == 1) { MPI_Recv(&ack, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); printf("I am process %d, I have received the acknowledgment from process 0 that data in the GPU has been initialized.\n", rank, N); fflush(stdout); }

// do some more work

MPI_Finalize();

return 0;}

We compile the program with:$ mpicc -ta=tesla -o error_handling_mpi error_handling_mpi.c

If we run this program with two MPI processes, the output will look like the following:

$ mpirun -n 2 ./error_handling_mpiOut of memory allocating -8589934592 bytes of device memorytotal/free CUDA memory: 11995578368/11919294464Present table dump for device[1]:NVIDIA Tesla GPU 0, compute capability 3.7, threadid=1...empty...call to cuMemAlloc returned error 2: Out of memory

-------------------------------------------------------Primary job terminated normally, but 1 process returneda non-zero exit code.. Per user-direction, the job has been aborted.---------------------------------------------------------------------------------------------------------------------------------mpirun detected that one or more processes exited with non-zero status,thus causing the job to be terminated.

Process 0 failed while allocating memory on the GPU and terminated unexpectedly withan error. In this case mpirun was able to identify that one of the processes failed, so itshut down the remaining process and terminated the application. A simple two-processprogram like this is straightforward to debug. In a real world application though, withhundreds or thousands of processes, having a process exit prematurely may causethe application to hang indefinitely. Therefore it would be ideal to catch the failure ofa process, control the termination of the other processes, and provide a useful errormessage.

We can use the OpenACC error handling feature to improve the previous program andcorrectly terminate the application in case of failure of an MPI process.

In the following sample code, we have added an error handling callback routinethat will shut down the other processes if a process encounters an error whileexecuting on the GPU. Process 0 tries to allocate a large array into the GPU and,if the operation is successful, process 0 will send an acknowledgment to process1. Process 0 calls the OpenACC function acc_set_error_routine to set thefunction handle_gpu_errors as an error handling callback routine. This routineprints a message and calls MPI_Abort to shut down all the MPI processes. Ifprocess 0 successfully allocates the array on the GPU, process 1 will receive theacknowledgment. Otherwise, if process 0 fails, it will terminate itself and trigger thecall to handle_gpu_errors. Process 1 is then terminated by the code executed in thecallback routine.



#include <stdio.h>#include <stdlib.h>#include "mpi.h"

#define N 2147483648

typedef void (*exitroutinetype)(char *err_msg);extern void acc_set_error_routine(exitroutinetype callback_routine);

void handle_gpu_errors(char *err_msg) { printf("GPU Error: %s", err_msg); printf("Exiting...\n\n"); MPI_Abort(MPI_COMM_WORLD, 1); exit(-1);}

int main(int argc, char **argv){ int rank, size;

MPI_Init(&argc, &argv);

MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size);

int ack; if(rank == 0) { float *a = (float*) malloc(sizeof(float) * N);

acc_set_error_routine(&handle_gpu_errors);

#pragma acc enter data create(a[0:N])#pragma acc parallel loop independent for(int i = 0; i < N; i++) { a[i] = i *0.5; }#pragma acc exit data copyout(a[0:N]) printf("I am process %d, I have initialized a vector of size %ld bytes on the GPU. Sending acknowledgment to process 1.", rank, N); fflush(stdout); ack = 1; MPI_Send(&ack, 1, MPI_INT, 1, 0, MPI_COMM_WORLD); } else if(rank == 1) { MPI_Recv(&ack, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); printf("I am process %d, I have received the acknowledgment from process 0 that data in the GPU has been initialized.\n", rank, N); fflush(stdout); }

// more work

MPI_Finalize();

return 0;}

Again, we compile the program with:$ mpicc -ta=tesla -o error_handling_mpi error_handling_mpi.c

We run the program with two MPI processes and obtain the output below:

$ mpirun -n 2 ./error_handling_mpi



Out of memory allocating -8589934592 bytes of device memorytotal/free CUDA memory: 11995578368/11919294464Present table dump for device[1]:NVIDIA Tesla GPU 0, compute capability 3.7, threadid=1...empty...GPU Error: call to cuMemAlloc returned error 2: Out of memoryExiting...

--------------------------------------------------------------------------MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLDwith errorcode 1.

This time the error on the GPU was intercepted by the application which managedit with the error handling callback routine. In this case the routine printed someinformation about the problem and called MPI_Abort to terminate the remainingprocesses and avoid any unexpected behavior from the application.

2.8.2. Performance Impact of Fortran 2003 AllocatablesIn the PGI 18.7 release, use of Fortran 2003 semantics for assignments to allocatables wasmade the default. This change applies to host and device code alike. Previously, Fortran1995 semantics were followed by default. The change to Fortran 2003 semantics mayaffect performance in some cases where the kernels directive is used.

When the following Fortran allocatable array assignment is compiled using the Fortran2003 specification, the compiler cannot generate parallel code for the array assignment;lack of parallelism in this case may negatively impact performance. real, allocatable, dimension(:) :: a, b allocate(a(100), b(100)) a = 3.14

!$acc kernels a = b !$acc end kernels

The example code can be modified to use an array section assignment instead; thecompiler can parallelize the array section assignment and the lost performance isregained. a(:) = b(:)

2.9. CUDA Toolkit VersionsThe PGI compilers use NVIDIA's CUDA Toolkit when building programs for executionon an NVIDIA GPU. Every PGI installation package puts the required CUDA Toolkitcomponents into a PGI installation directory called 2018/cuda.

An NVIDIA CUDA driver must be installed on a system with a GPU before you can runa program compiled for the GPU on that system. PGI products do not contain CUDADrivers. You must download and install the appropriate CUDA Driver from NVIDIA.The CUDA Driver version must be at least as new as the version of the CUDA Toolkitwith which you compiled your code.

The PGI tool pgaccelinfo prints the driver version as its first line of output. You canuse it to find out which version of the CUDA Driver is installed on your system.

http://www.nvidia.com/cuda



PGI 18.10 includes the following versions of the CUDA Toolkit:

‣ CUDA 9.1‣ CUDA 9.2‣ CUDA 10.0

You can let the compiler pick which version of the CUDA Toolkit to use or you caninstruct it to use a particular version. The rest of this section describes all of youroptions.

If you do not specify a version of the CUDA Toolkit, the compiler uses the version ofthe CUDA Driver installed on the system on which you are compiling to determinewhich CUDA Toolkit to use. This auto-detect feature was introduced in the PGI 18.7release; auto-detect is especially convenient when you are compiling and runningyour application on the same system. Here is how it works. In the absence of any otherinformation, the compiler will look for a CUDA Toolkit version in the PGI 2018/cudadirectory that matches the version of the CUDA Driver installed on the system. If amatch is not found, the compiler searches for the newest CUDA Toolkit version that isnot newer than the CUDA Driver version. If there is no CUDA Driver installed, the PGI18.10 compilers fall back to the default of CUDA 9.1. Let's look at some examples.

If the only PGI compiler you have installed is PGI 18.10, then

‣ If your CUDA Driver is 10.0, the compilers use CUDA Toolkit 10.0.‣ If your CUDA Driver is 9.2, the compilers use CUDA Toolkit 9.2.‣ If your CUDA Driver is 9.1, the compilers use CUDA Toolkit 9.1.‣ If your CUDA Driver is 9.0, the compilers will issue an error that CUDA Toolkit 9.0

was not found; CUDA Toolkit 9.0 is not bundled with PGI 18.10.‣ If you do not have a CUDA driver installed on the compilation system, the

compilers use CUDA Toolkit version 9.1.‣ If your CUDA Driver is newer than CUDA 9.2, the compilers will still use the CUDA

Toolkit 9.2. The compiler selects the newest CUDA Toolkit it finds that is not newerthan the CUDA Driver.

You can change the compiler's default selection for CUDA Toolkit version using one ofthe following methods:

‣ Use a compiler option. Add the cudaX.Y sub-option to -Mcuda or -ta=teslawhere X.Y denotes the CUDA version. For example, to compile a C file with theCUDA 9.2 Toolkit you would use:pgcc -ta=tesla:cuda9.2

Using a compiler option changes the CUDA Toolkit version for one invocation of thecompiler.

‣ Use an rcfile variable. Add a line defining DEFCUDAVERSION to the siterc file inthe installation bin/ directory or to a file named .mypgirc in your home directory.For example, to specify the CUDA 9.2 Toolkit as the default, add the following lineto one of these files:set DEFCUDAVERSION=9.2;

Using an rcfile variable changes the CUDA Toolkit version for all invocations of thecompilers reading the rcfile.



When you specify a CUDA Toolkit version, you can additionally instruct the compilerto use a CUDA Toolkit installation different from the defaults bundled with the currentPGI compilers. While most users do not need to use any other CUDA Toolkit installationthan those provided with PGI, situations do arise where this capability is needed.Developers working with pre-release CUDA software may occasionally need to test witha CUDA Toolkit version not included in a PGI release. Conversely, some developersmight find a need to compile with a CUDA Toolkit older than the oldest CUDA Toolkitinstalled with a PGI release. For these users, PGI compilers can interoperate withcomponents from a CUDA Toolkit installed outside of the PGI installation directories.

PGI tests extensively using the co-installed versions of the CUDA Toolkits and fullysupports their use. Use of CUDA Toolkit components not included with a PGI install isdone with your understanding that functionality differences may exist.

The ability to compile with a CUDA Toolkit other than the versions installed with thePGI compilers is supported on all platforms; on the Windows platform, this feature issupported for CUDA Toolkit versions 9.2 and newer.

To use a CUDA toolkit that is not installed with a PGI release, such as CUDA 8.0 withPGI 18.10, there are three options:

‣ Use the rcfile variable DEFAULT_CUDA_HOME to override the base defaultset DEFAULT_CUDA_HOME = /opt/cuda-8.0;

‣ Set the environment variable CUDA_HOMEexport CUDA_HOME=/opt/cuda-8.0

‣ Use the compiler compilation line assignment CUDA_HOME=pgfortran CUDA_HOME=/opt/cuda-8.0

The PGI compilers use the following order of precedence when determining whichversion of the CUDA Toolkit to use.

1. If you do not tell the compiler which CUDA Toolkit version to use, the compilerpicks the CUDA Toolkit from the PGI installation directory 2018/cuda thatmatches the version of the CUDA Driver installed on your system. If the PGIinstallation directory does not contain a direct match, the newest version in thatdirectory which is not newer than the CUDA driver version is used. If there isno CUDA driver installed on your system, the compiler falls back on an internaldefault; in PGI 18.10, this default is CUDA 9.1.

2. The rcfile variable DEFAULT_CUDA_HOME will override the base default. 3. The environment variable CUDA_HOME will override all of the above defaults. 4. The environment variable PGI_CUDA_HOME overrides all of the above; it is available

for advanced users in case they need to override an already-defined CUDA_HOME. 5. A user-specified cudaX.Y sub-option to -Mcuda and -ta=tesla will override all

of the above defaults and the CUDA Toolkit located in the PGI installation directory2018/cuda will be used.

6. The compiler compilation line assignment CUDA_HOME= will override all of theabove defaults (including the cudaX.Y sub-option).



2.10. Compute CapabilityThe compilers can generate code for NVIDIA GPU compute capabilities 3.0 through 7.0.The compilers construct a default list of compute capabilities that matches the computecapabilities supported by the GPUs found on the system used in compilation. If there areno GPUs detected, the compilers select cc35, cc50, cc60, and cc70.

You can override the default by specifying one or more compute capabilities using eithercommand-line options or an rcfile.

To change the default with a command-line option, provide a comma-separated list ofcompute capabilities to -ta=tesla: for OpenACC or -Mcuda= for CUDA Fortran.

To change the default with an rcfile, set the DEFCOMPUTECAP value to a blank-separated list of compute capabilities in the siterc file located in your installation's bindirectory:set DEFCOMPUTECAP=60 70;

Alternatively, if you don't have permissions to change the siterc file, you can add theDEFCOMPUTECAP definition to a separate .mypgirc file (mypgi_rc on Windows) inyour home directory.

The generation of device code can be time consuming, so you may notice an increase incompile time as the number of compute capabilities increases.

2.11. OpenMP

OpenMP 3.1

The PGI Fortran, C, and C++ compilers support OpenMP 3.1 on all platforms.

2.12. Runtime Library RoutinesPGI 2018 supports runtime library routines associated with the PGI Acceleratorcompilers. For more information, refer to Using an Accelerator in the PGI Compiler User'sGuide.


Chapter 3.SELECTING AN ALTERNATE COMPILER

Each release of PGI Visual Fortran contains two components—the newest release of PVFand the newest release of the PGI compilers and tools that PVF targets.

When PVF is installed onto a system that contains a previous version of PVF, theprevious version of PVF is replaced. The previous version of the PGI compilers andtools, however, remains installed side-by-side with the new version of the PGI compilersand tools. By default, the new version of PVF will use the new version of the compilersand tools. Previous versions of the compilers and tools may be uninstalled using ControlPanel | Add or Remove Programs.

There are two ways to use previous versions of the compilers:

‣ Use a different compiler release for a single project.‣ Use a different compiler release for all projects.

The method to use depends on the situation.

3.1. For a Single ProjectTo use a different compiler release for a single project, you use the compiler flag -V<ver>to target the compiler with version <ver>. This method is the recommended way totarget a different compiler release.

For example, -V13.8 causes the compiler driver to invoke the 13.8 version of the PGIcompilers if these are installed.

To use this option within a PVF project, add it to the Additional options section of theFortran | Command Line and Linker | Command Line property pages.

3.2. For All ProjectsYou can use a different compiler release for all projects.

The Tools | Options dialog within PVF contains entries that can be changed touse a previous version of the PGI compilers. Under Projects and Solutions |

Selecting an Alternate Compiler


PVF Directories, there are entries for Executable Directories, Include and ModuleDirectories, and Library Directories.

‣ For the x64 platform, each of these entries includes a line containing$(PGIToolsDir). To change the compilers used for the x64 platform, change eachof the lines containing $(PGIToolsDir) to contain the path to the desired bin,include, and lib directories.

Warning: The debug engine in PVF 2018 is not compatible with previous releases. Ifyou use Tools | Options to target a release prior to 2018, you cannot use PVFto debug. Instead, use the -V method described earlier in this section to select analternate compiler.


Chapter 4.DISTRIBUTION AND DEPLOYMENT

Once you have successfully built, debugged and tuned your application, you may wantto distribute it to users who need to run it on a variety of systems. This section addresseshow to effectively distribute applications built using PGI compilers and tools.

4.1. Application Deployment and RedistributablesPrograms built with PGI compilers may depend on runtime library files. These libraryfiles must be distributed with such programs to enable them to execute on systemswhere the PGI compilers are not installed. There are PGI redistributable files for Linuxand Windows. On Windows, PGI also supplies Microsoft redistributable files.

4.1.1. PGI RedistributablesPGI Visual Fortran includes redistributable directories which contain all of the PGIdynamically linked libraries that can be re-distributed by PVF 2018 licensees under theterms of the PGI End-User License Agreement (EULA). For reference, a copy of the PGIEULA in PDF form is included in the release.

The following paths for the redistributable directories assume 'C:' is the system drive.

‣ On a 64-bit Windows system, the redistributable directory is:

C:\Program Files\PGI\win64\18.10\REDIST

The redistributable directories contain the PGI runtime library DLLs for all supportedtargets. This enables users of the PGI compilers to create packages of executables andPGI runtime libraries that execute successfully on almost any PGI-supported targetsystem, subject to the requirement that end-users of the executable have properlyinitialized their environment to use the relevant version of the PGI DLLs.

4.1.2. Microsoft RedistributablesPGI Visual Fortran includes Microsoft Open Tools, the essential tools and librariesrequired to compile, link, and execute programs on Windows. PVF 2018 installed forMicrosoft Visual Studio 2015 includes version 14.0 of the Microsoft Open Tools.

Distribution and Deployment


The Microsoft Open Tools directory contains a subdirectory named REDIST. PGI 2018licensees may redistribute the files contained in this directory in accordance with theterms of the associated license agreements.

On Windows, runtime libraries built for debugging (e.g., msvcrtd and libcmtd)are not included with PGI Visual Fortran. When a program is linked with -g fordebugging, the standard non-debug versions of both the PGI runtime libraries and theMicrosoft runtime libraries are always used. This limitation does not affect debuggingof application code.


Chapter 5.TROUBLESHOOTING TIPS AND KNOWNLIMITATIONS

This section contains information about known limitations, documentation errors, andcorrections. Wherever possible, a work-around is provided.

For up-to-date information about the state of the current release, please see the PGIfrequently asked questions (FAQ) webpage.

5.1. PVF IDE LimitationsThe issues in this section are related to IDE limitations.

‣ When moving a project from one drive to another, all .d files for the project shouldbe deleted and the whole project should be rebuilt. When moving a solution fromone system to another, also delete the solution's Visual Studio Solution User Optionsfile (.suo).

‣ The Resources property pages are limited. Use the Resources | Command Lineproperty page to pass arguments to the resource compiler. Resource compiler outputmust be placed in the intermediate directory for build dependency checking to workproperly on resource files.

‣ Dragging and dropping files in the Solution Explorer that are currently open in theEditor may result in a file becoming "orphaned." Close files before attempting todrag-and-drop them.

5.2. PVF Debugging LimitationsThe following limitations apply to PVF debugging:

‣ Debugging of unified binaries is not fully supported. The names of somesubprograms are modified in the creation of the unified binary, and the PVF debugengine does not translate these names back to the names used in the applicationsource code.

https://www.pgroup.com/support/faq.htm

Troubleshooting Tips and Known Limitations


‣ In some situations, using the Watch window may be unreliable for local variables.Calling a function or subroutine from within the scope of the watched local variablemay cause missed events and/or false positive events. Local variables may bewatched reliably if program scope does not leave the scope of the watched variable.

‣ Rolling over Fortran arrays during a debug session is not supported when VisualStudio is in Hex mode. This limitation also affects Watch and Quick Watchwindows.

Workaround: deselect Hex mode when rolling over arrays.

5.3. PGI Compiler Limitations‣ Take extra care when using -Mprof with PVF runtime library DLLs. To build

an executable for profiling, use of the static libraries is recommended. The staticlibraries are used by default in the absence of -Bdynamic.

‣ Using -Mpfi and -mp together is not supported. The -Mpfi flag disables -mpat compile time, which can cause runtime errors in programs that depend oninterpretation of OpenMP directives or pragmas. Programs that do not depend onOpenMP processing for correctness can still use profile feedback. Using the -Mpfoflag does not disable OpenMP processing.

5.4. OpenACC IssuesThis section includes known limitations in PGI's support for OpenACC directives. PGIplans to support these features in a future release.

ACC routine directive limitations

‣ Fortran assumed-shape arguments are not yet supported.

Clause Support Limitations

‣ Not all clauses are supported after the device_type clause.


Chapter 6.CONTACT INFORMATION

You can contact PGI at:

9030 NE Walker Road, Suite 100Hillsboro, OR 97006

Or electronically using any of the following means:

Fax: +1-503-682-2637Sales: [email protected]: https://www.pgroup.com or pgicompilers.com

The PGI User Forum, pgicompilers.com/userforum is monitored by members ofthe PGI engineering and support teams as well as other PGI customers. The forumscontain answers to many commonly asked questions. Log in to the PGI website,pgicompilers.com/login to access the forums.

Many questions and problems can be resolved by following instructions and theinformation available in the PGI frequently asked questions (FAQ), pgicompilers.com/faq.

Submit support requests using the PGI Technical Support Request form,pgicompilers.com/support-request.

mailto:[email protected]

https://www.pgroup.com

https://www.pgroup.com

https://www.pgroup.com/userforum/index.php

https://www.pgroup.com/userforum/index.php

https://www.pgroup.com/support/faq.htm

https://www.pgroup.com/support/support_request.php

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS,DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY,"MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES,EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THEMATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OFNONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULARPURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIACorporation assumes no responsibility for the consequences of use of suchinformation or for any infringement of patents or other rights of third partiesthat may result from its use. No license is granted by implication of otherwiseunder any patent rights of NVIDIA Corporation. Specifications mentioned in thispublication are subject to change without notice. This publication supersedes andreplaces all other information previously supplied. NVIDIA Corporation productsare not authorized as critical components in life support devices or systemswithout express written approval of NVIDIA Corporation.

Trademarks

NVIDIA, the NVIDIA logo, Cluster Development Kit, PGC++, PGCC, PGDBG, PGF77,PGF90, PGF95, PGFORTRAN, PGHPF, PGI, PGI Accelerator, PGI CDK, PGI Server,PGI Unified Binary, PGI Visual Fortran, PGI Workstation, PGPROF, PGROUP, PVF,and The Portland Group are trademarks and/or registered trademarks of NVIDIACorporation in the U.S. and other countries. Other company and product namesmay be trademarks of the respective companies with which they are associated.

Copyright

© 2013–2018 NVIDIA Corporation. All rights reserved.

PGI Compilers and Tools

Documents

PGI Visual Fortran Release Notes - pgroup.com · Fortran 2003 semantics in both host and GPU device code generation. This change may affect performance in some cases where a Fortran