23
Rached Abdelkhalak, NVIDIA CUDA DEVELOPER TOOLS: OVERVIEW & NEW FEATURES

CUDA DEVELOPER TOOLS: OVERVIEW & NEW FEATURES...Graphics Profiling CUDA Kernel Profiling Gfx GPU crash dump CUDA GPU crash dump 3 Nsight Eclipse Edition Nsight Visual Studio Edition

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

  • Rached Abdelkhalak, NVIDIA

    CUDA DEVELOPER TOOLS: OVERVIEW & NEW FEATURES

  • 2

    DEVELOPER WORKFLOW

    Application

    Development

    IDE integration

    Debug Gfx APIDebug CUDA

    System & CUDA

    Trace

    Graphics ProfilingCUDA Kernel

    Profiling

    Gfx GPU

    crash dump

    CUDA GPU

    crash dump

  • 3

    Nsight Eclipse EditionNsight Visual Studio Edition

    DEVELOPER TOOLS PORTFOLIO

    Application

    Development

    IDE integration

    Debug Gfx APIDebug CUDA

    System & CUDA

    Trace

    Graphics ProfilingCUDA Kernel

    Profiling

    Gfx GPU

    crash dump

    CUDA GPU

    crash dump

  • 4

    Nsight Eclipse EditionNsight Visual Studio Edition

    DEVELOPER TOOLS PORTFOLIO

    Application

    Development

    IDE integration

    Debug Gfx APIDebug CUDA

    System & CUDA

    Trace

    Graphics ProfilingCUDA Kernel

    Profiling

    Gfx GPU

    crash dump

    CUDA GPU

    crash dump

    Nsight Eclipse Editioncuda-gdbNsight Visual Studio EditionNsight Computecuda-memcheck & compute-sanitizerNsight Graphics

  • 5

    Nsight Eclipse EditionNsight Visual Studio Edition

    DEVELOPER TOOLS PORTFOLIO

    Application

    Development

    IDE integration

    Debug Gfx APIDebug CUDA

    System & CUDA

    Trace

    Graphics ProfilingCUDA Kernel

    Profiling

    Gfx GPU

    crash dump

    CUDA GPU

    crash dump

    Nsight Eclipse Editioncuda-gdbNsight Visual Studio EditionNsight Computecuda-memcheck & compute-sanitizerNsight Graphics

    cuda-gdb/Nsight Eclipse EditionNsight Visual Studio EditionNsight Aftermath

  • 6

    Nsight Eclipse EditionNsight Visual Studio Edition

    DEVELOPER TOOLS PORTFOLIO

    Application

    Development

    IDE integration

    Debug Gfx APIDebug CUDA

    System & CUDA

    Trace

    Graphics ProfilingCUDA Kernel

    Profiling

    Gfx GPU

    crash dump

    CUDA GPU

    crash dump

    Nsight Eclipse Editioncuda-gdbNsight Visual Studio EditionNsight Computecuda-memcheck & compute-sanitizerNsight Graphics

    cuda-gdb/Nsight Eclipse EditionNsight Visual Studio EditionNsight Aftermath

    Nsight Systems

  • 7

    Nsight Eclipse EditionNsight Visual Studio Edition

    DEVELOPER TOOLS PORTFOLIO

    Application

    Development

    IDE integration

    Debug Gfx APIDebug CUDA

    System & CUDA

    Trace

    Graphics ProfilingCUDA Kernel

    Profiling

    Gfx GPU

    crash dump

    CUDA GPU

    crash dump

    Nsight Eclipse Editioncuda-gdbNsight Visual Studio EditionNsight Computecuda-memcheck & compute-sanitizerNsight Graphics

    cuda-gdb/Nsight Eclipse EditionNsight Visual Studio EditionNsight Aftermath

    Nsight Systems

    Nsight Compute

    Nsight Graphics

  • 8

    OVERALL UPDATESChips, platforms support across developer tools

    CUDA 11.0 support OS support updatesMacOSX host platform only

    Removal of Windows 7 support

    Chips UpdateA100 GPU Support

    Arm SBSA support

  • 9

    NVIDIA® NSIGHT™ ECLIPSE EDITION

    Plug-in to EclipseEclipse 4.8 to 4.11 support

    Edit, build, debug CUDA-C applications

    CUDA aware source code editor – syntax highlighting, code completion and inline help

    Debugger - Seamless and simultaneous debugging of CPU and GPU code

    NVCC build integration to cross compile for various target platforms

    Docker support

    Plug-ins enabling CUDA development

    Documentation: https://docs.nvidia.com/cuda/nsight-eclipse-plugins-guide

    https://docs.nvidia.com/cuda/nsight-eclipse-plugins-guide

  • 10

    NSIGHT VISUAL STUDIO EDITION

    Visual Studio 2015, 2017 and 2019 support

    CPU + GPU CUDA Debugging

    Source-correlated assembly debugging(SASS / PTX / SASS+PTX)

    Data breakpoints for CUDA C/C++ code

    Expressions in Locals, Watch and Conditionals

    CUDA info view

    Crash dump

    Warp Watch view in NextGen debugger

    Improved OptiX debugging (-lineinfo)

    More info & documentation: https://developer.nvidia.com/nsight-visual-studio-edition

    VS Extension enabling CUDA development

    https://developer.nvidia.com/nsight-visual-studio-edition

  • 11

    NSIGHT TOOLS INTEGRATION INTO VISUAL STUDIO

    Improved workflow for Nsight Systems, Nsight Graphics & Nsight Compute:• Settings passed to the standalone tool• Quick launch with key bindings• Complements Nsight Visual Studio Edition's Debugger

    Works with Visual C++, C#, Visual Basic .NET, F#, and Python projectsOn Visual Studio Marketplace: installation and automatic update notificationsSupported in Visual Studio 2015, 2017, & 2019

    Improved Workflow. Launches more powerful, standalone tools within Visual Studio

    Tools appear under the Nsight menu (highlighted) Visual Studio project settings are transferred to the Nsight tool upon launch

    More Info: https://developer.nvidia.com/nsight-tools-visual-studio-integration

    https://developer.nvidia.com/nsight-tools-visual-studio-integration

  • 12

    CUDA-GDB

    Command line source and assembly (SASS) level debugger

    Nsight Eclipse Edition debugging backend

    Simultaneous CPU and GPU debugging

    Inspect and modify memory, register, variable state

    Control program execution

    Runtime GPU error detection

    Support for multiple GPUs, multiple contexts, multiple kernels, Thread focus

    Core dump support

    Documentation : http://docs.nvidia.com/cuda/cuda-gdb

    Overview of command line debugger

    (cuda-gdb) info cuda threads breakpoint all

    BlockIdx ThreadIdx Virtual PC Dev SM Wp Ln Filename Line

    Kernel 0

    (1,0,0) (0,0,0) 0x0000000000948e58 0 11 0 0 infoCommands.cu 12

    (1,0,0) (1,0,0) 0x0000000000948e58 0 11 0 1 infoCommands.cu 12

    (1,0,0) (2,0,0) 0x0000000000948e58 0 11 0 2 infoCommands.cu 12

    (1,0,0) (3,0,0) 0x0000000000948e58 0 11 0 3 infoCommands.cu 12

    (1,0,0) (4,0,0) 0x0000000000948e58 0 11 0 4 infoCommands.cu 12

    (1,0,0) (5,0,0) 0x0000000000948e58 0 11 0 5 infoCommands.cu 12

    (cuda-gdb) info cuda threads breakpoint 2 lane 1

    BlockIdx ThreadIdx Virtual PC Dev SM Wp Ln Filename Line

    Kernel 0

    (1,0,0) (1,0,0) 0x0000000000948e58 0 11 0 1 infoCommands.cu 12

    http://docs.nvidia.com/cuda/cuda-gdb

  • 13

    CUDA-GDB

    Upgrade to GDB 8.2

    MacOSX support brought back to life as a host

    Performance improvements

    ● Module load time (~30% faster)

    Quality improvements

    ● Improved handling of -lineinfo debug information (OptiX)● Improved debugging with parallel cuda-gdb sessions● Enabled CPU-side hardware watchpoints

    New Features

  • 14

    CUDA-MEMCHECK

    Multiple tools

    memcheck : reports out of bounds/misaligned memory access errors

    racecheck : identifies races on __shared__ memory

    initcheck : usage of uninitialized global memory

    synccheck : identify invalid usage of __syncthreads() and __syncwarp()

    Documentation: http://docs.nvidia.com/cuda/cuda-memcheck/index.html

    Functional correctness checking tool suite

    http://docs.nvidia.com/cuda/cuda-memcheck/index.html

  • 15

    COMPUTE-SANITIZER

    Next-Gen replacement tool for cuda-memcheck

    New command line interface (CLI) tool based on the Sanitizer API

    Performance gain for applications using libraries such as CUSOLVER, CUFFT or DL frameworks

    cuda-memcheck has performance issues on Windows that were inherent in its design

    compute-sanitizer fixes that and brings performance on Windows to be on par with Linux.

    Available in CUDA 11.0

    OS: Linux (x86_64, Power, Arm SBSA), Windows

    GPUs: Maxwell+

    cuda-memcheck still supported in CUDA 11.0 (does not support Arm SBSA)

    Documentation: https://docs.nvidia.com/cuda/compute-sanitizer

    New tool for functional correctness checking tool suite

  • 16

    COMPUTE-SANITIZER

    Provides finer control than cuda-memcheck through APIs to analyze memory patterns

    APIs are grouped into two categories:

    Callback API – CUDA events such as memory allocations/kernel

    Patching API – inserts patches for specific memory instructions

    New in CUDA 11.0:

    • SanitizerSetCallbackData API has been updated to take a function as input rather than a stream

    • Added support for Cooperative Groups

    • Memory Access callbacks for shared/local memory now report the address offset within the shared/local window

    Documentation: https://docs.nvidia.com/cuda/compute-sanitizer

    Samples: https://github.com/NVIDIA/compute-sanitizer-samples

    API for functional correctness checking tool suite

    https://docs.nvidia.com/cuda/compute-sanitizer/index.htmlhttps://github.com/NVIDIA/compute-sanitizer-samples

  • 17

    NSIGHT SUITE OF PROFILERS

  • 18

    NSIGHT SYSTEMSMPI & OpenACC Trace

  • 19

    NSIGHT SYSTEMSOpenMP 5 API Trace

  • 20

    NSIGHT COMPUTE

    Support for CUDA Asynchronous Copy

    Sparse Data compression

    A100 GPU support

  • 21

    NSIGHT COMPUTE

    Efficient way to evaluate kernel characteristics, quickly understand potential directions for further improvements or existing limiters

    Inputs: Arithmetic Intensity (FLOPS/bytes)Performance (FLOPS/s)

    Ceilings: Peak Memory BandwidthPeak FP32/FP64 Performance

    New Roofline analysis

  • 22

    USEFUL LINKS

    Web: https://developer.nvidia.com/tools-overview

    How to contact us? Forums: https://forums.developer.nvidia.com/c/development-toolsemail: [email protected]

    Other digital GTC talks of interest:

    S21351: Scaling the Transformer Model Implementation in PyTorch Across Multiple Nodes

    S21547: Rebalancing the Load:Profile-Guided Optimization of the NAMD Molecular Dynamics Program for Modern GPUs using Nsight Systems

    S21771: Optimizing CUDA Kernels in HPC Simulation and Visualization Codes using Nsight Compute

    S21565: Roofline Performance Model for HPC and Deep-Learning Applications

    https://developer.nvidia.com/tools-overviewhttps://forums.developer.nvidia.com/c/development-toolsmailto:[email protected]://developer.nvidia.com/gtc/2020/video/s21351https://developer.nvidia.com/gtc/2020/video/s21547https://developer.nvidia.com/gtc/2020/video/s21565