
GPU Architecture and Programming

GPU vs CPU: https://www.youtube.com/watch?v=fKK933KK6Gg

GPU Architecture

• GPUs (Graphics Processing Units) were originally designed as graphics accelerators for real-time graphics rendering.

• Starting in the late 1990s, the hardware became increasingly programmable, culminating in NVIDIA's GeForce 256 in 1999, the first chip marketed as a "GPU".

• CPU + GPU is a powerful combination:
– CPUs consist of a few cores optimized for serial processing.
– GPUs consist of thousands of smaller, more efficient cores designed for parallel performance.
– Serial portions of the code run on the CPU, while parallel portions run on the GPU.

Architecture of GPU

Image copied from http://www.pgroup.com/lit/articles/insider/v2n1a5.htm
Image copied from http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf

CUDA Programming

• CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA for its GPUs.

• By using CUDA, you can write programs that directly access the GPU.

• The CUDA platform is accessible to programmers via CUDA libraries and extensions to programming languages like C, C++, and Fortran.
– C/C++ programmers use "CUDA C/C++", compiled with the nvcc compiler.
– Fortran programmers can use CUDA Fortran, compiled with the PGI CUDA Fortran compiler.

CUDA Fortran

• Terminology:
– Host: the CPU and its memory (host memory)
– Device: the GPU and its memory (device memory)

Programming Paradigm

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

Each parallel function of the application executes as a kernel.

Programming Flow

1. Copy input data from CPU memory to GPU memory
2. Load the GPU program and execute it
3. Copy results from GPU memory to CPU memory
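A minimal sketch of this three-step flow in CUDA C (the doubling kernel, variable names, and sizes here are illustrative assumptions, not from the slides):

#include <stdio.h>

// A trivial kernel, assumed here only to illustrate the flow
__global__ void scale(int *data) {
    *data = *data * 2;
}

int main(void) {
    int h_value = 21;   // data in CPU (host) memory
    int *d_value;       // pointer to GPU (device) memory

    // 1. Copy input data from CPU memory to GPU memory
    cudaMalloc((void **)&d_value, sizeof(int));
    cudaMemcpy(d_value, &h_value, sizeof(int), cudaMemcpyHostToDevice);

    // 2. Load the GPU program (kernel) and execute it
    scale<<<1, 1>>>(d_value);

    // 3. Copy results from GPU memory back to CPU memory
    cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_value);
    printf("%d\n", h_value);   // prints 42
    return 0;
}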

• Each parallel function of the application is executed as a kernel.

• That means GPUs are programmed as a sequence of kernels; typically, each kernel completes execution before the next kernel begins.

• Fermi has some support for executing multiple independent kernels simultaneously, but most kernels are large enough to fill the entire machine.

Image copied from http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf

Hello World! Example

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

• __global__ is a CUDA C/C++ keyword meaning:
– mykernel() will be executed on the device
– mykernel() will be called from the host
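The slide's code is not reproduced in this text; a minimal sketch consistent with the description above (the empty kernel body and the <<<1,1>>> launch configuration are assumptions):

#include <stdio.h>

// __global__: runs on the device, callable from the host
__global__ void mykernel(void) {
}

int main(void) {
    mykernel<<<1,1>>>();        // launch the kernel on the device
    printf("Hello World!\n");   // printed by the host
    return 0;
}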

Addition Example

• Since add() runs on the device, the pointers a, b, and c must point to device memory, as shown in the sketch below.

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

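A sketch consistent with the bullet above (the d_a, d_b, d_c naming for device copies is a common convention assumed here):

__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;   // a, b, c must point to device memory
}

int main(void) {
    int a = 2, b = 7, c;      // host copies of a, b, c
    int *d_a, *d_b, *d_c;     // device copies of a, b, c
    int size = sizeof(int);

    // Allocate device memory and copy the inputs to the device
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

    // Launch add() on the device with 1 block of 1 thread
    add<<<1, 1>>>(d_a, d_b, d_c);

    // Copy the result back to the host and free device memory
    cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}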

Vector Addition Example

Kernel Function:

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
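The kernel image is not reproduced; a sketch of a per-block vector-add kernel, with each block handling one element:

__global__ void add(int *a, int *b, int *c) {
    // blockIdx.x identifies this block; each block adds one element
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}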

main:

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
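A sketch of the corresponding main, launching N parallel blocks (N = 512 and the random_ints() helper are assumptions for illustration):

#define N 512

int main(void) {
    int *a, *b, *c;             // host copies
    int *d_a, *d_b, *d_c;       // device copies
    int size = N * sizeof(int);

    // Allocate device memory
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);

    // Allocate and initialize host memory (random_ints is a
    // hypothetical helper that fills an array with random values)
    a = (int *)malloc(size); random_ints(a, N);
    b = (int *)malloc(size); random_ints(b, N);
    c = (int *)malloc(size);

    // Copy inputs to the device
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // Launch add() with N blocks of 1 thread each
    add<<<N, 1>>>(d_a, d_b, d_c);

    // Copy the result back to the host and clean up
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}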

Alternative 1:

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
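The slide is not reproduced; presumably this alternative indexes by thread rather than by block, running all N elements in a single block:

__global__ void add(int *a, int *b, int *c) {
    // threadIdx.x identifies this thread within its block
    c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
}

// launched with 1 block of N threads:
// add<<<1, N>>>(d_a, d_b, d_c);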

Alternative 2:

• With M threads per block, a unique global thread index is:

int globalThreadId = threadIdx.x + blockIdx.x * M;  // M is the number of threads in a block

• Using the built-in variable blockDim.x in place of a hard-coded M:

int globalThreadId = threadIdx.x + blockIdx.x * blockDim.x;

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
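For example, with blockDim.x = 8 (an illustrative value), the thread with threadIdx.x = 5 in block blockIdx.x = 2 gets globalThreadId = 5 + 2 * 8 = 21.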

• So the kernel becomes

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
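A sketch of the combined-index kernel described above:

__global__ void add(int *a, int *b, int *c) {
    // Combine block and thread indices into one global index
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    c[index] = a[index] + b[index];
}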

• The main function becomes

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
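A sketch of the updated launch; only the kernel launch changes from the earlier main (the N and THREADS_PER_BLOCK values are illustrative):

#define N (2048 * 2048)
#define THREADS_PER_BLOCK 512

// Launch add() with N/THREADS_PER_BLOCK blocks of THREADS_PER_BLOCK
// threads each (this assumes N is a multiple of THREADS_PER_BLOCK;
// the next section lifts that restriction)
add<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c);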

Handling Arbitrary Vector Sizes

Copy from http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf
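A sketch of the usual approach: pass the vector length n to the kernel, guard against out-of-range indices, and round the block count up:

__global__ void add(int *a, int *b, int *c, int n) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)   // extra threads in the last block do nothing
        c[index] = a[index] + b[index];
}

// Launch enough blocks to cover all N elements, M threads per block:
// add<<<(N + M - 1) / M, M>>>(d_a, d_b, d_c, N);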
