CUDA GPU Computing


Advisor: Cho-Chin Lin

Student: Chien-Chen Lai


Outline

Introduction and Motivation


What is driving the many-cores?

[Figure: peak performance in GFLOPS from Jan 2003 to Jul 2007 (vertical axis 0-600 GFLOPS), comparing NVIDIA GPUs (NV30, NV35, NV40, G70, G70-512, G71, Quadro FX 5600, GeForce 8800 GTX, Tesla C870) against Intel CPUs (3.0 GHz Pentium 4, 3.0 GHz Core 2 Duo, 3.0 GHz Core 2 Quad).]


Design philosophies are different.

[Figure: block diagrams of a CPU and a GPU over DRAM. The CPU devotes large areas to control logic and cache with a few ALUs; the GPU devotes most of its area to many ALUs with little control logic and cache.]

The GPU is specialized for compute-intensive, massively data-parallel computation (exactly what graphics rendering is about).

So more transistors can be devoted to data processing rather than data caching and flow control.


CPU vs. GPU

Jamie and Adam demonstrate the difference between a CPU and GPU.


This is not your advisor’s parallel computer!

Significant application-level speedup over uni-processor execution

No more “killer micros”

Easy entrance: an initial, naïve code typically gets at least a 2-3x speedup


This is not your advisor’s parallel computer!

Wide availability to end users: available on laptops, desktops, clusters, and supercomputers

Numerical precision and accuracy: IEEE floating-point and double precision


Historic GPGPU Constraints

[Figure: the fragment-program model — a Fragment Program with Input Registers, Temp Registers, Constants, Texture, Output Registers, and FB Memory, with resources scoped per thread, per shader, or per context.]

Dealing with graphics API: working with the corner cases of the graphics API

Addressing modes: limited texture size/dimension

Shader capabilities: limited outputs

Instruction sets: lack of integer & bit ops

Communication limited: no interaction between pixels, no scatter store ability (a[i] = p); contrast with the CUDA sketch below
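For contrast with these constraints, a minimal CUDA sketch of a scatter store (a hypothetical kernel, not from the slides): each thread writes to an arbitrary, data-dependent location — the a[i] = p pattern that the fragment-shader path could not express.

// One thread per element of p; idx[i] picks an arbitrary destination in a.
__global__ void scatter_kernel(float *a, const int *idx, const float *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // integer index arithmetic
    if (i < n)
        a[idx[i]] = p[i];                            // scatter store, legal in CUDA
}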


CUDA - No more shader functions.

CUDA integrated CPU+GPU application C program

Serial or modestly parallel C code executes on the CPU

Highly parallel SPMD kernel C code executes on the GPU

[Figure: execution alternates between the two processors — CPU serial code, then parallel kernel Grid 0 launched on the GPU as KernelA<<< nBlk, nTid >>>(args), then more CPU serial code, then Grid 1 launched as KernelB<<< nBlk, nTid >>>(args).]
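A minimal, self-contained sketch of that structure, assuming two hypothetical kernels standing in for KernelA and KernelB (the names nBlk and nTid follow the slide; the kernel bodies are illustrative only):

#include <cuda_runtime.h>

__global__ void KernelA(float *d_data, int n)      // highly parallel SPMD code
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] = 2.0f * d_data[i];
}

__global__ void KernelB(float *d_data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] = d_data[i] + 1.0f;
}

int main(void)
{
    const int n = 1 << 20;
    const int nTid = 256;                           // threads per block
    const int nBlk = (n + nTid - 1) / nTid;         // blocks per grid

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    /* ... serial or modestly parallel C code on the CPU ... */

    KernelA<<<nBlk, nTid>>>(d_data, n);             // Grid 0 on the GPU
    cudaDeviceSynchronize();

    /* ... more serial CPU code ... */

    KernelB<<<nBlk, nTid>>>(d_data, n);             // Grid 1 on the GPU
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}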


CUDA for Multi-Core CPU

A single GPU thread is too small for a CPU thread: CUDA emulation does this and performs poorly

CPU cores designed for ILP, SIMD: optimizing compilers work well with iterative loops

Turn GPU thread blocks from CUDA into iterative CPU loops (sketched below)

[Figure: the compiler maps a CUDA grid onto either the GPU or a multi-core CPU.]
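A hedged sketch of that transformation (illustrative only, not the actual compiler output): the per-thread kernel becomes a pair of nested loops, and a multi-core build would then distribute the outer block loop across CPU threads.

// Original data-parallel kernel: one logical GPU thread per element.
__global__ void scale_kernel(float *a, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = s * a[i];
}

// Plausible CPU translation: each thread block becomes an iterative loop
// over threadIdx.x, and the grid becomes an outer loop over block indices.
void scale_kernel_cpu(float *a, float s, int n, int gridDim_x, int blockDim_x)
{
    for (int blockIdx_x = 0; blockIdx_x < gridDim_x; ++blockIdx_x)
        for (int threadIdx_x = 0; threadIdx_x < blockDim_x; ++threadIdx_x) {
            int i = blockIdx_x * blockDim_x + threadIdx_x;
            if (i < n) a[i] = s * a[i];
        }
}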


CUDA for Multi-Core CPU

Application   | C on single-core CPU (time) | CUDA on 4-core CPU (time) | Speedup* | CUDA on G80 (time)
MRI-FHD       | ~1000 s                     | 230 s                     | ~4x      | 8.5 s
CP            | 180 s                       | 45 s                      | 4x       | 0.28 s
SAD           | 42.5 ms                     | 25.6 ms                   | 1.66x    | 4.75 ms
MM (4Kx4K)    | 7.84 s**                    | 15.5 s                    | 3.69x    | 1.12 s
