Parallel Computing with GPU

Rohit Khatana

Parallel Computing With GPU

Rohit Khatana 4344

Seminar guide: Prof. Aparna Joshi

ARMY INSTITUTE OF TECHNOLOGY

Content

1. What is parallel computing?

3. CUDA

4. Application

What is Parallel Computing?

Performing or executing a task or program on more than one machine or processor at the same time.

Put simply: dividing a job among the members of a group.

What kind of processors will we build?

(Major design constraint: power)

CPU:

• Complex control hardware

• Flexibility + performance

• Expensive in terms of power

GPU:

• Simpler control hardware

• More hardware for computation

• Potentially more power-efficient (ops/watt)

• More restrictive programming model

A modern GPU has many more ALUs than a CPU

Graphics Logical Pipeline

• The GPU receives geometry information from the CPU as input and provides a picture as output

• Let’s see how that happens

Host Interface

• The host interface is the communication bridge between the CPU and the GPU

• It receives commands from the CPU and also pulls geometry information from system memory

• It outputs a stream of vertices in object space with all their associated information (normals, texture coordinates, per-vertex color, etc.)

Vertex Processing

• The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space

• This may be a simple linear transformation, or a complex operation involving morphing effects

• No new vertices are created in this stage, and no vertices are discarded (input/output has 1:1 mapping)

Triangle Setup

• In this stage geometry information becomes raster information (screen space geometry is the input, pixels are the output)

• Prior to rasterization, triangles that are back-facing or located outside the viewing frustum are rejected

Triangle Setup (cont.)

• A fragment is generated if and only if its center is inside the triangle

• The attributes of every generated fragment are computed by perspective-correct interpolation of the attributes of the three vertices that make up the triangle
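The coverage rule above (a fragment exists only where a pixel center falls inside the triangle) can be sketched in CUDA with edge functions. This is only an illustrative sketch, not how any particular GPU implements triangle setup: the names `Vec2`, `edge`, `covers`, and `rasterize`, the 8x8 grid, and the triangle coordinates are all assumptions made for the example, and attribute interpolation is left out.

```cuda
#include <cstdio>

struct Vec2 { float x, y; };  // hypothetical 2D screen-space point

// Edge function: positive when p lies on one side of the directed edge a -> b,
// negative on the other side, zero exactly on the edge.
__host__ __device__ float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True when p is inside (or on the border of) triangle v0 v1 v2, for either winding.
// Real hardware uses a tie-breaking rule for points exactly on a shared edge;
// this sketch simply counts them as inside.
__host__ __device__ bool covers(Vec2 v0, Vec2 v1, Vec2 v2, Vec2 p) {
    float e0 = edge(v1, v2, p), e1 = edge(v2, v0, p), e2 = edge(v0, v1, p);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
}

// One thread per pixel: test the pixel center against the triangle.
__global__ void rasterize(Vec2 v0, Vec2 v1, Vec2 v2, int width, int height, char *mask) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    Vec2 center = { x + 0.5f, y + 0.5f };
    mask[y * width + x] = covers(v0, v1, v2, center) ? '#' : '.';
}

int main() {
    const int W = 8, H = 8;
    char h_mask[W * H];
    char *d_mask = nullptr;
    cudaMalloc((void **)&d_mask, W * H);

    Vec2 v0 = {1.0f, 1.0f}, v1 = {7.0f, 1.0f}, v2 = {1.0f, 7.0f};
    dim3 block(8, 8);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    rasterize<<<grid, block>>>(v0, v1, v2, W, H, d_mask);
    cudaMemcpy(h_mask, d_mask, W * H, cudaMemcpyDeviceToHost);

    // Print the coverage mask, one text row per pixel row.
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) putchar(h_mask[y * W + x]);
        putchar('\n');
    }
    cudaFree(d_mask);
    return 0;
}
```

Each `#` in the printed mask marks a pixel whose center passed the test and would therefore produce a fragment.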

Fragment Processing

• Each fragment provided by triangle setup is fed into fragment processing as a set of attributes (position, normal, texcoord, etc.), which are used to compute the final color for this pixel

• The computations taking place here include texture mapping and math operations

Memory Interface

• Fragments provided by the last step are written to the framebuffer.

• Before the final write occurs, some fragments are rejected by the z-buffer, stencil, and alpha tests
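As a rough illustration of the z-buffer test only (stencil and alpha tests are not shown), the sketch below keeps an incoming fragment when it is nearer than what the pixel already holds. It assumes the convention that a smaller depth value means nearer, and every name in it (`passesDepthTest`, `writeFragments`, the two fragment layers) is hypothetical. Fixed-function hardware does this work itself; the kernel here only mimics the logic.

```cuda
#include <cstdio>

// Assumed convention: a smaller depth value is nearer to the viewer.
__host__ __device__ bool passesDepthTest(float fragDepth, float storedDepth) {
    return fragDepth < storedDepth;
}

// One thread per pixel: keep the incoming fragment only if it is nearer
// than the depth already stored for that pixel.
__global__ void writeFragments(const float *fragDepth, const int *fragColor,
                               float *zbuffer, int *framebuffer, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (passesDepthTest(fragDepth[i], zbuffer[i])) {
        zbuffer[i] = fragDepth[i];
        framebuffer[i] = fragColor[i];
    }
}

int main() {
    const int N = 4;  // a 4-pixel "screen"
    const size_t FBYTES = N * sizeof(float), IBYTES = N * sizeof(int);

    float h_z[N]  = {1.0f, 1.0f, 1.0f, 1.0f};   // z-buffer cleared to the far value
    int   h_fb[N] = {0, 0, 0, 0};               // framebuffer cleared to color 0
    float depthA[N] = {0.5f, 0.5f, 0.5f, 0.5f}; int colorA[N] = {1, 1, 1, 1};
    float depthB[N] = {0.2f, 0.8f, 0.2f, 0.8f}; int colorB[N] = {2, 2, 2, 2};

    float *d_depth, *d_z;
    int *d_color, *d_fb;
    cudaMalloc((void **)&d_depth, FBYTES);
    cudaMalloc((void **)&d_z, FBYTES);
    cudaMalloc((void **)&d_color, IBYTES);
    cudaMalloc((void **)&d_fb, IBYTES);
    cudaMemcpy(d_z, h_z, FBYTES, cudaMemcpyHostToDevice);
    cudaMemcpy(d_fb, h_fb, IBYTES, cudaMemcpyHostToDevice);

    // First layer of fragments.
    cudaMemcpy(d_depth, depthA, FBYTES, cudaMemcpyHostToDevice);
    cudaMemcpy(d_color, colorA, IBYTES, cudaMemcpyHostToDevice);
    writeFragments<<<1, N>>>(d_depth, d_color, d_z, d_fb, N);

    // Second layer: only the fragments nearer than the first layer survive.
    cudaMemcpy(d_depth, depthB, FBYTES, cudaMemcpyHostToDevice);
    cudaMemcpy(d_color, colorB, IBYTES, cudaMemcpyHostToDevice);
    writeFragments<<<1, N>>>(d_depth, d_color, d_z, d_fb, N);

    cudaMemcpy(h_fb, d_fb, IBYTES, cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) printf("pixel %d: color %d\n", i, h_fb[i]);

    cudaFree(d_depth); cudaFree(d_z); cudaFree(d_color); cudaFree(d_fb);
    return 0;
}
```

Launching one kernel per layer keeps a single writer per pixel, which avoids races; handling several fragments for the same pixel inside one launch would need atomics or ordering that this sketch does not attempt.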

Memory Model of GPU

Basic Architecture of GPU

CUDA (Compute Unified Device Architecture)

• CUDA is a parallel computing platform and programming model.

• Created by NVIDIA and implemented by the GPUs that they produce.

• CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs.

• CUDA supports standard programming languages, including C++, Python, and Fortran.

Programming Model

• Threads are organized into blocks.

• Blocks are organized into a grid.

• Each block is executed on a single multiprocessor; a multiprocessor can hold several blocks at once.

• A warp is a group of threads that execute the same instruction together.

• There are 32 threads in a warp.
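To make the grid, block, thread, and warp hierarchy concrete, here is a small sketch in which every thread records where it sits. The struct `ThreadInfo`, the helper `globalIndex`, the kernel name `whoAmI`, and the launch shape (2 blocks of 64 threads) are assumptions chosen for illustration; `blockIdx`, `blockDim`, `threadIdx`, and `warpSize` are CUDA built-ins.

```cuda
#include <cstdio>

// Hypothetical record of where one thread sits in the hierarchy.
struct ThreadInfo { int block, thread, global, warp; };

// Position of a thread in the whole grid, for a one-dimensional launch.
__host__ __device__ int globalIndex(int block, int blockSize, int thread) {
    return block * blockSize + thread;
}

__global__ void whoAmI(ThreadInfo *out, int n) {
    int g = globalIndex(blockIdx.x, blockDim.x, threadIdx.x);
    if (g >= n) return;                      // guard threads past the end
    out[g].block  = blockIdx.x;              // which block in the grid
    out[g].thread = threadIdx.x;             // which thread in the block
    out[g].global = g;                       // which thread in the grid
    out[g].warp   = threadIdx.x / warpSize;  // which warp inside the block
}

int main() {
    const int blocks = 2, threadsPerBlock = 64, n = blocks * threadsPerBlock;
    ThreadInfo h_out[n];
    ThreadInfo *d_out = nullptr;

    cudaMalloc((void **)&d_out, n * sizeof(ThreadInfo));
    whoAmI<<<blocks, threadsPerBlock>>>(d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(ThreadInfo), cudaMemcpyDeviceToHost);

    // Print a few threads around the warp and block boundaries.
    const int samples[] = {0, 31, 32, 63, 64, 127};
    for (int s : samples) {
        printf("global %3d: block %d, thread %2d, warp %d\n",
               h_out[s].global, h_out[s].block, h_out[s].thread, h_out[s].warp);
    }

    cudaFree(d_out);
    return 0;
}
```

With 64 threads per block, each block is split into two warps of 32, so the sampled threads 31 and 32 of the same block fall into different warps.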

Typical CUDA/GPU Program

1. CPU allocates storage on GPU (cudaMalloc).

2. CPU copies input data from CPU to GPU (cudaMemcpy).

3. CPU launches a kernel on the GPU to process the data (kernel<<<number of blocks, threads per block>>>(parameters)).

4. CPU copies results back from GPU to CPU (cudaMemcpy).

Example: squaring the elements of an array

__global__ void square(float *d_out, float *d_in) {
    // each thread squares the one element selected by its thread index
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

threadIdx.x gives the index of the current thread within its block.

GPU/CUDA programming

Main program

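int main(int argc, char **argv) {
    // ……………………
    // (the declarations of ARRAY_SIZE, ARRAY_BYTES and the host input array h_in are elided on the slide)

    float h_out[ARRAY_SIZE];

    // declare GPU pointers
    float *d_in;
    float *d_out;

    // allocate GPU memory (cudaMalloc takes the address of the pointer, as void **)
    cudaMalloc((void **) &d_in, ARRAY_BYTES);
    cudaMalloc((void **) &d_out, ARRAY_BYTES);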

Main program (cont.)

    // transfer the array to the GPU
    cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);

    // launch the kernel: one block of ARRAY_SIZE threads
    square<<<1, ARRAY_SIZE>>>(d_out, d_in);

    // copy back the result array to the CPU
    cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

    // print out the resulting array, one value per line
    for (int i = 0; i < ARRAY_SIZE; i++) {
        printf("%f\n", h_out[i]);
    }

    // free GPU memory
    cudaFree(d_in);
    cudaFree(d_out);

    return 0;
}
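The program on these slides launches a single block, which limits the array to the maximum number of threads per block (1024 on current NVIDIA GPUs). The sketch below is one possible way to put the pieces together for larger arrays; it is a variation, not the slide's code. The extra kernel parameter `n`, the helper `blocksFor`, the `CUDA_CHECK` macro, the array size of 2048, and the block size of 256 are all assumptions made for the example.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper macro: stop with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                     \
    do {                                                                     \
        cudaError_t err_ = (call);                                           \
        if (err_ != cudaSuccess) {                                           \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",                      \
                    cudaGetErrorString(err_), __FILE__, __LINE__);           \
            exit(EXIT_FAILURE);                                              \
        }                                                                    \
    } while (0)

// Number of blocks needed to cover n elements (rounded up).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Same idea as the slide's kernel, but indexed across the whole grid
// and guarded so that surplus threads do nothing.
__global__ void square(float *d_out, const float *d_in, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float f = d_in[idx];
        d_out[idx] = f * f;
    }
}

int main() {
    const int ARRAY_SIZE = 2048;  // assumed size, larger than one block
    const size_t ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    static float h_in[ARRAY_SIZE], h_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++) h_in[i] = float(i);

    float *d_in = nullptr, *d_out = nullptr;
    CUDA_CHECK(cudaMalloc((void **)&d_in, ARRAY_BYTES));
    CUDA_CHECK(cudaMalloc((void **)&d_out, ARRAY_BYTES));
    CUDA_CHECK(cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice));

    const int threads = 256;
    square<<<blocksFor(ARRAY_SIZE, threads), threads>>>(d_out, d_in, ARRAY_SIZE);
    CUDA_CHECK(cudaGetLastError());  // reports launch configuration errors

    CUDA_CHECK(cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost));

    // Compare against the same computation on the CPU.
    int mismatches = 0;
    for (int i = 0; i < ARRAY_SIZE; i++) {
        if (h_out[i] != h_in[i] * h_in[i]) mismatches++;
    }
    printf("mismatches: %d\n", mismatches);

    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Rounding the block count up and guarding with `idx < n` lets the same kernel handle sizes that are not a multiple of the block size.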

Programming Model

GPU vs CPU Code

Conclusion

• GPU computing is a good choice for fine-grained data-parallel programs with limited communication

• GPU computing is not so good for coarse-grained programs with a lot of communication

• The GPU has become a co-processor to the CPU.
