By: Matthew Royle
Supervisor: Prof. Shaun Bangay
Multi-core CPUs
Sequential algorithms to parallel algorithms
GPUs used for more than just graphics
Use of GPGPUs (General-Purpose Graphics Processing Units)
Parallel programming languages for specific architectures, namely NVIDIA’s CUDA
Lack of a multi-platform open language
The OpenCL (Open Computing Language) standard
Heterogeneous Parallel Programming
Parallel nature of GPUs
No Implementation
Implement OpenCL using existing technologies
High level translator
Use Parallel Frameworks
GPU most likely form of implementation
NVIDIA and AMD plan to include OpenCL
Future Apple iPhones
Lack of implementation on CPU architecture
Select a parallel processing framework
Create a high level translator
Create valid tests
Run created tests
__kernel void add_vect();                       // create computation unit
cl_cmd_queue cmd_queue = CreateCommandQueue();  // create computation queue
clEnqueueTask(kernel, i);                       // enqueue task and execute

// Translated form: the queue is an array of kernels
cl_cmd_queue CreateCommandQueue() { return cmd_queue; }
void clEnqueueTask(kernel, i) { cmd_queue[i] = kernel; }

// Execute the queued kernels in parallel with OpenMP
#pragma omp parallel for
for (int k = 0; k < cmd_queue.length; k++)
    Execute(cmd_queue[k]);
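The pseudocode above can be turned into a small compilable sketch. This is only one possible reading of the idea, not the project's actual translator output: the names `cmd_queue_t`, `kernel_fn`, `queue_init`, `queue_enqueue`, `queue_execute` and `double_value`, and the fixed `QUEUE_CAPACITY`, are all hypothetical. It also assumes the queued tasks are independent of each other, so they may run in any order.

```c
#include <stddef.h>

#define QUEUE_CAPACITY 64  /* hypothetical fixed capacity, chosen for the sketch */

/* Hypothetical stand-in for a translated kernel: a plain C function. */
typedef void (*kernel_fn)(void *arg);

/* Hypothetical command queue: pending kernels and their arguments. */
typedef struct {
    kernel_fn kernels[QUEUE_CAPACITY];
    void     *args[QUEUE_CAPACITY];
    size_t    length;
} cmd_queue_t;

/* Start with an empty queue. */
static void queue_init(cmd_queue_t *q) { q->length = 0; }

/* Append a task; returns 0 on success, -1 if the queue is full. */
static int queue_enqueue(cmd_queue_t *q, kernel_fn k, void *arg) {
    if (q->length >= QUEUE_CAPACITY) return -1;
    q->kernels[q->length] = k;
    q->args[q->length] = arg;
    q->length++;
    return 0;
}

/* Run every queued task, then empty the queue.
   With OpenMP enabled the iterations may be spread over threads;
   without it the pragma is ignored and the loop runs serially. */
static void queue_execute(cmd_queue_t *q) {
    int n = (int)q->length;
    #pragma omp parallel for
    for (int k = 0; k < n; k++)
        q->kernels[k](q->args[k]);
    q->length = 0;
}

/* Example kernel: doubles the int it is given. */
static void double_value(void *arg) {
    int *v = (int *)arg;
    *v *= 2;
}
```

Each task here touches only its own argument, which is what makes the unsynchronised parallel loop safe. Tasks that share data would need ordering or locking that this sketch does not provide. With GCC, OpenMP is enabled by compiling with `-fopenmp`.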
John Conway’s Game Of Life
Fractal Flame algorithm
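As an illustration of why Game of Life suits this kind of testing, here is a minimal sketch of one generation step parallelised with OpenMP. It is not the project's test code: the grid size (`ROWS`, `COLS`), the function names `live_neighbours` and `life_step`, and the choice to treat cells outside the grid as dead are all assumptions made for the example.

```c
#define ROWS 8  /* hypothetical grid height */
#define COLS 8  /* hypothetical grid width */

/* Count live neighbours of cell (r, c); cells outside the grid count as dead. */
static int live_neighbours(unsigned char grid[ROWS][COLS], int r, int c) {
    int count = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0) continue;  /* skip the cell itself */
            int rr = r + dr, cc = c + dc;
            if (rr >= 0 && rr < ROWS && cc >= 0 && cc < COLS)
                count += grid[rr][cc];
        }
    }
    return count;
}

/* Compute one generation from cur into next.
   Each row only reads cur and writes its own cells of next,
   so rows can be processed by different threads without locking. */
static void life_step(unsigned char cur[ROWS][COLS], unsigned char next[ROWS][COLS]) {
    #pragma omp parallel for
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            int n = live_neighbours(cur, r, c);
            /* A cell is alive next if it has 3 neighbours, or is alive with 2. */
            next[r][c] = (unsigned char)(n == 3 || (cur[r][c] && n == 2));
        }
    }
}
```

Reading from one buffer and writing to another keeps every cell update independent. The result is also easy to verify against a serial run, which makes it a convenient correctness test.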
OpenMP (Open Multi-Processing) framework
Parallel Processing Framework
Available with the GNU Compiler Collection
Free!
OpenCL header files
/* scalar types */
typedef int8_t cl_char;
typedef uint8_t cl_uchar;
typedef int16_t cl_short __attribute__((aligned(2)));
typedef uint16_t cl_ushort __attribute__((aligned(2)));
typedef int32_t cl_int __attribute__((aligned(4)));
typedef uint32_t cl_uint __attribute__((aligned(4)));
typedef int64_t cl_long __attribute__((aligned(8)));
typedef uint64_t cl_ulong __attribute__((aligned(8)));
typedef uint16_t cl_half __attribute__((aligned(2)));
typedef float cl_float __attribute__((aligned(4)));
typedef double cl_double __attribute__((aligned(8)));
//hello.c
#include <omp.h>
#include <stdio.h>
int main() {
#pragma omp parallel num_threads(10)
    printf("Hello from thread %d, nthreads %d\n",
           omp_get_thread_num(), omp_get_num_threads());
}
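The same framework can also express the data-parallel work of a kernel. The sketch below shows how a vector addition in the spirit of `add_vect` might look once mapped onto an OpenMP loop. The signature and parameter names are assumptions for illustration and are not taken from the project.

```c
/* Hypothetical CPU translation of an add_vect kernel: c[i] = a[i] + b[i].
   Each iteration writes a distinct element of c, so the loop can be
   split across threads without synchronisation. */
static void add_vect(const float *a, const float *b, float *c, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

In OpenCL each work-item would compute one element. Here the loop index plays that role, and OpenMP decides how the iterations are shared among threads.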
Improve performance
Evaluation of OpenCL on various architectures
Heterogeneous execution
Lack of multi-platform open language
OpenCL standard
Most implementations for GPU
Implementation for CPU
High Level Translator
Use OpenMP framework