Program Optimizations and Recent Trends in Heterogeneous Parallel Computing
Dušan Gajić, University of Niš
Dr. Dušan B. Gajić
dusan.b.gajic@gmail.com
CIITLab, Dept. of Computer Science
Faculty of Electronic Engineering
University of Niš, Serbia
15th Workshop on Software Engineering Education and Reverse Engineering
Bohinj, Slovenia, August 23 - 30, 2015
Outline
1. Heterogeneous parallel computing
2. Computing model, trends, and optimizations
3. Case study: Hybrid CPU/GPU computation of the Galois field transform
4. Conclusions
Part I
Part II
Heterogeneous Parallel Computing
The switch to heterogeneous systems is a milestone in high-performance computing (HPC) and in computing in general
Homogeneous computing – one or more processors of the same architecture used to execute programs
Heterogeneous computing – a set of different processor architectures (CPUs, GPUs, FPGAs, DSPs) used to execute programs
Each processor is intended for different tasks and is therefore based on a different design philosophy
Assigning each task to the best-suited architecture improves performance in terms of both time and energy, but requires novel programming techniques
GPUs have become dominant additions to CPUs – GPU computing or General-purpose computing on GPUs (GPGPU)
CPU and GPU Throughput
[Figure: CPU and GPU throughput in GFLOPS by year, 2006 to 2014]
Year            2006  2007  2008  2009  2010  2011  2012  2013  2014
CPU [GFLOPS]      43    51    55    58    86   187   225   225   225
GPU [GFLOPS]     518   576   648  1062  1581  2488  3090  4500  5632
CPU and GPU Bandwidth
[Figure: CPU and GPU memory bandwidth in GB/s by year, 2006 to 2014]
Year          2006  2007  2008  2009  2010  2011  2012  2013  2014
CPU [GB/s]      10    26    26    32    32    32    51    51    51
GPU [GB/s]      90   108   142   159   177   192   192   288   336
Computing Model
[Figure: computing model in four numbered steps, showing input, input buffer, output buffer, and output]
Device executes kernels with high parallelism
New programming methods
Trends
More computation for less energy
Integration of CPU and GPU to avoid the PCIe bottleneck
Appearance of Intel Xeon Phi, Adapteva Parallella…
Architecture        Fermi   Kepler   Maxwell
CUDA cores            512     1536      2048
Frequency (GHz)       1.5      1.0       1.1
SP per SM              32      192       128
Bandwidth (GB/s)      192      288       336
TDP (W)               244      195       165
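The "more computation for less energy" trend can be checked with a little arithmetic on the table above. The following is a minimal C++ sketch, assuming that each CUDA core retires one fused multiply-add (counted as two floating-point operations) per clock cycle; the `GpuSpec` structure and both function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Illustrative record for one GPU generation; values come from the table.
struct GpuSpec {
    const char* architecture;
    int cuda_cores;
    double frequency_ghz;
    double tdp_watts;
};

// Peak single-precision throughput in GFLOPS, ASSUMING one fused
// multiply-add (2 floating-point operations) per core per clock cycle.
double peak_gflops(const GpuSpec& gpu) {
    assert(gpu.cuda_cores > 0 && gpu.frequency_ghz > 0.0);
    return 2.0 * gpu.cuda_cores * gpu.frequency_ghz;
}

// Energy efficiency in GFLOPS per watt of thermal design power (TDP).
double gflops_per_watt(const GpuSpec& gpu) {
    assert(gpu.tdp_watts > 0.0);
    return peak_gflops(gpu) / gpu.tdp_watts;
}
```

With the rounded frequencies from the table this gives about 1536, 3072, and 4506 GFLOPS, or roughly 6.3, 15.8, and 27.3 GFLOPS/W for Fermi, Kepler, and Maxwell respectively.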
Program Optimizations
High throughput at the price of extended latency, which is hidden by a very large number of threads
Focus on memory optimizations:
• Memory transfers – coalesced access, page-locked memory…
• Explicit memory region definition
• Registers per thread – all SPs on an SM share the same register file
Size of the computation grid (no. of threads/block, no. blocks/grid…)
Problem of task scheduling gets more complicated – tasks are (optimally) scheduled to the best-suited processor
Parallel programming patterns (scatter/gather, map/reduce, stencil…)
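Choosing the size of the computation grid usually starts from the problem size and a chosen number of threads per block. A minimal host-side C++ sketch of that arithmetic follows; the function names are hypothetical, and the index expression only mirrors the familiar `blockIdx.x * blockDim.x + threadIdx.x` form used in one-dimensional CUDA kernels.

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed so that blocks * threads_per_block >= n_items
// (ceiling division). Surplus threads in the last block have to be masked
// out by a bounds check inside the kernel.
std::size_t blocks_per_grid(std::size_t n_items, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n_items + threads_per_block - 1) / threads_per_block;
}

// Global index of a thread in a one-dimensional grid.
std::size_t global_index(std::size_t block, std::size_t threads_per_block,
                         std::size_t thread) {
    assert(thread < threads_per_block);
    return block * threads_per_block + thread;
}
```

For example, 1000 items with 256 threads per block need 4 blocks, and the last 24 threads of the final block have no element to process.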
Faculty of Electronic Engineering Niš
FEE founded in 1960, part of University of Niš with 28,000 students
480 students in the common 1st year, 180 major in CS (from the 2nd year)
Heterogeneous parallel computing @ FEE:
Bachelor
• Digital Signal Processing – VII semester
• Pattern Recognition – VIII semester
Master
• Heterogeneous Methods for DSP
• Spectral Methods
CIITLab has a small group performing research on spectral methods and linear algebra on GPUs (Fourier and related transforms, matrix computations…)
Case Study – Hybrid CPU/GPU computation of GFT
Galois field expressions are a generalization of the Reed-Muller expressions from binary to the multiple-valued logic case
$f : \{0, 1, \ldots, p-1\}^n \to \{0, 1, \ldots, p-1\}$
$\mathbf{F} = [f(0), f(1), \ldots, f(p^n - 1)]^T$
$\mathbf{S}_f = [s_f(0), s_f(1), \ldots, s_f(p^n - 1)]^T$
$\mathbf{S}_f = \mathbf{G}\,\mathbf{F}$, where $\mathbf{G}$ is the transform matrix
Direct computation from the definition, with $N = p^n$: $O(N^2)$
Fast algorithms based on the factorization of the transform matrix into sparse matrices: $O(N \log N)$
Basic idea: Different parts of the same task on different processors?
Perform computation on different parts of the vector in parallel on the CPU and the GPU
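The direct computation can be sketched in C++ for the GF(4) case. This is a minimal illustration, not the implementation measured in the talk: it assumes the basic GF(4) transform matrix given in these slides and a 2-bit encoding of the field elements 0, 1, 2, 3 in which addition is bitwise XOR. The names `Gf4`, `gf4_mul`, `matrix_entry`, and `direct_rows` are hypothetical. Each output element depends only on the input vector, so disjoint row ranges can be handed to different processors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Gf4 = std::uint8_t;  // elements 0, 1, 2, 3 of GF(4)

// GF(4) multiplication table (2 * 2 = 3, 2 * 3 = 1, 3 * 3 = 2).
inline Gf4 gf4_mul(Gf4 a, Gf4 b) {
    static const Gf4 kMul[4][4] = {
        {0, 0, 0, 0}, {0, 1, 2, 3}, {0, 2, 3, 1}, {0, 3, 1, 2}};
    return kMul[a][b];
}

// Basic GF(4) transform matrix for one variable, as given in the slides.
static const Gf4 kBasic[4][4] = {
    {1, 0, 0, 0}, {0, 1, 3, 2}, {0, 1, 2, 3}, {1, 1, 1, 1}};

// Entry (row, col) of the n-variable matrix. Because that matrix is an
// n-fold Kronecker product of kBasic, the entry is the product of
// basic-matrix entries selected by the base-4 digits of row and col.
Gf4 matrix_entry(std::size_t row, std::size_t col, int n) {
    Gf4 g = 1;
    for (int k = 0; k < n; ++k, row /= 4, col /= 4)
        g = gf4_mul(g, kBasic[row % 4][col % 4]);
    return g;
}

// Computes spectrum[row_begin, row_end) directly from the definition.
// f must hold 4^n elements. Rows are independent of each other.
void direct_rows(const std::vector<Gf4>& f, std::vector<Gf4>& spectrum,
                 int n, std::size_t row_begin, std::size_t row_end) {
    assert(f.size() == spectrum.size() && row_end <= f.size());
    for (std::size_t i = row_begin; i < row_end; ++i) {
        Gf4 s = 0;
        for (std::size_t j = 0; j < f.size(); ++j)
            s ^= gf4_mul(matrix_entry(i, j, n), f[j]);  // GF(4) add is XOR
        spectrum[i] = s;
    }
}
```

Forming each matrix entry on the fly costs n extra multiplications per entry; precomputing the full matrix would give the plain N^2 operation count at the price of N^2 memory.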
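Transform matrix for GF(4):
$$\mathbf{G}_{4}^{GF}(1) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 3 & 2 \\ 0 & 1 & 2 & 3 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$
Cooley-Tukey factorization:
$$\mathbf{C}_1 = \mathbf{G}_{4}^{GF}(1) \otimes \mathbf{I}$$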
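A fast algorithm built on this kind of factorization can be sketched as follows. This is a minimal C++ illustration under stated assumptions, not the code benchmarked in the talk: field elements use a 2-bit encoding in which GF(4) addition is XOR, each sparse factor is taken to be a Kronecker product of identity matrices with one copy of the basic matrix, and the names `apply_factor` and `fast_gf4_transform` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Gf4 = std::uint8_t;  // elements 0, 1, 2, 3 of GF(4)

// GF(4) multiplication table (2 * 2 = 3, 2 * 3 = 1, 3 * 3 = 2).
inline Gf4 gf4_mul(Gf4 a, Gf4 b) {
    static const Gf4 kMul[4][4] = {
        {0, 0, 0, 0}, {0, 1, 2, 3}, {0, 2, 3, 1}, {0, 3, 1, 2}};
    return kMul[a][b];
}

// Basic GF(4) transform matrix for one variable, as given in the slides.
static const Gf4 kBasic[4][4] = {
    {1, 0, 0, 0}, {0, 1, 3, 2}, {0, 1, 2, 3}, {1, 1, 1, 1}};

// Applies one sparse factor: every group of four elements spaced `stride`
// apart is multiplied by the basic matrix (a radix-4 "butterfly").
void apply_factor(std::vector<Gf4>& v, std::size_t stride) {
    for (std::size_t base = 0; base < v.size(); base += 4 * stride) {
        for (std::size_t offset = 0; offset < stride; ++offset) {
            const std::size_t first = base + offset;
            Gf4 in[4], out[4];
            for (int c = 0; c < 4; ++c) in[c] = v[first + c * stride];
            for (int r = 0; r < 4; ++r) {
                out[r] = 0;
                for (int c = 0; c < 4; ++c)
                    out[r] ^= gf4_mul(kBasic[r][c], in[c]);  // add is XOR
            }
            for (int r = 0; r < 4; ++r) v[first + r * stride] = out[r];
        }
    }
}

// In-place transform of a vector of length 4^n using n sparse factors.
// The factors commute, so the order of the strides does not matter.
void fast_gf4_transform(std::vector<Gf4>& v) {
    std::size_t power = 1;
    while (power < v.size()) power *= 4;
    assert(power == v.size());  // length must be a power of 4
    for (std::size_t stride = 1; stride < v.size(); stride *= 4)
        apply_factor(v, stride);
}
```

Each of the n steps performs N/4 butterflies of 16 multiplications, so the total work grows as N log N rather than N^2. Within one step the butterflies touch disjoint elements, which is the property that makes the step easy to spread over many threads.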
[Figure: processing time [ms] (logarithmic scale, 0.05 to 500) versus number of variables (n = 6 to 13) for C++ FUN, C++ LUT, MPI, OpenCL, CUDA, MPI/OpenCL, and MPI/CUDA]
“Pure” MPI is the fastest for small functions (n ≤ 6)
MPI/CUDA is the fastest for all functions with n ≥ 7
Use of page-locked memory is 3× faster than standard read-write
Experimental Results
GF(4)
Number of variables     11     12     13
C++ FUN                439   1987   8820
C++ LUT                317   1453   6501
MPI                     40    181    815
OpenCL                  31    125    598
CUDA                    26    107    446
MPI/OpenCL              25    104    430
MPI/CUDA                25     92    372
Speedup of MPI/CUDA over CUDA
n = 11: 4 %
n = 12: 16.3 %
n = 13: 19.9 %
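The reported speedups of MPI/CUDA over CUDA are consistent with the usual relative-speedup formula applied to the CUDA and MPI/CUDA timings in the table. The formula is an assumption about how the percentages were derived; it reproduces all three values.

```latex
\[
\text{speedup} = \left( \frac{T_{\text{CUDA}}}{T_{\text{MPI/CUDA}}} - 1 \right) \times 100\,\%
\]
\[
n = 11:\ \frac{26}{25} - 1 = 4.0\,\%, \qquad
n = 12:\ \frac{107}{92} - 1 \approx 16.3\,\%, \qquad
n = 13:\ \frac{446}{372} - 1 \approx 19.9\,\%
\]
```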
Speedup over CUDA increases with the size of the function
Speedup is limited by memory transfers and division granularity
Conclusions
Heterogeneous parallel computing is both the present and the future of high-performance computing (at least until we have something better)
The switch to heterogeneous systems is a milestone in computing
More computation for less energy with heterogeneous processors integrated around common memory
Case study – hybrid CPU/GPU computation of GF(4) expressions
Challenges in education for heterogeneous parallel computing:
• Making a shift in thinking from “CPU-only” programming
• Adjusting course topics to address recent developments
Thank you for your time and attention!