Program Optimizations and Recent Trends in Heterogeneous Parallel Computing

Dr. Dušan B. Gajić
[email protected]
CIITLab, Dept. of Computer Science
Faculty of Electronic Engineering
University of Niš, Serbia

15th Workshop on Software Engineering Education and Reverse Engineering
Bohinj, Slovenia, August 23 - 30, 2015



Outline

1. Heterogeneous parallel computing

2. Computing model, trends, and optimizations

3. Case study: Hybrid CPU/GPU computation of the Galois field transform

4. Conclusions

Part I

Part II


Heterogeneous Parallel Computing

The switch to heterogeneous systems is a milestone in high-performance computing (HPC) and in computing in general

Homogeneous computing – one or more processors of the same architecture used to execute programs

Heterogeneous computing – a set of different processor architectures (CPUs, GPUs, FPGAs, DSPs) used to execute programs

Each processor type is intended for different tasks and is therefore based on a different design philosophy

Assigning each task to the best-suited architecture improves performance in terms of both time and energy, but requires novel programming techniques

GPUs have become the dominant additions to CPUs – GPU computing, or general-purpose computing on GPUs (GPGPU)


CPU and GPU Throughput

[Figure: CPU and GPU throughput [GFLOPS] by year; data labels from the chart]

Year   2006  2007  2008  2009  2010  2011  2012  2013  2014
CPU      43    51    55    58    86   187   225   225   225
GPU     518   576   648  1062  1581  2488  3090  4500  5632


CPU and GPU Bandwidth

[Figure: CPU and GPU memory bandwidth [GB/s] by year; data labels from the chart]

Year   2006  2007  2008  2009  2010  2011  2012  2013  2014
CPU      10    26    26    32    32    32    51    51    51
GPU      90   108   142   159   177   192   192   288   336


Computing Model

[Figure: computing model in four numbered steps, with the labels: input, input buffer, device executes kernels with high parallelism, output buffer, output]

New programming methods


Trends

More computation for less energy

Integration of CPU and GPU to avoid the PCIe bottleneck

Appearance of Intel Xeon Phi, Adapteva Parallella…

Architecture        Fermi   Kepler   Maxwell
CUDA cores            512     1536      2048
Frequency (GHz)       1.5      1.0       1.1
SP per SM              32      192       128
Bandwidth (GB/s)      192      288       336
TDP (W)               244      195       165
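The "more computation for less energy" trend can be checked against the table with a little arithmetic. The sketch below assumes that each CUDA core completes one fused multiply-add (two floating-point operations) per clock cycle, which gives a theoretical single-precision peak. That assumption and all function names are illustrative and do not come from the slides.

```python
# Table values from the slide: (CUDA cores, frequency in GHz, TDP in W).
ARCHITECTURES = {
    "Fermi": (512, 1.5, 244),
    "Kepler": (1536, 1.0, 195),
    "Maxwell": (2048, 1.1, 165),
}

# Assumption (not on the slide): one fused multiply-add per core per cycle.
FLOPS_PER_CORE_PER_CYCLE = 2


def peak_gflops(cores, freq_ghz):
    """Theoretical single-precision peak in GFLOPS."""
    return cores * freq_ghz * FLOPS_PER_CORE_PER_CYCLE


def gflops_per_watt(cores, freq_ghz, tdp_w):
    """Theoretical peak throughput divided by thermal design power."""
    return peak_gflops(cores, freq_ghz) / tdp_w


if __name__ == "__main__":
    for name, (cores, freq, tdp) in ARCHITECTURES.items():
        print(f"{name:8s} {peak_gflops(cores, freq):7.1f} GFLOPS  "
              f"{gflops_per_watt(cores, freq, tdp):5.1f} GFLOPS/W")
```

Under this assumption the ratio rises from roughly 6 GFLOPS/W for Fermi to roughly 27 GFLOPS/W for Maxwell, even though the TDP falls.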


Program Optimizations

High throughput comes at the price of higher latency, which is hidden by running a very large number of threads

Focus on memory optimizations:

Memory transfers – coalesced access, page-locked memory…

Explicit memory region definition

Registers per thread – all SPs on an SM share the same register file

Size of the computation grid (no. of threads/block, no. of blocks/grid…)

Task scheduling becomes more complicated – ideally, each task is scheduled on the best-suited processor

Parallel programming patterns (scatter/gather, map/reduce, stencil…)
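The grid-size bullet can be made concrete with a host-side calculation. The sketch below is written in Python for illustration only; it mirrors the usual CUDA convention of a one-dimensional grid, where a thread's global index is `blockIdx.x * blockDim.x + threadIdx.x`. The function names are hypothetical.

```python
def blocks_per_grid(n_elements, threads_per_block):
    """Smallest number of blocks whose threads cover all elements."""
    if n_elements < 0 or threads_per_block <= 0:
        raise ValueError("need n_elements >= 0 and threads_per_block > 0")
    # Ceiling division without floating point.
    return (n_elements + threads_per_block - 1) // threads_per_block


def global_index(block_idx, thread_idx, threads_per_block):
    """Element handled by a thread in a one-dimensional grid."""
    return block_idx * threads_per_block + thread_idx


def idle_threads(n_elements, threads_per_block):
    """Threads in the last block that must be masked by a bounds check."""
    launched = blocks_per_grid(n_elements, threads_per_block) * threads_per_block
    return launched - n_elements


if __name__ == "__main__":
    n = 4 ** 13  # vector length for a 13-variable GF(4) function
    for tpb in (64, 128, 256, 512):
        print(tpb, blocks_per_grid(n, tpb), idle_threads(n, tpb))
```

When the element count is not a multiple of the block size, the kernel needs a guard such as `if (idx < n)` so that the surplus threads do no work.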


Faculty of Electronic Engineering Niš

FEE founded in 1960, part of the University of Niš with 28,000 students

480 students in the common 1st year, 180 major in CS (from the 2nd year)

CIITLab has a small group performing research on spectral methods and linear algebra on GPUs (Fourier and related transforms, matrix computations…)

Heterogeneous parallel computing @ FEE:

Bachelor
• Digital Signal Processing – VII semester
• Pattern Recognition – VIII semester

Master
• Heterogeneous Methods for DSP
• Spectral Methods


Case Study – Hybrid CPU/GPU computation of GFT

Galois field expressions are a generalization of the Reed-Muller expressions from binary to the multiple-valued logic case

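$f : \{0, 1, \ldots, p-1\}^n \to \{0, 1, \ldots, p-1\}$

$\mathbf{F} = [f(0), f(1), \ldots, f(p^n - 1)]^T$

$\mathbf{S}_f = [s_f(0), s_f(1), \ldots, s_f(p^n - 1)]^T$

$\mathbf{S}_f = \mathbf{G}\,\mathbf{F}$, where $\mathbf{G}$ is the transform matrix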

Perform computation on different parts of the vector in parallel on the CPU and the GPU

Basic idea: Different parts of the same task on different processors?

Direct computation as a matrix-vector product: $O(N^2)$

Fast algorithms based on the factorization of the transform matrix into sparse matrices: $O(N \log N)$


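Transform matrix for GF(4):

$$\mathbf{G}_4^{GF}(1) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 3 & 2 \\ 0 & 1 & 2 & 3 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

Cooley-Tukey factorization:

$$\mathbf{C}_1 = \mathbf{G}_4^{GF}(1) \otimes \mathbf{I}$$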



[Figure: processing time [ms] (logarithmic axis, 0.05 to 500) versus number of variables (n = 6 to 13) for C++ FUN, C++ LUT, MPI, OpenCL, CUDA, MPI/OpenCL, and MPI/CUDA]

“Pure” MPI is the fastest for small functions (n ≤ 6)

MPI/CUDA is the fastest for all functions with n ≥ 7

Use of page-locked memory is 3× faster than standard read-write


Experimental Results

GF(4), processing times by number of variables:

Number of variables    11     12     13
C++ FUN               439   1987   8820
C++ LUT               317   1453   6501
MPI                    40    181    815
OpenCL                 31    125    598
CUDA                   26    107    446
MPI/OpenCL             25    104    430
MPI/CUDA               25     92    372

Speedup of MPI/CUDA over CUDA:

n = 11: 4 %
n = 12: 16.3 %
n = 13: 19.9 %

Speedup over CUDA increases with the size of the function

Speedup is limited by memory transfers and division granularity
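The reported percentages can be reproduced from the CUDA and MPI/CUDA rows of the results table. The sketch below assumes that speedup is defined as the CUDA time divided by the MPI/CUDA time, minus one; this definition is not stated in the slides, but it matches the three reported values after rounding.

```python
# Processing times from the results table, indexed by number of variables.
TIMES = {
    "CUDA": {11: 26, 12: 107, 13: 446},
    "MPI/CUDA": {11: 25, 12: 92, 13: 372},
}


def speedup_percent(baseline_time, improved_time):
    """Relative speedup in percent (assumed definition: baseline / improved - 1)."""
    return (baseline_time / improved_time - 1.0) * 100.0


if __name__ == "__main__":
    for n in (11, 12, 13):
        s = speedup_percent(TIMES["CUDA"][n], TIMES["MPI/CUDA"][n])
        print(f"n = {n}: {s:.1f} %")
```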


Conclusions

Heterogeneous parallel computing is both the present and the future of high-performance computing (at least until we have something better)

The switch to heterogeneous systems is a milestone in computing

Case study – hybrid CPU/GPU computation of GF(4) expressions

Challenges in education for heterogeneous parallel computing:

Making a shift in thinking away from “CPU-only” programming

Adjusting course topics to address recent developments

More computation for less energy with heterogeneous processors integrated around common memory


Thank you for your time and attention!
