OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group University of Houston


Page 1: OpenMP in a Heterogeneous World

OpenMP in a Heterogeneous World

Ayodunni Aribuki
Advisor: Dr. Barbara Chapman

HPCTools Group
University of Houston

Page 2: OpenMP in a Heterogeneous World

Top 10 Supercomputers (June 2011)

Page 3: OpenMP in a Heterogeneous World

Why OpenMP

• Shared memory parallel programming model
  – Extends C, C++, Fortran
• Directives-based
  – Single code for sequential and parallel version
• Incremental parallelism
  – Little code modification
• High-level
  – Leave multithreading details to compiler and runtime
• Widely supported by major compilers
  – Open64, Intel, GNU, IBM, Microsoft, …
  – Portable

www.openmp.org
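The "single code for sequential and parallel version" point can be illustrated with a minimal C sketch. The function name sum_squares is a hypothetical helper, not something from the slides. Built with OpenMP enabled (for example with -fopenmp in GCC), the loop iterations are shared among threads; built without it, the compiler ignores the pragma and the same source runs sequentially.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): sums i*i for i in [0, n). */
long sum_squares(int n)
{
    long sum = 0;
    int i;

    assert(n >= 0);

    /* With OpenMP enabled, iterations are divided among threads and the
       per-thread partial sums are combined by the reduction clause.
       Without OpenMP, the pragma is ignored and the loop runs as plain
       sequential C. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += (long)i * i;
    }
    return sum;
}
```

Calling sum_squares(10) returns 285 in both builds, which is the sense in which one source serves as the sequential and the parallel version.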

Page 4: OpenMP in a Heterogeneous World

OpenMP Example

#pragma omp parallel
{
    int i;
    #pragma omp for
    for (i = 0; i < 100; i++) {
        // do stuff
    }
    // do more stuff
}

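[Figure: fork-join execution of the example. Fork: the 100 iterations are split across four threads (0-24, 25-49, 50-74, 75-99). Implicit barrier at the end of the loop, after which each thread runs "more stuff". Join.]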
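The iteration split on this slide can be observed with a small C sketch. The helper name record_owners, the num_threads(4) clause, and the schedule(static) clause are assumptions added here to mirror the four-thread picture; the slide's code specifies none of them. With four threads granted, a static schedule typically hands iterations 0-24 to thread 0, 25-49 to thread 1, and so on. The fallback definition lets the sketch build without OpenMP, in which case a single thread owns every iteration.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Sequential fallback so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: records which thread executed each iteration. */
void record_owners(int *owner, int n)
{
    assert(owner != 0 && n >= 0);

    /* Fork: a team of (at most) four threads starts here. */
    #pragma omp parallel num_threads(4)
    {
        int i; /* declared inside the region, so private to each thread */

        /* Static schedule without a chunk size: each thread gets one
           contiguous block of iterations, in thread-number order. */
        #pragma omp for schedule(static)
        for (i = 0; i < n; i++) {
            owner[i] = omp_get_thread_num();
        }
        /* Implicit barrier: no thread proceeds to "more stuff" until
           all iterations have finished. */
    }
    /* Join: only the initial thread continues past this point. */
}
```

Filling a 100-element array with record_owners and printing it shows the contiguous blocks; the exact block boundaries depend on how many threads the runtime actually provides.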

Page 5: OpenMP in a Heterogeneous World

Present/Future Architectures & Challenges they pose

[Figure: Nodes 0-3, each with its own memory; an accelerator with its own memory; many more CPUs. Challenges: location, heterogeneity, scalability.]

Page 6: OpenMP in a Heterogeneous World

Heterogeneous Embedded Platform

Page 7: OpenMP in a Heterogeneous World

Heterogeneous High-Performance Systems

Each node has multiple CPU cores, and some of the nodes are equipped with additional computational accelerators, such as GPUs.

www.olcf.ornl.gov/wp-content/uploads/.../Exascale-ASCR-Analysis.pdf

Page 8: OpenMP in a Heterogeneous World

Programming Heterogeneous Multicore: Issues

• Must map data/computations to specific devices
• Usually involves substantial rewrite of code
• Verbose code
  – Move data to/from device x
  – Launch kernel on device
  – Wait until y is ready/done
• Portability becomes an issue
  – Multiple versions of same code
  – Hard to maintain

Always hardware-specific!
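The "verbose code" steps above (move data, launch, wait, move data back) can be made concrete with a mock in plain C. Everything here is hypothetical: mock_device and the dev_* functions are illustrative stand-ins that use ordinary host memory, not the API of any real accelerator. The point is only how many explicit steps surround a one-line computation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mock "device": a separate buffer standing in for device
   memory. None of these names belong to a real accelerator API. */
typedef struct {
    float *mem;   /* device-side storage */
    size_t count; /* number of floats held */
    int done;     /* set once the mock kernel has finished */
} mock_device;

static int dev_alloc(mock_device *d, size_t count)
{
    d->mem = malloc(count * sizeof *d->mem);
    d->count = count;
    d->done = 0;
    return d->mem ? 0 : -1;
}

static void dev_upload(mock_device *d, const float *src)
{
    memcpy(d->mem, src, d->count * sizeof *d->mem);
}

/* The "kernel": the only line of real work is the multiplication. */
static void dev_launch_scale(mock_device *d, float factor)
{
    size_t i;
    for (i = 0; i < d->count; i++) {
        d->mem[i] *= factor;
    }
    d->done = 1;
}

/* The mock kernel runs synchronously, so waiting is just a check. */
static void dev_wait(const mock_device *d) { assert(d->done); }

static void dev_download(const mock_device *d, float *dst)
{
    memcpy(dst, d->mem, d->count * sizeof *d->mem);
}

static void dev_free(mock_device *d) { free(d->mem); d->mem = NULL; }

/* Scales data[0..n) in place via the mock device; returns 0 on success. */
int scale_on_device(float *data, size_t n, float factor)
{
    mock_device d;
    if (dev_alloc(&d, n) != 0) return -1; /* 1. allocate device memory */
    dev_upload(&d, data);                 /* 2. move data to device    */
    dev_launch_scale(&d, factor);         /* 3. launch kernel          */
    dev_wait(&d);                         /* 4. wait until it is done  */
    dev_download(&d, data);               /* 5. move data from device  */
    dev_free(&d);                         /* 6. release device memory  */
    return 0;
}
```

On a real platform each numbered step would be a vendor-specific call, which is why the slide describes such code as always hardware-specific.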

Page 9: OpenMP in a Heterogeneous World

Programming Models? Today’s Scenario

// Run one OpenMP thread per device per MPI node
#pragma omp parallel num_threads(devCount)
if (initDevice()) {
    // Block and grid dimensions
    dim3 dimBlock(12,12);
    kernel<<<1,dimBlock>>>();
    cudaThreadExit();
} else {
    printf("Device error on %s\n", processor_name);
}

MPI_Finalize();
return 0;
}

www.cse.buffalo.edu/faculty/miller/Courses/CSE710/heavner.pdf

Page 10: OpenMP in a Heterogeneous World

OpenMP in the Heterogeneous World

• All threads are equal
  – No vocabulary for heterogeneity, separate device
• All threads must have access to the memory
  – Distributed memories common in embedded systems
  – Memories may not be coherent
• Implementations rely on OS and threading libraries
  – Memory allocation, synchronization, e.g. Linux, Pthreads

Page 11: OpenMP in a Heterogeneous World

Extending OpenMP Example

#pragma omp parallel for target(dsp)
for (j = 0; j < m; j++)
    for (i = 0; i < n; i++)
        c(i,j) = a(i,j) + b(i,j);

[Figure: general purpose processor cores with main memory holding the application data, and an HWA with device cores holding its own copy of the application data. Arrows between them: upload remote data, download remote data, remote procedure call.]
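The target(dsp) clause on this slide is a proposed extension, not standard OpenMP. For comparison only, the sketch below uses the target construct and map clauses that were standardized later, starting with OpenMP 4.0; it is not the author's extension, and the helper name matrix_add and the flattened array layout are assumptions made for the example. The map clauses play the role of the upload and download arrows in the figure.

```c
#include <assert.h>

/* Hypothetical helper: c = a + b for an n-by-m matrix stored as one
   contiguous array of n*m floats. */
void matrix_add(const float *a, const float *b, float *c, int n, int m)
{
    int len = n * m;
    int k;

    assert(a != 0 && b != 0 && c != 0 && n >= 0 && m >= 0);

    /* map(to:)   copies a and b to the device before the loop (upload).
       map(from:) copies c back to the host afterwards (download).
       When OpenMP is disabled the pragma is ignored, and without an
       offload device implementations typically run the loop on the host. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:len], b[0:len]) map(from: c[0:len])
    for (k = 0; k < len; k++) {
        c[k] = a[k] + b[k];
    }
}
```

As in the slide's version, the loop body stays unchanged and the device placement is expressed entirely in the directive.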

Page 12: OpenMP in a Heterogeneous World

Heterogeneous OpenMP Solution Stack

[Figure: OpenMP Parallel Computing Solution Stack]
User layer: OpenMP Application
Prog. layer (OpenMP API): Directives, Compiler; OpenMP library; Environment variables
System layer: Runtime library; OS/system support for shared memory
Core 1, Core 2, …, Core n
MCAPI, MRAPI, MTAPI

• Language extensions
• Efficient code generation
• Target Portable Runtime Interface

Page 13: OpenMP in a Heterogeneous World

Summarizing My Research

• OpenMP on heterogeneous architectures
  – Expressing heterogeneity
  – Generating efficient code for GPUs/DSPs
    • Managing memories
      – Distributed
      – Explicitly managed
  – Enabling portable implementations

Page 14: OpenMP in a Heterogeneous World

Backup

Page 15: OpenMP in a Heterogeneous World

MCA: Generic Multicore Programming

• Solve portability issue in embedded multicore programming
• Defining and promoting open specifications for
  – Communication - MCAPI
  – Resource Management - MRAPI
  – Task Management - MTAPI

(www.multicore-association.org)

Page 16: OpenMP in a Heterogeneous World

Heterogeneous Platform: CPU + Nvidia GPU