
OpenMPI

Majdi Baddourah

https://computing.llnl.gov/tutorials/mpi/

What is OpenMP?

• An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism
• Comprised of three primary API components:
  – Compiler Directives
  – Runtime Library Routines
  – Environment Variables
• Portable:
  – The API is specified for C/C++ and Fortran
• Standardized:
  – Jointly defined and endorsed by a group of major computer hardware and software vendors
• What does OpenMP stand for?
  – Open Multi-Processing

C / C++ - General Code Structure

#include <omp.h>

main () {
  int var1, var2, var3;

  /* Serial code */

  /* Beginning of parallel section. Fork a team of threads.
     Specify variable scoping */
  #pragma omp parallel private(var1, var2) shared(var3)
  {
    /* Parallel section executed by all threads */

    /* All threads join master thread and disband */
  }

  /* Resume serial code */
}

Fork - Join Model

C / C++ Directives Format

#pragma omp directive-name [clause, ...] newline

#pragma omp
  Required for all OpenMP C/C++ directives.

directive-name
  A valid OpenMP directive. Must appear after the pragma and before any clauses.

[clause, ...]
  Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.

newline
  Required. Precedes the structured block which is enclosed by this directive.

Example:

#pragma omp parallel default(shared) private(beta,pi)

Parallel Region Example

#include <omp.h>
#include <stdio.h>

main () {
  int nthreads, tid;

  /* Fork a team of threads with each thread having a private tid variable */
  #pragma omp parallel private(tid)
  {
    /* Obtain and print thread id */
    tid = omp_get_thread_num();
    printf("Hello World from thread = %d\n", tid);

    /* Only master thread does this */
    if (tid == 0) {
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
  } /* All threads join master thread and terminate */
}

Work-Sharing Constructs

DO / for - shares iterations of a loop across the team. Represents a type of "data parallelism".

SECTIONS - breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".

SINGLE - serializes a section of code

Work-Sharing Constructs DO / for Directive

#pragma omp for [clause ...] newline
  schedule (type [,chunk])
  ordered
  private (list)
  firstprivate (list)
  lastprivate (list)
  shared (list)
  reduction (operator: list)
  collapse (n)
  nowait

  for_loop

For Directive Example

#include <omp.h>

#define CHUNKSIZE 100
#define N 1000

main () {
  int i, chunk;
  float a[N], b[N], c[N];

  /* Some initializations */
  for (i=0; i < N; i++)
    a[i] = b[i] = i * 1.0;
  chunk = CHUNKSIZE;

  #pragma omp parallel shared(a,b,c,chunk) private(i)
  {
    #pragma omp for schedule(dynamic,chunk) nowait
    for (i=0; i < N; i++)
      c[i] = a[i] + b[i];
  } /* end of parallel section */
}

SECTIONS Directive

• The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of code are to be divided among the threads in the team.

• Independent SECTION directives are nested within a SECTIONS directive. Each SECTION is executed once by a thread in the team. Different sections may be executed by different threads. It is possible for a thread to execute more than one section if it is quick enough and the implementation permits it.

SECTIONS Directive

#pragma omp sections [clause ...] newline
  private (list)
  firstprivate (list)
  lastprivate (list)
  reduction (operator: list)
  nowait
{
  #pragma omp section newline
    structured_block
  #pragma omp section newline
    structured_block
}

SECTIONS Directive

#include <omp.h>

#define N 1000

main () {
  int i;
  float a[N], b[N], c[N], d[N];

  /* Some initializations */
  for (i=0; i < N; i++) {
    a[i] = i * 1.5;
    b[i] = i + 22.35;
  }

  #pragma omp parallel shared(a,b,c,d) private(i)
  {
    #pragma omp sections nowait
    {
      #pragma omp section
      for (i=0; i < N; i++)
        c[i] = a[i] + b[i];

      #pragma omp section
      for (i=0; i < N; i++)
        d[i] = a[i] * b[i];
    } /* end of sections */
  } /* end of parallel section */
}

CRITICAL Construct

#include <omp.h>

main () {
  int x;
  x = 0;

  #pragma omp parallel shared(x)
  {
    #pragma omp critical
    x = x + 1;
  } /* end of parallel section */
}

THREADPRIVATE Directive

• The THREADPRIVATE directive is used to make global file scope variables (C/C++) or common blocks (Fortran) local and persistent to a thread through the execution of multiple parallel regions.

Threadprivate Directive Example

#include <omp.h>
#include <stdio.h>

int a, b, i, tid;
float x;

#pragma omp threadprivate(a, x)

main () {
  /* Explicitly turn off dynamic threads */
  omp_set_dynamic(0);

  printf("1st Parallel Region:\n");
  #pragma omp parallel private(b,tid)
  {
    tid = omp_get_thread_num();
    a = tid;
    b = tid;
    x = 1.1 * tid + 1.0;
    printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
  } /* end of parallel section */

  printf("************************************\n");
  printf("Master thread doing serial work here\n");
  printf("************************************\n");

  printf("2nd Parallel Region:\n");
  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();
    printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
  } /* end of parallel section */
}

Results

1st Parallel Region:
Thread 0: a,b,x= 0 0 1.000000
Thread 2: a,b,x= 2 2 3.200000
Thread 3: a,b,x= 3 3 4.300000
Thread 1: a,b,x= 1 1 2.100000
************************************
Master thread doing serial work here
************************************
2nd Parallel Region:
Thread 0: a,b,x= 0 0 1.000000
Thread 3: a,b,x= 3 0 4.300000
Thread 1: a,b,x= 1 0 2.100000
Thread 2: a,b,x= 2 0 3.200000

In the second region, a and x keep their per-thread values from the first region because they are threadprivate; b, which is merely private, does not persist.


Reduction Clause Example

#include <omp.h>
#include <stdio.h>

main () {
  int i, n, chunk;
  float a[100], b[100], result;

  /* Some initializations */
  n = 100;
  chunk = 10;
  result = 0.0;
  for (i=0; i < n; i++) {
    a[i] = i * 1.0;
    b[i] = i * 2.0;
  }

  #pragma omp parallel for \
    default(shared) private(i) \
    schedule(static,chunk) \
    reduction(+:result)
  for (i=0; i < n; i++)
    result = result + (a[i] * b[i]);

  printf("Final result= %f\n", result);
}

Clauses / Directives Summary

Environment Variables

• OMP_SCHEDULE
• OMP_NUM_THREADS
  – setenv OMP_NUM_THREADS 5
• OMP_DYNAMIC
  – setenv OMP_DYNAMIC TRUE
• OMP_NESTED
• OMP_STACKSIZE
• OMP_WAIT_POLICY
• OMP_MAX_ACTIVE_LEVELS
• OMP_THREAD_LIMIT

Compiler Flags

Compiler    Flag
IBM         -qsmp=omp
Intel       -openmp
PathScale   -mp
PGI         -mp
GNU         -fopenmp

int i;
int j;
int prime;
int total = 0;

# pragma omp parallel \
  shared ( n ) \
  private ( i, j, prime ) \
  reduction ( + : total )
# pragma omp for
for ( i = 2; i <= n; i++ )
{
  prime = 1;
  for ( j = 2; j < i; j++ )
  {
    if ( i % j == 0 )
    {
      prime = 0;
      break;
    }
  }
  total = total + prime;
}

total = 0.0;

# pragma omp parallel shared ( a, b, n ) private ( i, x ) reduction ( + : total )
# pragma omp for
for ( i = 0; i < n; i++ )
{
  x = ( ( n - i - 1 ) * a + ( i ) * b ) / ( n - 1 );
  total = total + f ( x );
}
