Introduction to OpenMP




OpenMP Introduction


Credits:
www.sdsc.edu/~allans/cs260/lectures/OpenMP.ppt
www.mgnet.org/~douglas/Classes/cs521-s02/...openmp/MPI-OpenMP.ppt

OpenMP homepage: http://www.openmp.org/


Module Objectives

• Introduction to the OpenMP standard

• After completion, users should be equipped to implement OpenMP constructs in their applications and realize performance improvements on shared memory machines


Definition

Parallel Computing:

• Computing multiple things simultaneously.

• Usually means computing different parts of the same problem simultaneously.

• In scientific computing, it often means decomposing a domain into more than one sub-domain and computing a solution on each sub-domain separately and simultaneously (or almost separately and simultaneously).


Types of Parallelism

• Perfect (a.k.a. Embarrassing, Trivial) Parallelism
  – Monte-Carlo Methods
  – Cellular Automata

• Data Parallelism
  – Domain Decomposition
  – Dense Matrix Multiplication

• Task Parallelism
  – Pipelining
  – Monte-Carlo?
  – Cellular Automata?


Performance Measures

• Peak Performance: Theoretical upper bound on performance.

• Sustained Performance: Highest consistently achievable speed.

• MHz: Million cycles per second.

• MIPS: Million instructions per second.

• Mflops: Million floating point operations per second.

• Speedup: Sequential run time divided by parallel run time.


Parallelism Issues

• Programming notation

• Algorithms and Data Structures

• Load Balancing

• Problem Size

• Communication

• Portability

• Scalability


Getting your feet wet


Memory Types

[Diagram: Distributed memory – each CPU is paired with its own private memory; Shared memory – several CPUs are connected to one common memory]


Clustered SMPs (symmetric multiprocessors)

[Diagram: several SMP nodes, each multi-socket and/or multi-core with its own memory, connected by a cluster interconnect network]


Distributed vs. Shared Memory

• Shared - all processors share a global pool of memory
  – simpler to program
  – bus contention leads to poor scalability

• Distributed - each processor physically has its own (private) memory
  – scales well
  – memory management is more difficult


What is OpenMP?

• OpenMP is a portable, multiprocessing API for shared memory computers

• OpenMP is not a “language”

• Instead, OpenMP specifies a notation as part of an existing language (FORTRAN, C) for parallel programming on a shared memory machine

• Portable across different architectures

• Scalable as more processors are added

• Easy to convert sequential code to parallel


Why should I use OpenMP?


Where should I use OpenMP?


OpenMP Specification

OpenMP consists of three main parts:

• Compiler directives, used by the programmer to communicate with the compiler

• A runtime library, which enables the setting and querying of parallel parameters

• Environment variables, which can be used to define a limited number of runtime parameters


OpenMP Example Usage (1 of 2)

[Diagram: annotated source → OpenMP compiler → sequential program or parallel program, selected by a compiler switch]


OpenMP Example Usage (2 of 2)

• If you give the sequential switch,
  – comments and pragmas are ignored.

• If you give the parallel switch,
  – comments and/or pragmas are read, and
  – cause translation into a parallel program.

• Ideally, one source serves for both the sequential and the parallel program (a big maintenance plus).


Simple OpenMP Program

• Most OpenMP constructs are compiler directives or pragmas
• The focus of OpenMP is to parallelize loops
• OpenMP offers an incremental approach to parallelism


OpenMP Programming Model

OpenMP is a shared memory model.

• Workload is distributed among threads
  – Variables can be
    • shared among all threads
    • duplicated and private to each thread
  – Threads communicate by sharing variables

• Unintended sharing of data can lead to race conditions:
  – race condition: when the program's outcome changes as the threads are scheduled differently

• To control race conditions:
  – use synchronization (Chapter Four) to protect data conflicts

• Careless use of synchronization can lead to deadlocks (Chapter Four)


OpenMP Execution Model

• Fork-join model of parallel execution

• Begin execution as a single process (master thread)

• Start of a parallel construct:
  – the master thread creates a team of threads

• Completion of a parallel construct:
  – threads in the team wait until all team work has been completed

• Only the master thread continues execution


The Basic Idea


OpenMP directive format in C

• #pragma directives are defined by the C standard as a mechanism for compiler-specific tasks, e.g. ignore errors, generate special code

• A #pragma must be ignored if not understood; thus, SOME OpenMP programs can be compiled for sequential OR parallel execution

• Typically, OpenMP directives can be enabled by a compiler option

• OpenMP pragma usage:
  #pragma omp directive_name [ clause [ clause ] ... ] CR

• Conditional compilation:
  #ifdef _OPENMP
  printf("%d avail. processors\n", omp_get_num_procs());
  #endif

• Directives are case sensitive

• Include file for library routines:
  #ifdef _OPENMP
  #include <omp.h>
  #endif
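As a concrete illustration, here is a minimal complete program assembled from the fragments above; the message in the sequential branch is an illustrative addition, not from the slide:

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void)
{
#ifdef _OPENMP
    /* Compiled with the OpenMP switch: the library call is available. */
    printf("%d avail. processors\n", omp_get_num_procs());
#else
    /* Compiled without it: the pragmas and _OPENMP simply vanish. */
    printf("compiled sequentially\n");
#endif
    return 0;
}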


Microsoft Visual Studio OpenMP Option


Other Compilers

• Intel (icc, ifort, icpc): -openmp

• PGI (pgcc, pgf90, …): -mp

• GNU (gcc, gfortran, g++): -fopenmp (needs version 4.2 or later)


OpenMP parallel region construct

• Block of code to be executed by multiple threads in parallel; each thread executes the same code redundantly!

• C/C++:
  #pragma omp parallel [ clause [ clause ] ... ] CR
  {
    structured-block
  }

• clause can be either or both of the following:
  private( comma-separated identifier-list )
  shared( comma-separated identifier-list )

• If no private/shared list is given, shared is assumed for all variables


Communicating Among Threads

• Shared Memory Model
  – threads read and write shared variables
    • no need for explicit message passing
  – change a variable's storage attribute to private to minimize synchronization and improve cache reuse, because private variables are duplicated in every team member


Storage Model – Data Scoping

• Shared memory programming model: variables are shared by default

• Global variables are SHARED among threads
  – C: file scope variables, static variables

• Private variables:
  – exist only within the scope of each thread, i.e. they are uninitialized and undefined outside the data scope
  – loop index variables
  – stack variables in sub-programs called from parallel regions
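A minimal sketch of these defaults (the variable names are illustrative assumptions, not from the slides):

#include <stdio.h>
#include <omp.h>

int global = 1;                 /* file scope: shared among threads */

int main(void)
{
    static int persistent = 2;  /* static: also shared              */

    #pragma omp parallel
    {
        int mine = omp_get_thread_num();  /* declared inside: private */
        printf("global=%d persistent=%d mine=%d\n",
               global, persistent, mine);
    }
    return 0;
}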


OpenMP -- example

#include <stdio.h>

int main() {

  // Do this part in parallel
  printf( "Hello, World!\n" );

  return 0;
}


OpenMP -- example

#include <stdio.h>
#include <omp.h>

int main() {

  omp_set_num_threads(16);

  // Do this part in parallel
  #pragma omp parallel
  {
    printf( "Hello, World!\n" );
  }

  return 0;
}


OpenMP environment variables

• OMP_NUM_THREADS
  – sets the number of threads to use during execution
  – when dynamic adjustment of the number of threads is enabled, the value of this environment variable is the maximum number of threads to use

  setenv OMP_NUM_THREADS 16    [csh, tcsh]
  export OMP_NUM_THREADS=16    [sh, ksh, bash]

• At runtime: omp_set_num_threads(6)
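A short sketch of how the two mechanisms interact (the team-size printout is an illustrative addition):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* OMP_NUM_THREADS supplies the initial default team size;     */
    /* omp_set_num_threads() overrides it from inside the program. */
    omp_set_num_threads(6);

    #pragma omp parallel
    {
        #pragma omp master
        printf("team size: %d\n", omp_get_num_threads());
    }
    return 0;
}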


OpenMP runtime library

• omp_get_num_threads function
  – returns the number of threads currently in the team executing the parallel region from which it is called
  – C/C++: int omp_get_num_threads(void);

• omp_get_thread_num function
  – returns the thread number within the team, which lies between 0 and omp_get_num_threads()-1, inclusive; the master thread of the team is thread 0
  – C/C++: int omp_get_thread_num(void);
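Putting both routines together, a minimal sketch:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* Each team member reports its id and the team size. */
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}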


Programming Model - Fork/Join

int main() {

  // serial region
  printf("Hello…");

  // parallel region (Fork)
  #pragma omp parallel
  {
    printf("World");
  }
  // (Join)

  // serial again
  printf("!");

}

Output with a team of four threads: Hello…WorldWorldWorldWorld!


Programming Model – Thread Identification

• Master thread
  – thread with ID = 0
  – the only thread that exists in sequential regions
  – depending on the implementation, may have a special purpose inside parallel regions

• Some special directives affect only the master thread (like master)

• The other threads in a team have IDs 1..N-1

[Diagram: thread 0 forks into a team with IDs 0 1 2 3 4 5 6 7, which joins back into thread 0]


Run-time Library: Timing

• There are 2 portable timing routines

• omp_get_wtime
  – portable wall clock timer; returns a double precision value that is the number of elapsed seconds from some point in the past
  – gives time per thread - possibly not globally consistent
  – take the difference of two calls to get elapsed time in code

• omp_get_wtick
  – time between ticks in seconds
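A sketch of the differencing idiom (the busy loop is a stand-in for real work, an illustrative assumption):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    double start = omp_get_wtime();      /* first timestamp  */

    #pragma omp parallel
    {
        volatile double s = 0.0;         /* stand-in workload */
        for (int i = 0; i < 10000000; i++)
            s += i;
    }

    double end = omp_get_wtime();        /* second timestamp */
    printf("elapsed=%f s (tick=%g s)\n", end - start, omp_get_wtick());
    return 0;
}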


Loop Constructs

• Because the use of parallel followed by a loop construct is so common, this shorthand notation is often used (note: the directive should be followed immediately by the loop)

  #pragma omp parallel for [ clause [ clause ] ... ] CR
  for ( ; ; ) {
  }

• Subsets of iterations are assigned to each thread in the team


Programming Model – Concurrent Loops

• OpenMP easily parallelizes loops
  – no data dependencies between iterations!

• The preprocessor calculates loop bounds for each thread directly from the serial source

  #pragma omp parallel for
  for( i=0; i < 25; i++ ) {
    printf("Foo");
  }


Sequential Matrix Multiply

for( i=0; i<n; i++ )
  for( j=0; j<n; j++ ) {
    c[i][j] = 0.0;
    for( k=0; k<n; k++ )
      c[i][j] += a[i][k]*b[k][j];
  }


OpenMP Matrix Multiply

#pragma omp parallel for
for( i=0; i<n; i++ )
  for( j=0; j<n; j++ ) {
    c[i][j] = 0.0;
    for( k=0; k<n; k++ )
      c[i][j] += a[i][k]*b[k][j];
  }
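One caveat the slide glosses over: only the outer loop index i is privatized automatically by parallel for; if j and k are declared outside the loop nest they are shared by default, which is a race. A self-contained sketch (size and initialization are illustrative assumptions) avoids this by declaring the inner indices inside the parallel loop:

#include <stdio.h>
#define N 4

int main(void)
{
    double a[N][N], b[N][N], c[N][N];

    /* Illustrative data: a is all ones, b is the identity. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = (i == j) ? 1.0 : 0.0;
        }

    #pragma omp parallel for
    for (int i = 0; i < N; i++)         /* i: private loop index    */
        for (int j = 0; j < N; j++) {   /* j, k: private because    */
            c[i][j] = 0.0;              /* declared inside the loop */
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }

    printf("c[0][0] = %f\n", c[0][0]);  /* expect 1.000000 */
    return 0;
}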


OpenMP parallel for directive

clause can be one of the following:
  private( list )
  shared( list )
  default( none | shared | private )
  if ( Boolean expression )
  reduction( operator : list )
  schedule( type [ , chunk ] )
  nowait
  num_threads( N )

• Implicit barrier at the end of for unless nowait is specified

• If nowait is specified, threads do not synchronize at the end of the parallel loop

• The schedule clause specifies how iterations of the loop are divided among the threads of the team
  – the default is implementation dependent


OpenMP parallel/for directive

// i is private; a, b are shared
#pragma omp parallel private(f)
{
  f = 7;

  #pragma omp for
  for (i=0; i<20; i++)
    a[i] = b[i] + f * (i+1);

} /* omp end parallel */


Default Clause

• Note that the default storage attribute is DEFAULT(SHARED)

• To change the default: DEFAULT(PRIVATE)
  – each variable in the static extent of the parallel region is made private as if specified by a private clause
  – mostly saves typing

• DEFAULT(none): no default; must list the storage attribute for each variable. USE THIS!


If Clause

• if ( Boolean expression )

• Executes (in parallel) normally if the expression is true; otherwise the parallel region is executed serially

• Used to test whether there is sufficient work to justify the overhead of creating and terminating a parallel region


Conditional Parallelism: Example

for( i=0; i<n; i++ )
  #pragma omp parallel for if( n-i > 100 )
  for( j=i+1; j<n; j++ )
    for( k=i+1; k<n; k++ )
      a[j][k] = a[j][k] - a[i][k]*a[i][j] / a[j][j];


Data model

• Private and shared variables

• Variables in the global data space are accessed by all parallel threads (shared variables)

• Variables in a thread's private space can only be accessed by that thread (private variables)
  – several variations, depending on the initial values and whether the results are copied outside the region


#pragma omp parallel for private( privIndx, privDbl )
for ( i = 0; i < arraySize; i++ ) {
  for ( privIndx = 0; privIndx < 16; privIndx++ ) {
    privDbl = ( (double) privIndx ) / 16;
    y[i] = sin( exp( cos( - exp( sin(x[i]) ) ) ) ) + cos( privDbl );
  }
}

The parallel for loop index i is private by default.


Reduction Variables

#pragma omp parallel for reduction( op:list )

• op is one of +, *, -, &, ^, |, &&, or ||

• The variables in list must be used with this operator in the loop.

• The variables are automatically initialized to sensible values.


The reduction clause

sum = 0.0;
#pragma omp parallel for default(none) shared(n, x) reduction(+ : sum)
for (int i=0; i<n; i++)
  sum = sum + x[i];

– A private instance of sum is allocated to each thread
– Each thread performs a local sum
– Before terminating, each thread adds its local sum to the global sum variable
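The same pattern as a complete compilable sketch (the array contents are an illustrative assumption):

#include <stdio.h>
#define N 1000

int main(void)
{
    double x[N], sum = 0.0;
    for (int i = 0; i < N; i++)
        x[i] = 1.0;                      /* illustrative data */

    /* Each thread sums a subset into its private copy of sum; */
    /* the copies are combined with + when the loop ends.      */
    #pragma omp parallel for default(none) shared(x) reduction(+ : sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %f\n", sum);           /* expect 1000.000000 */
    return 0;
}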


Programming Model – Loop Scheduling

• The schedule clause determines how loop iterations are divided among the thread team (see the sketch after this list)

  – static([chunk]) divides iterations statically between threads
    • each thread receives [chunk] iterations, rounding as necessary to account for all iterations
    • default [chunk] is ceil( # iterations / # threads )

  – dynamic([chunk]) allocates [chunk] iterations per thread, allocating an additional [chunk] iterations when a thread finishes
    • forms a logical work queue, consisting of all loop iterations
    • default [chunk] is 1

  – guided([chunk]) allocates dynamically, but [chunk] is exponentially reduced with each allocation
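To see a schedule in action, a small sketch that prints which thread handles each iteration (output order varies from run to run):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Chunks of 2 iterations are handed out on demand. */
    #pragma omp parallel for schedule(dynamic, 2)
    for (int i = 0; i < 16; i++)
        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());
    return 0;
}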


Loop scheduling


Programming Model – Loop Scheduling

#pragma omp parallel for \
    schedule(static)
for( i=0; i<16; i++ )
{
  doIteration(i);
}

// Static Scheduling: conceptually, each thread tid
// of T threads executes its own contiguous chunk
int chunk = 16/T;
int base = tid * chunk;
int bound = (tid+1)*chunk;
for( i=base; i<bound; i++ )
{
  doIteration(i);
}


Programming Model – Loop Scheduling

#pragma omp parallel for \
    schedule(dynamic)
for( i=0; i<16; i++ )
{
  doIteration(i);
}

// Dynamic Scheduling: conceptually, each thread pulls
// the next iteration off a shared work queue
int current_i;
while( workLeftToDo() )
{
  current_i = getNextIter();
  doIteration(current_i);
}


OpenMP sections directive

Several blocks are executed in parallel

• C/C++:

  #pragma omp sections [ clause [ clause ] ... ] new-line
  {
    [#pragma omp section new-line ]
      structured-block1
    [#pragma omp section new-line
      structured-block2 ]
    ...
  }


OpenMP sections directive

#pragma omp parallel
{
  #pragma omp sections
  {
    { a=...; b=...; }

    #pragma omp section
    { c=...; d=...; }

    #pragma omp section
    { e=...; f=...; }

    #pragma omp section
    { g=...; h=...; }

  } /*omp end sections*/
} /*omp end parallel*/
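A compilable variant of the same idea (the printf bodies stand in for the elided a=...; assignments):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("block 1 ran on thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("block 2 ran on thread %d\n", omp_get_thread_num());
        }   /* implied barrier at the end of sections */
    }
    return 0;
}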


The omp sections clause - example


Threadprivate

• Private variables are private on a parallel region basis.

• Threadprivate variables are global variables that are private throughout the execution of the program.


Threadprivate

#pragma omp threadprivate( list )

Example: #pragma omp threadprivate( x )

• In POSIX threads, the equivalent requires a program change:
  – requires an array of size p
  – access as x[pthread_self()]
  – costly if accessed frequently

• Not cheap in OpenMP either.


Threadprivate

• Makes global data private to each thread
  – C: file scope and static variables

• Different from making them PRIVATE
  – with PRIVATE, global scope is lost
  – THREADPRIVATE preserves global scope for each thread

• Threadprivate variables can be initialized using the COPYIN clause, as sketched below
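A sketch combining threadprivate with copyin (the variable name is an illustrative assumption):

#include <stdio.h>
#include <omp.h>

int counter = 0;                   /* file-scope (global) variable */
#pragma omp threadprivate(counter) /* ...one copy per thread       */

int main(void)
{
    counter = 100;                 /* set the master thread's copy */

    /* copyin initializes every thread's copy from the master's copy */
    #pragma omp parallel copyin(counter)
    {
        counter += omp_get_thread_num();   /* updates stay per-thread */
        printf("thread %d: counter = %d\n",
               omp_get_thread_num(), counter);
    }
    return 0;
}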


Master structured block

• Only the master (0) thread executes the block
• The rest of the team skips the section and continues execution from the end of the master
• No barrier at the end (or start) of the master section
• The worksharing construct omp single is similar in behavior but has an implied barrier at the end; single is performed by any one thread

• Syntax:
  #pragma omp master
  {……}
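A small sketch contrasting the two constructs; the explicit barrier is an illustrative addition:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp master
        printf("master block: always thread 0 (%d)\n",
               omp_get_thread_num());
        /* no barrier after master: add one if others must wait */
        #pragma omp barrier

        #pragma omp single
        printf("single block: any one thread (%d)\n",
               omp_get_thread_num());
        /* implied barrier at the end of single */
    }
    return 0;
}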


Ordered Structured Block

• Enclosed code is executed in the same order as would occur in sequential execution of the loop

• Directives:
  #pragma omp ordered
  {…..}
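Note that the enclosing loop must itself carry an ordered clause for the construct to be legal; a minimal sketch:

#include <stdio.h>

int main(void)
{
    #pragma omp parallel for ordered
    for (int i = 0; i < 8; i++) {
        /* ... iteration work may execute out of order ... */
        #pragma omp ordered
        printf("%d\n", i);   /* printed strictly in order 0..7 */
    }
    return 0;
}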


OpenMP synchronization

• Implicit barrier - all threads in a team wait for all threads to complete up to the barrier point
  – beginning and end of parallel constructs
  – end of all other control constructs
  – the barrier can be removed with the nowait clause

• Explicit critical - only one thread at a time may execute a critical region


OpenMP critical directive

• Enclosed code is
  – executed by all threads, but
  – restricted to only one thread at a time

• C/C++:
  #pragma omp critical [ ( name ) ] new-line
    structured-block

• A thread waits at the beginning of a critical region until no other thread in the team is executing a critical region with the same name. All unnamed critical directives map to the same unspecified name.


OpenMP critical

C / C++:

cnt = 0;
f = 7;
#pragma omp parallel
{
  #pragma omp for
  for (i=0; i<20; i++) {
    if (b[i] == 0) {
      #pragma omp critical
      {
        cnt++;
      }
    } /* endif */
    a[i] = b[i] + f * (i+1);
  } /* end for */
} /* omp end parallel */
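The fragment above as a complete compilable sketch (the array contents are an illustrative assumption):

#include <stdio.h>

int main(void)
{
    int a[20], b[20], cnt = 0, f = 7;
    for (int i = 0; i < 20; i++)
        b[i] = i % 2;                  /* illustrative data: ten zeros */

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 20; i++) {
            if (b[i] == 0) {
                #pragma omp critical
                { cnt++; }             /* one thread at a time */
            }
            a[i] = b[i] + f * (i + 1);
        }
    }
    printf("cnt = %d, a[19] = %d\n", cnt, a[19]);  /* cnt == 10 */
    return 0;
}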


Clauses by Directive Table

https://computing.llnl.gov/tutorials/openMP