May 4, 2005 Programming Multicores with Pthreads and OpenMP Nikos P. Pitsianis [email protected] Xiaobai Sun Bo Zhang

May 4, 2005

Programming Multicores with Pthreads and OpenMP

Nikos P. Pitsianis [email protected]

Xiaobai Sun Bo Zhang


Duke University

Outline

•  Programming with Threads

–  Embarrassingly Parallel (Pleasantly Parallel)

–  Critical Sections (Mutual Exclusion)

–  Data Dependent Task Parallelism (Condition Variables & Signals)

•  Quick Introduction to OpenMP Programming

Sep 29, 2010 Multicore Programming Workshop 2


What is a thread?

•  Process:

–  a program that is running

–  an address space with one or more threads executing within it, plus the system resources those threads require

•  Thread:

–  a sequence of control within a process

–  shares the resources of that process

•  We cover POSIX Threads (Pthreads) here

–  a widely supported threads programming API

•  Compile with "gcc -pthread"

–  This also makes the compiler link in thread-safe libraries


Process

•  Process: a running program

–  a single address space

–  one or more threads executing within that address space

–  the system resources required by those threads

•  Each process can have multiple threads, even on a single-core processor


Threads

•  Thread: a sequence of control within a process

•  All threads in a process share:

–  memory (program code and global data)

–  open file/socket descriptors

–  signal handlers and signal dispositions

–  working environment

•  Threads communicate using shared memory
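A minimal sketch of communication through shared memory, assuming a POSIX system: one thread writes a global variable and the caller reads it back after joining. The names shared_value, writer and read_after_thread are made up for this illustration.

```c
#include <pthread.h>

static int shared_value = 0;   /* global: visible to every thread in the process */

/* Thread body: writes to the global that the caller will read back. */
static void *writer(void *arg) {
  (void) arg;
  shared_value = 42;
  return NULL;
}

/* Starts one thread, waits for it, and returns what it left in the global. */
int read_after_thread(void) {
  pthread_t t;

  shared_value = 0;
  if (pthread_create(&t, NULL, writer, NULL) != 0)
    return -1;
  pthread_join(t, NULL);       /* after the join, the thread's write is visible here */
  return shared_value;
}
```

Calling read_after_thread() from main() and printing the result shows the value written by the other thread; no copying or message passing is involved.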


Advantages and Disadvantages

•  Advantages:

–  creating a thread is significantly faster than creating a process

–  switching between threads is faster than switching between processes

–  sharing data is easier, since threads communicate through shared memory

•  Disadvantages:

–  writing correct multithreaded programs is harder, because shared data must be protected

–  multithreaded programs are more difficult to debug than single-threaded ones


Outline

•  Programming with Threads

–  Embarrassingly Parallel (Pleasantly Parallel)

–  Critical Sections (Mutual Exclusion)

–  Data Dependent Task Parallelism (Condition Variables & Signals)

•  Quick Introduction to OpenMP Programming


Example 0 sequential code, as default single thread

#include <stdlib.h>
#include <stdio.h>

/* Declared here, defined elsewhere; presumably fills the vector a. */
void getvec (double *a);

/* Sequential dot product of a[0..n-1] and b[0..n-1] */
double dotprod (double *a, double *b, int n) {
  int i;
  double s = 0.0;
  for ( i = 0; i < n; i++ )
    s += a[i]*b[i];
  return s;
}

int main () {
  double *a, *b;

  /* N is supplied on the compile line with -D N=... */
  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);

  getvec(a); getvec(b);

  double dp = dotprod(a,b,N);
  printf("%f\n", dp);

  free(a); free(b);
  return 0;
}

Source: www.cs.duke.edu/~nikos/mpw/dp0.c

Compile:

gcc -D N=1024 -O4 dp0.c -o dp0

Run: ./dp0


Example 1 sequential code as a separate thread

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void getvec (double *a);
double dotprod (double *a, double *b, int n);

/* Bundles the arguments of dotprod so they fit in one void pointer */
typedef struct {
  double *a, *b;
  int n;
} dparg;

/* Thread entry point: unpacks the arguments and calls dotprod */
void *wrapper (void *arg) {
  double *ap, *bp, s;
  int nn;
  ap = ((dparg *) arg)->a;
  bp = ((dparg *) arg)->b;
  nn = ((dparg *) arg)->n;

  s = dotprod(ap, bp, nn);
  printf("%f\n", s);
  return NULL;
}

Source: www.cs.duke.edu/~nikos/mpw/dp1.c

Compile:

gcc -pthread -D N=1024 -O4 dp1.c -o dp1

Run: ./dp1


Example 1 sequential code as a separate thread (continued)

/* Second half of dp1.c: dparg and wrapper() are defined in the first half */
int main () {
  double *a, *b;
  pthread_t thread;
  dparg arg;

  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);

  getvec(a); getvec(b);

  arg.a = a;
  arg.b = b;
  arg.n = N;

  /* run wrapper(&arg) in a new thread, then wait for it to finish */
  pthread_create (&thread, NULL, wrapper, (void *) &arg);
  pthread_join (thread, NULL);

  free(a); free(b);
  return 0;
}


Thread Creation & Termination

int pthread_create( pthread_t *tid, const pthread_attr_t *attr, void *(*func)(void *), void *arg);

func is the function to be called. When func() returns, the thread terminates.


Thread Creation Arguments

•  Arguments are passed to the thread function by packing them into a structure and passing the address of the structure

•  Thread attributes can be set using attr

–  joinable or detached state

–  scheduling policy

–  NULL for system defaults
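A short sketch of setting thread attributes through attr instead of passing NULL. It uses only the standard pthread_attr_* calls; the helper names detach_state_roundtrip, run_joinable and work are hypothetical.

```c
#include <pthread.h>

/* Thread body: records that the thread ran. */
static void *work(void *arg) {
  *(int *) arg = 1;
  return NULL;
}

/* Sets a detach state on an attribute object and returns the state read back. */
int detach_state_roundtrip(int state) {
  pthread_attr_t attr;
  int readback = -1;

  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, state);
  pthread_attr_getdetachstate(&attr, &readback);
  pthread_attr_destroy(&attr);
  return readback;
}

/* Creates a thread with an explicitly joinable attribute; returns 1 if it ran. */
int run_joinable(void) {
  pthread_attr_t attr;
  pthread_t t;
  int ran = 0;

  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
  if (pthread_create(&t, &attr, work, &ran) != 0)
    ran = -1;
  else
    pthread_join(t, NULL);
  pthread_attr_destroy(&attr);   /* the attribute object is no longer needed */
  return ran;
}
```

Passing PTHREAD_CREATE_DETACHED instead would create a thread that cannot be joined, so the caller would need another way to learn that it finished.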


Thread Lifespan

•  Once a thread is created

–  it starts executing the function func()

–  func() is an argument passed to pthread_create()

•  The thread is terminated

–  when func() returns, or

–  by calling pthread_exit()

•  All threads are terminated

–  when main() returns, or

–  when any thread calls exit()
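The two ways a single thread can end, sketched under the assumption of a POSIX system: returning from func(), or calling pthread_exit() from anywhere below it. The names body, finish_early, run and reached_end are invented for the illustration.

```c
#include <pthread.h>

static int reached_end;        /* set only if the thread function returns normally */

static void finish_early(void) {
  pthread_exit(NULL);          /* ends the calling thread, not the whole process */
}

static void *body(void *arg) {
  int early = *(int *) arg;
  if (early)
    finish_early();            /* does not return */
  reached_end = 1;
  return NULL;                 /* returning from func() also ends the thread */
}

/* Runs body() in a thread; returns 1 if it ran to its return statement. */
int run(int early) {
  pthread_t t;

  reached_end = 0;
  if (pthread_create(&t, NULL, body, &early) != 0)
    return -1;
  pthread_join(t, NULL);
  return reached_end;
}
```

Here run(0) reports that the function returned normally, while run(1) reports that the thread stopped inside finish_early(). Calling exit() instead of pthread_exit() would have ended every thread in the process.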


Joinable and Detached State

•  Each thread is either joinable or detached.

•  Joinable:

–  on termination, the thread ID and exit status are saved

•  Detached:

–  on termination, all resources used by the thread are released

–  a detached thread cannot be joined

•  A thread can "join" another by calling pthread_join

–  the caller blocks until the specified thread exits

int pthread_join( pthread_t tid, void **status);
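A sketch of how the status argument of pthread_join can carry a result back from a joinable thread. The result is heap-allocated here as one possible choice; square and square_in_thread are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Returns a heap-allocated result; a pointer to a local variable would dangle. */
static void *square(void *arg) {
  int x = *(int *) arg;
  int *out = malloc(sizeof *out);
  if (out != NULL)
    *out = x * x;
  return out;                        /* becomes the thread's exit status */
}

/* Computes x*x in a separate thread and collects it through pthread_join. */
int square_in_thread(int x) {
  pthread_t t;
  void *status = NULL;
  int result = -1;

  if (pthread_create(&t, NULL, square, &x) != 0)
    return -1;
  pthread_join(t, &status);          /* blocks until the thread exits */
  if (status != NULL) {
    result = *(int *) status;
    free(status);
  }
  return result;
}
```

The examples in this deck take the other common route: the thread writes its result into the argument structure and pthread_join is called with NULL.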


Example 2 with multiple threads

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

/* Per-thread arguments plus a slot s for the partial result */
typedef struct {
  double *a, *b, s;
  int n, tid;
} dparg;

/* Partial dot product over the block of indices owned by thread tid */
double dotprod (double *a, double *b, int n, int tid) {
  int i;
  double s = 0.0;
  int block = n/NTHREADS;
  /* the last thread also takes the remainder when NTHREADS does not divide n */
  int end = (tid == NTHREADS-1) ? n : (tid+1)*block;

  for ( i = tid*block; i < end; i++)
    s += a[i]*b[i];
  return s;
}

void * wrapper (void *arg) {
  double *ap, *bp;
  int nn, tid;
  ap = ((dparg *) arg)->a;
  bp = ((dparg *) arg)->b;
  nn = ((dparg *) arg)->n;
  tid = ((dparg *) arg)->tid;

  ((dparg *) arg)->s = dotprod(ap, bp, nn, tid);
  return NULL;
}

Source: www.cs.duke.edu/~nikos/mpw/dp2.c

Compile:

gcc -pthread -D NTHREADS=8 -D N=1024 \
    -O4 dp2.c -o dp2

Run: ./dp2


Example 2 with multiple threads (continued)

/* Second half of dp2.c: dparg and wrapper() are defined in the first half */
void getvec (double *a);

int main () {
  double *a, *b, dp;
  pthread_t thread[NTHREADS];
  dparg arg[NTHREADS];
  int i;

  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);
  getvec(a); getvec(b);

  /* one argument record and one thread per block */
  for (i=0; i<NTHREADS; i++) {
    arg[i].a = a; arg[i].b = b;
    arg[i].n = N; arg[i].tid = i;

    pthread_create (&thread[i], NULL, wrapper, (void *)&arg[i]);
  }

  /* after each join, that thread's partial sum is safe to read */
  dp = 0.0;
  for (i=0; i<NTHREADS; i++) {
    pthread_join (thread[i], NULL);
    dp += arg[i].s;
  }
  printf("%f\n", dp);

  free(a); free(b);
  return 0;
}


Outline

•  Programming with Threads

–  Embarrassingly Parallel (Pleasantly Parallel)

–  Critical Sections (Mutual Exclusion)

–  Data Dependent Task Parallelism (Condition Variables & Signals)

•  Quick Introduction to OpenMP Programming


Mutual Exclusion

•  Mutual Exclusion primitives protect against races

–  Read-Update-Write

•  Get the single key and

–  lock the critical section of a program before accessing global variables

–  unlock as soon as you are done

pthread_mutex_t mux;
pthread_mutex_init (&mux, NULL);
pthread_mutex_lock (&mux);
pthread_mutex_unlock (&mux);
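A small sketch of the read-update-write race being protected: several threads increment one global counter, and the mutex makes each increment atomic with respect to the others. NWORKERS, NITER and the function names are assumptions made for this illustration, and the mutex uses the static initializer PTHREAD_MUTEX_INITIALIZER rather than pthread_mutex_init.

```c
#include <pthread.h>

#define NWORKERS 4           /* hypothetical thread count */
#define NITER    100000      /* increments per thread */

static long counter;
static pthread_mutex_t counter_mtx = PTHREAD_MUTEX_INITIALIZER;

static void *add(void *arg) {
  (void) arg;
  for (int i = 0; i < NITER; i++) {
    pthread_mutex_lock(&counter_mtx);     /* enter the critical section */
    counter++;                            /* read, update, write */
    pthread_mutex_unlock(&counter_mtx);   /* leave as soon as possible */
  }
  return NULL;
}

/* Runs NWORKERS threads and returns the final counter value. */
long count_with_mutex(void) {
  pthread_t t[NWORKERS];

  counter = 0;
  for (int i = 0; i < NWORKERS; i++)
    pthread_create(&t[i], NULL, add, NULL);
  for (int i = 0; i < NWORKERS; i++)
    pthread_join(t[i], NULL);
  return counter;
}
```

With the lock in place the result is always NWORKERS*NITER. Removing the lock and unlock calls typically yields a smaller, varying total because increments from different threads overwrite each other.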


Locking and Unlocking

•  To lock: int pthread_mutex_lock(pthread_mutex_t *mutex);

•  To unlock: int pthread_mutex_unlock(pthread_mutex_t *mutex);

•  pthread_mutex_lock blocks until the mutex becomes available; pthread_mutex_unlock releases it without waiting


Example 3 with Critical Section

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

/* dparg and dotprod() are not shown on this slide; Example 2 defines matching versions */

pthread_mutex_t dp_mtx;   /* protects dp */
double dp;                /* global result, shared by all threads */

void * wrapper (void *arg) {
  double *ap, *bp, s;
  int nn, tid;
  ap = ((dparg *) arg)->a;
  bp = ((dparg *) arg)->b;
  nn = ((dparg *) arg)->n;
  tid = ((dparg *) arg)->tid;

  /* compute the partial sum outside the lock */
  s = dotprod(ap, bp, nn, tid);

  /* critical section: only the update of the shared total */
  pthread_mutex_lock(&dp_mtx);
  dp += s;
  pthread_mutex_unlock(&dp_mtx);

  return NULL;
}

Source: www.cs.duke.edu/~nikos/mpw/dp3.c

Compile:

gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp3.c -o dp3

Run: ./dp3


Example 3 with Critical Section (continued)

/* Second half of dp3.c: dp, dp_mtx and wrapper() are defined in the first half */
void getvec (double *a);

int main () {
  double *a, *b;
  pthread_t thread[NTHREADS];
  dparg arg[NTHREADS];
  int i;

  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);
  getvec(a); getvec(b);

  dp = 0.0;
  pthread_mutex_init (&dp_mtx, NULL);

  for (i=0; i<NTHREADS; i++) {
    arg[i].a = a; arg[i].b = b;
    arg[i].n = N; arg[i].tid = i;

    pthread_create (&thread[i], NULL, wrapper, (void *)&arg[i]);
  }
  for (i=0; i<NTHREADS; i++) {
    pthread_join (thread[i], NULL);
  }
  printf("%f\n", dp);

  pthread_mutex_destroy (&dp_mtx);
  free(a); free(b);
  return 0;
}


Outline

•  Programming with Threads

–  Embarrassingly Parallel (Pleasantly Parallel)

–  Critical Sections (Mutual Exclusion)

–  Data Dependent Task Parallelism (Condition Variables & Signals)

•  Quick Introduction to OpenMP Programming


Condition Variables

•  Condition variables allow one thread to

–  wait for (sleep until) an event generated by any other thread

•  This allows us to avoid busy waiting

pthread_cond_t *notFull, *notEmpty;
pthread_cond_init (q->notFull, NULL);
pthread_cond_init (q->notEmpty, NULL);

pthread_mutex_lock (fifo->mut);
while (fifo->full) {
  printf ("producer: queue FULL.\n");
  pthread_cond_wait (fifo->notFull, fifo->mut);
}
queueAdd (fifo, i);
pthread_mutex_unlock (fifo->mut);
pthread_cond_signal (fifo->notEmpty);


Condition Variables

Condition variables are used with a mutex

int pthread_cond_wait(pthread_cond_t *cptr, pthread_mutex_t *mptr);
int pthread_cond_signal(pthread_cond_t *cptr);
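A minimal sketch of the wait/signal pattern with its mutex: the waiting thread re-checks a predicate in a loop, and the signalling thread changes the predicate under the same mutex. The names ready, payload, consumer and hand_off are hypothetical, and the static initializers stand in for the pthread_*_init calls.

```c
#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static int ready;              /* the predicate the waiter sleeps on */
static int payload;            /* data published together with ready */

static void *consumer(void *arg) {
  int *out = arg;

  pthread_mutex_lock(&mtx);
  while (!ready)                          /* loop guards against spurious wakeups */
    pthread_cond_wait(&ready_cv, &mtx);   /* releases mtx while sleeping */
  *out = payload;
  pthread_mutex_unlock(&mtx);
  return NULL;
}

/* Hands value to a waiting thread and returns what that thread received. */
int hand_off(int value) {
  pthread_t t;
  int received = -1;

  ready = 0;
  if (pthread_create(&t, NULL, consumer, &received) != 0)
    return -1;

  pthread_mutex_lock(&mtx);
  payload = value;
  ready = 1;                              /* change the predicate under the mutex */
  pthread_mutex_unlock(&mtx);
  pthread_cond_signal(&ready_cv);         /* wake the waiter, if it is asleep */

  pthread_join(t, NULL);
  return received;
}
```

The sketch works whichever thread gets there first: if the signal is sent before the consumer starts waiting, the consumer finds ready already set and never sleeps.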


Example 4 with Condition Variable

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

pthread_cond_t notEmptyVecSignal;   /* signalled once the vectors are filled */
pthread_mutex_t vec_mtx;            /* protects emptyVec */
pthread_mutex_t dp_mtx;             /* protects dp */
double dp;
int emptyVec;

void * wrapper (void *arg) {
  double *ap, *bp, s;
  int nn, tid;
  […]

  /* sleep until main() reports that the vectors hold data */
  pthread_mutex_lock(&vec_mtx);
  while (emptyVec) {
    pthread_cond_wait(&notEmptyVecSignal,&vec_mtx);
  }
  pthread_mutex_unlock(&vec_mtx);

  s = dotprod(ap, bp, nn, tid);

  pthread_mutex_lock(&dp_mtx);
  dp += s;
  pthread_mutex_unlock(&dp_mtx);

  return NULL;
}

Source: www.cs.duke.edu/~nikos/mpw/dp4.c

Compile:

gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp4.c -o dp4

Run: ./dp4


Example 4 with Condition Variable (continued)

int main () {
  [ … ]

  emptyVec = 1;
  pthread_mutex_init (&vec_mtx, NULL);
  pthread_cond_init (&notEmptyVecSignal, NULL);

  /* the threads start first and wait for the data */
  for (i=0; i<NTHREADS; i++) {
    arg[i].a = a; arg[i].b = b;
    arg[i].n = N; arg[i].tid = i;

    pthread_create (&thread[i], NULL, wrapper, (void *) &arg[i]);
  }

  getvec(a); getvec(b);

  /* publish the data and wake every waiting thread */
  pthread_mutex_lock(&vec_mtx);
  emptyVec = 0;
  pthread_mutex_unlock(&vec_mtx);
  pthread_cond_broadcast (&notEmptyVecSignal);

  for (i=0; i<NTHREADS; i++) {
    rc = pthread_join (thread[i], NULL);
  }

  printf("%f\n", dp);
}


Outline

•  Programming with Threads

–  Embarrassingly Parallel (Pleasantly Parallel)

–  Critical Sections (Mutual Exclusion)

–  Data Dependent Task Parallelism (Condition Variables & Signals)

•  Quick Introduction to Programming with OpenMP


OpenMP

•  A set of compiler directives and library routines for parallel application programmers

•  OpenMP simplifies writing multi-threaded programs in Fortran, C and C++

•  Most of the constructs in OpenMP are compiler directives

–  #pragma omp construct [clause [clause]…]

–  #pragma omp parallel num_threads(4)

•  Function prototypes and types are in the file:

–  #include <omp.h>

•  Most OpenMP constructs apply to a structured block

–  Structured block: a block of one or more statements with one point of entry at the top and one point of exit at the bottom
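A tiny sketch of a directive applied to a structured block: every thread in the team runs the block once and bumps a shared counter. count_team is a made-up name. No library routine is called, so omp.h is not needed here, and without -fopenmp the pragmas are ignored and the block runs once.

```c
/* Counts how many threads executed the structured block. */
int count_team(void) {
  int count = 0;

  #pragma omp parallel num_threads(4)
  {                          /* single entry at the top */
    #pragma omp atomic
    count++;                 /* one increment per thread in the team */
  }                          /* single exit at the bottom */
  return count;
}
```

Compiled with gcc -fopenmp, the function returns the size of the team, which is at most the 4 threads requested.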


Example in OpenMP

#include <omp.h>
#include <stdlib.h>
#include <stdio.h>

void getvec (double *a);

double dotprod (double *a, double *b, int n) {
  int i;
  double s = 0.0;

  /* each thread sums its share of the iterations into a private s;
     the private sums are added together when the loop ends */
#pragma omp parallel for reduction(+:s)
  for ( i = 0; i < n; i++ )
    s += a[i]*b[i];
  return s;
}

int main () {
  double *a, *b;

  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);

  getvec(a); getvec(b);

  omp_set_num_threads(NTHREADS);
  double dp = dotprod(a,b,N);
  printf("%f\n", dp);

  free(a); free(b);
  return 0;
}

Source: www.cs.duke.edu/~nikos/mpw/dp0-omp.c

Compile:

gcc -D NTHREADS=8 -D N=1024 \
    -fopenmp -O4 dp0-omp.c -o dp0-omp

Run: ./dp0-omp


OpenMP Parallel Region

#pragma omp parallel [clause...]
        if (scalar_expression)
        private (list)
        shared (list)
        default (shared | none)
        firstprivate (list)
        reduction (operator: list)
        copyin (list)
        num_threads (n)

   structured_block

•  When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the team

–  The master becomes thread number 0 within that team

•  The parallel region code is executed by all threads

•  A barrier is implied at the end of the parallel region

–  Only the master thread continues execution past it

•  If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined.
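A sketch of a parallel region in which every thread of the team runs the same code and marks its own slot in a shared array, with thread 0 (the master) recording the team size. MAXT, my_id and mark_team are hypothetical names. The _OPENMP guards are an addition so that the file still builds, as a single-threaded program, when -fopenmp is not given.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAXT 4               /* hypothetical upper bound on the team size */

/* Thread number within the team; 0 when compiled without OpenMP. */
static int my_id(void) {
#ifdef _OPENMP
  return omp_get_thread_num();
#else
  return 0;
#endif
}

/* Marks seen[id] for every thread in the team and returns the team size. */
int mark_team(int seen[MAXT]) {
  int team = 1;

  for (int i = 0; i < MAXT; i++)
    seen[i] = 0;

  #pragma omp parallel num_threads(MAXT) shared(seen, team)
  {
    int id = my_id();        /* declared inside the region, so private */
    seen[id] = 1;            /* each thread writes only its own slot */
    if (id == 0) {           /* the master is thread number 0 */
#ifdef _OPENMP
      team = omp_get_num_threads();
#endif
    }
  }                          /* implied barrier: all writes are complete here */
  return team;
}
```

After the call, the first team entries of seen are set, and only the calling thread continues past the region.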


OpenMP Work Sharing DO/for

#pragma omp for [clause...]
        schedule (type [,chunk])
        ordered
        private (list)
        firstprivate (list)
        lastprivate (list)
        shared (list)
        reduction (operator: list)
        collapse (n)
        nowait

   for_loop

#pragma omp parallel for \
        shared(a,b,c) \
        private(i)
  for (i=0; i < n; i++) {
    c[i] = a[i] + b[i];
  }
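The loop above, wrapped in a function so it can be compiled and called on its own; vec_add is a hypothetical name. The iterations are independent, so the result does not depend on how they are divided among threads.

```c
/* c[i] = a[i] + b[i] for i in [0, n), with iterations shared among threads. */
void vec_add(const double *a, const double *b, double *c, int n) {
  int i;

  #pragma omp parallel for shared(a, b, c) private(i)
  for (i = 0; i < n; i++) {
    c[i] = a[i] + b[i];
  }
}
```

Build with gcc -fopenmp to run the loop in parallel; without the flag the pragma is ignored and the loop runs sequentially with the same result.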


Directive Responsibility

•  Work-sharing

•  Data scoping

•  Synchronization

•  Scheduling

•  Parallel region: partition work

–  Each thread executes the same code

•  Parallel for loop: partition iterations

–  Threads share the iterations of the loop

•  Parallel sections: functional parallelism

–  Threads perform different tasks
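A sketch of functional parallelism with the sections construct: two unrelated tasks, a sum and a maximum, each run by whichever thread picks up that section. sum_and_max is a hypothetical name and the function assumes n is at least 1.

```c
/* Computes the sum and the maximum of v[0..n-1] as two independent tasks. */
void sum_and_max(const int *v, int n, long *sum, int *max) {
  long s = 0;
  int m = v[0];

  #pragma omp parallel sections
  {
    #pragma omp section
    {                                   /* task 1: only this section writes s */
      for (int i = 0; i < n; i++)
        s += v[i];
    }
    #pragma omp section
    {                                   /* task 2: only this section writes m */
      for (int i = 1; i < n; i++)
        if (v[i] > m)
          m = v[i];
    }
  }                                     /* implied barrier: both tasks are done */

  *sum = s;
  *max = m;
}
```

No locking is needed because the two sections write to different variables and only read the shared input.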


Directive Responsibility

•  Work-sharing

•  Data scoping

•  Synchronization

•  Scheduling

•  Shared: threads access a single copy of the data object

•  Private: each thread gets its own copy, uninitialized on entry

–  Firstprivate: each private copy is initialized from the master's value

–  Lastprivate: the master's copy is updated with the value from the sequentially last iteration
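A short sketch of the firstprivate and lastprivate clauses on one loop. scoping_demo is a made-up name and the function assumes n is at least 1.

```c
/* Fills out[i] = i + offset and returns the value computed by the final iteration. */
int scoping_demo(int n, int offset, int *out) {
  int last = -1;
  int i;

  /* offset: every thread starts with a private copy holding the caller's value.
     last:   private inside the loop; the copy from iteration n-1 is written
             back to the original variable when the loop ends. */
  #pragma omp parallel for firstprivate(offset) lastprivate(last)
  for (i = 0; i < n; i++) {
    last = i + offset;
    out[i] = last;
  }
  return last;
}
```

With plain private(last) instead, the value of last after the loop would not be defined by the loop at all.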


Directive Responsibility

•  Work-sharing

•  Data scoping

•  Synchronization

•  Scheduling

#pragma omp master
{…}

#pragma omp critical
{…}

#pragma omp atomic
count++;

#pragma omp barrier

reduction (+: sum)

•  Unsynchronized concurrent access to shared data can corrupt it

•  Synchronization

–  Mutex: ensures exclusive access to a critical section of code

–  Barrier: causes a group of threads to pause until all have reached a defined point

•  Signaling

–  Conditional wait: waits for some event; signals when it occurs

–  Broadcasting: signals a group of waiting threads
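A sketch that uses two of the constructs above inside one loop: atomic for a single-variable update and critical for a compare-then-update that spans several statements. scan is a hypothetical name and the function assumes n is at least 1.

```c
/* Counts the even entries of v[0..n-1] and finds the largest entry. */
void scan(const int *v, int n, int *n_even, int *largest) {
  int count = 0;
  int best = v[0];
  int i;

  #pragma omp parallel for shared(count, best)
  for (i = 0; i < n; i++) {
    if (v[i] % 2 == 0) {
      #pragma omp atomic
      count++;                 /* one memory update: atomic is enough */
    }
    #pragma omp critical
    {                          /* test and update must not be interleaved */
      if (v[i] > best)
        best = v[i];
    }
  }

  *n_even = count;
  *largest = best;
}
```

For a plain sum or count, a reduction clause usually scales better than atomic or critical, because each thread accumulates privately and the results are combined once at the end.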


Directive Responsibility

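•  Work-sharing

•  Data scoping

•  Synchronization

•  Scheduling

•  Static: splits the iteration space into blocks of size chunk, assigned to threads in round-robin order before the loop runs

•  Dynamic: assigns blocks to threads as they become idle (good for uneven workloads)

•  Guided: like dynamic, but the block size starts large and shrinks roughly exponentially as iterations are assigned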
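A sketch of a loop with an uneven workload, where the schedule clause matters: iteration i costs about i steps, so dynamic scheduling hands out blocks of 4 iterations to whichever thread is idle. triangle and sum_triangles are made-up names and the chunk size 4 is arbitrary.

```c
/* Cost grows with i, so equal-sized static blocks would be unbalanced. */
static long triangle(int i) {
  long t = 0;
  for (int k = 1; k <= i; k++)
    t += k;
  return t;
}

/* Sums triangle(0) .. triangle(n-1) using dynamically scheduled blocks. */
long sum_triangles(int n) {
  long total = 0;
  int i;

  #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
  for (i = 0; i < n; i++)
    total += triangle(i);
  return total;
}
```

Swapping in schedule(static, 4) or schedule(guided) changes only how iterations are handed out, not the result.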


References

•  D. Butenhof, Programming with POSIX threads, Addison Wesley (1997)

•  Online Tutorials from LLNL

–  https://computing.llnl.gov/tutorials/pthreads/

–  https://computing.llnl.gov/tutorials/openMP/
