MPI from scratch. Part I
By: Camilo A. Silva
BIOinformatics
Summer 2008
PIRE :: REU :: Cyberbridges


Background

• Parallel Processing:
  – Separate workers or processes
  – Interact by exchanging information
• All use different data for each worker:
  – Data-parallel: same operations on different data; also called SIMD
  – SPMD: same program, different data (a sketch follows this list)
  – MIMD: different programs, different data
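As a preview of the MPI code introduced later in this deck, a minimal SPMD sketch: every process runs the same program, and its rank (explained below) selects which data it works on. The array and its contents here are hypothetical.

#include "mpi.h"
#include <stdio.h>

/* SPMD: one program, run by every process; the rank picks the data */
int main( int argc, char *argv[] )
{
    int rank, nprocs, i;
    double data[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };  /* hypothetical input */
    double local = 0.0;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &nprocs );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /* same operation everywhere, but each rank sums different elements */
    for (i = rank; i < 8; i += nprocs)
        local += data[i];

    printf( "process %d: partial sum = %f\n", rank, local );
    MPI_Finalize();
    return 0;
}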


Types of Communication

• Processes:
  – Cooperative: all parties agree to transfer the data
  – One-sided: one worker performs the transfer of the data


Message Passing…

• Message-passing is an approach that makes the exchange of data cooperative.


One-sided…

• One-sided operations between parallel processes include remote memory reads and writes.
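The MPI-1 interface covered in this deck does not include one-sided operations (see "Features not in MPI" below); MPI-2 later added them as remote memory access. A minimal sketch, assuming an MPI-2 implementation and at least two processes:

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, value = 0;
    MPI_Win win;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /* every process exposes one int as remotely accessible memory */
    MPI_Win_create( &value, sizeof(int), sizeof(int), MPI_INFO_NULL,
                    MPI_COMM_WORLD, &win );

    MPI_Win_fence( 0, win );
    if (rank == 0) {
        int fortytwo = 42;
        /* one worker performs the transfer: write into rank 1's window */
        MPI_Put( &fortytwo, 1, MPI_INT, 1, 0, 1, MPI_INT, win );
    }
    MPI_Win_fence( 0, win );   /* the fence completes the transfer */

    if (rank == 1)
        printf( "rank 1 received %d without calling a receive\n", value );

    MPI_Win_free( &win );
    MPI_Finalize();
    return 0;
}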


What is MPI?

MESSAGE PASSING INTERFACE

• A message-passing library specification
  -- a message-passing model
  -- not a compiler specification
  -- not a specific product
• For parallel computers, clusters, and heterogeneous networks
• Designed to permit the development of parallel software libraries
• Designed to provide access to advanced parallel hardware for
  -- end users
  -- library writers
  -- tool developers


Features of MPI

• General
  -- Communicators combine context and group for message security
  -- Thread safety
• Point-to-point communication
  -- Structured buffers and derived datatypes, heterogeneity
  -- Modes: normal (blocking and non-blocking), synchronous, ready (to allow access to fast protocols), buffered (a non-blocking sketch follows this list)
• Collective
  -- Both built-in and user-defined collective operations
  -- Large number of data movement routines
  -- Subgroups defined directly or by topology
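To make the modes concrete, a minimal sketch of the non-blocking ("normal") mode, assuming at least two processes: MPI_Isend and MPI_Irecv start a transfer and return immediately, and MPI_Wait completes it. The synchronous and buffered modes (MPI_Ssend, MPI_Bsend) take the same argument list.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, data = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if (rank == 0) {
        data = 99;
        /* non-blocking send: returns at once, transfer completes later */
        MPI_Isend( &data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req );
        /* ...useful computation could overlap the communication here... */
        MPI_Wait( &req, &status );
    } else if (rank == 1) {
        MPI_Irecv( &data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req );
        MPI_Wait( &req, &status );   /* block until the message arrives */
        printf( "rank 1 got %d\n", data );
    }

    MPI_Finalize();
    return 0;
}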


Features…

• Application-oriented process topologies
  -- Built-in support for grids and graphs (uses groups)
• Profiling
  -- Hooks allow users to intercept MPI calls to install their own tools
• Environmental
  -- Inquiry
  -- Error control


Features not in MPI

• Non-message-passing concepts not included:
  -- process management
  -- remote memory transfers
  -- active messages
  -- threads
  -- virtual shared memory


MPI’s essence

• A process is (traditionally) a program counter and address space.

• Processes may have multiple threads (program counters and associated stacks) sharing a single address space. MPI is for communication among processes, which have separate address spaces.

• Interprocess communication consists of:
  – Synchronization
  – Movement of data from one process's address space to another's


MPI Basics

• MPI can solve a wide range of problems using only six (6) functions (a sketch using all six follows this list):
  – MPI_INIT: Initiate an MPI computation.
  – MPI_FINALIZE: Terminate a computation.
  – MPI_COMM_SIZE: Determine the number of processes.
  – MPI_COMM_RANK: Determine my process identifier.
  – MPI_SEND: Send a message.
  – MPI_RECV: Receive a message.
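A minimal sketch using all six functions (the message content, each process's rank, is arbitrary): every process with rank > 0 sends to process 0, which receives and reports.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size, i, msg;
    MPI_Status status;

    MPI_Init( &argc, &argv );                 /* MPI_INIT */
    MPI_Comm_size( MPI_COMM_WORLD, &size );   /* MPI_COMM_SIZE */
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );   /* MPI_COMM_RANK */

    if (rank == 0) {
        for (i = 1; i < size; i++) {
            /* MPI_RECV: collect one int from each other process */
            MPI_Recv( &msg, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status );
            printf( "process 0 heard from process %d\n", msg );
        }
    } else {
        /* MPI_SEND: report this process's rank to process 0 */
        MPI_Send( &rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD );
    }

    MPI_Finalize();                           /* MPI_FINALIZE */
    return 0;
}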


Function Definitions

• MPI_INIT(int *argc, char ***argv)
  – Initiate a computation. argc and argv are required only in the C language binding, where they are the main program's arguments.
• MPI_FINALIZE()
  – Shut down a computation.
• MPI_COMM_SIZE(comm, size)
  – Determine the number of processes in a computation.
    IN  comm   communicator (handle)
    OUT size   number of processes in the group of comm (integer)
• MPI_COMM_RANK(comm, pid)
  – Determine the identifier of the current process.
    IN  comm   communicator (handle)
    OUT pid    process id in the group of comm (integer)


MPI in Simple C Code

#include "mpi.h" #include <stdio.h> int main( int argc, char *argv[] ) { MPI_Init( &argc, &argv ); printf( "Hello world\n" ); MPI_Finalize(); return 0; }


Function Definitions…

• MPI_SEND(buf, count, datatype, dest, tag, comm)
  – Send a message.
    IN  buf       address of send buffer (choice)
    IN  count     number of elements to send (integer >= 0)
    IN  datatype  datatype of send buffer elements (handle)
    IN  dest      process id of destination process (integer)
    IN  tag       message tag (integer)
    IN  comm      communicator (handle)
• MPI_RECV(buf, count, datatype, source, tag, comm, status)
  – Receive a message.
    OUT buf       address of receive buffer (choice)
    IN  count     size of receive buffer, in elements (integer >= 0)
    IN  datatype  datatype of receive buffer elements (handle)
    IN  source    process id of source process (integer)
    IN  tag       message tag (integer)
    IN  comm      communicator (handle)
    OUT status    status object (status)
(A concrete send/receive pair appears below.)
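A sketch mapping these definitions onto concrete calls: process 0 sends five doubles with tag 7 to process 1 (the values, count, and tag are arbitrary). Note that the receive buffer may be larger than the message; MPI_Get_count recovers the actual length from the status object.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, n;
    double buf[10];          /* receive buffer may exceed the message size */
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if (rank == 0) {
        double msg[5] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
        /*        buf  count datatype    dest tag comm           */
        MPI_Send( msg, 5,    MPI_DOUBLE, 1,   7,  MPI_COMM_WORLD );
    } else if (rank == 1) {
        MPI_Recv( buf, 10, MPI_DOUBLE, 0, 7, MPI_COMM_WORLD, &status );
        MPI_Get_count( &status, MPI_DOUBLE, &n );  /* how many arrived */
        printf( "received %d doubles\n", n );
    }

    MPI_Finalize();
    return 0;
}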


MPI Common Terminology

• Processes can be collected into groups.
• Each message is sent in a context, and must be received in the same context.
• A group and a context together form a communicator (see the sketch below).
• A process is identified by its rank in the group associated with a communicator.
• There is a default communicator, called MPI_COMM_WORLD, whose group contains all initial processes.
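Communicators beyond MPI_COMM_WORLD can be created at runtime. A minimal sketch using MPI_Comm_split to divide the initial group into two subgroups, each with its own context and its own rank numbering (the even/odd split is just an illustration):

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int world_rank, sub_rank;
    MPI_Comm subcomm;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &world_rank );

    /* processes with the same color land in the same new communicator */
    MPI_Comm_split( MPI_COMM_WORLD, world_rank % 2, world_rank, &subcomm );
    MPI_Comm_rank( subcomm, &sub_rank );

    printf( "world rank %d has rank %d in its subgroup\n",
            world_rank, sub_rank );

    MPI_Comm_free( &subcomm );
    MPI_Finalize();
    return 0;
}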


MPI Basic Send and Receive

• To whom is data sent?

• What is sent?

• How does the receiver identify it?
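In MPI's terms, the answers are: data is sent to a process named by its rank within a communicator (dest); what is sent is described by a buffer address, an element count, and a datatype; and the receiver identifies a message by matching its envelope of source, tag, and communicator.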


Collective Communication

• Barrier: synchronizes all processes.
• Broadcast: sends data from one process to all processes.
• Gather: gathers data from all processes to one process.
• Scatter: scatters data from one process to all processes.
• Reduction operations: combine distributed data with sums, products, etc.


Collective Functions

1. MPI_BCAST to broadcast the problem size parameter (size) from process 0 to all np processes;

2. MPI_SCATTER to distribute an input array (work) from process 0 to the other processes, so that each process receives size/np elements;

3. MPI_SEND and MPI_RECV for exchange of data (a single floating-point number) with neighbors;

4. MPI_ALLREDUCE to determine the maximum of a set of localerr values computed at the different processes and to distribute this maximum value to each process; and

5. MPI_GATHER to accumulate an output array at process 0. (A sketch of these steps appears below.)
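A minimal sketch of steps 1, 2, 4, and 5 in that style, with a stand-in computation: the problem size, input values, and error values are hypothetical, the array size is assumed divisible by np, and step 3's neighbor exchange is omitted for brevity.

#include "mpi.h"
#include <stdio.h>

#define SIZE 8   /* hypothetical problem size; assumed divisible by np */

int main( int argc, char *argv[] )
{
    int np, rank, i, n;
    double work[SIZE], local[SIZE], out[SIZE];
    double localerr, maxerr;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &np );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /* 1. process 0 broadcasts the problem size to all np processes */
    if (rank == 0) n = SIZE;
    MPI_Bcast( &n, 1, MPI_INT, 0, MPI_COMM_WORLD );

    if (rank == 0)
        for (i = 0; i < n; i++)
            work[i] = (double) i;            /* hypothetical input */

    /* 2. scatter the input array: each process receives n/np elements */
    MPI_Scatter( work, n/np, MPI_DOUBLE, local, n/np, MPI_DOUBLE,
                 0, MPI_COMM_WORLD );

    for (i = 0; i < n/np; i++)
        local[i] += 1.0;                     /* stand-in computation */
    localerr = (double) rank;                /* stand-in local error */

    /* 4. every process learns the maximum of the localerr values */
    MPI_Allreduce( &localerr, &maxerr, 1, MPI_DOUBLE, MPI_MAX,
                   MPI_COMM_WORLD );

    /* 5. gather the output array back at process 0 */
    MPI_Gather( local, n/np, MPI_DOUBLE, out, n/np, MPI_DOUBLE,
                0, MPI_COMM_WORLD );

    if (rank == 0)
        printf( "max local error %f, out[0] = %f\n", maxerr, out[0] );

    MPI_Finalize();
    return 0;
}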


MPI C Collective Code

#include "mpi.h"
#include <stdio.h>
#include <math.h>

int main( int argc, char *argv[] )
{
    int n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &numprocs );
    MPI_Comm_rank( MPI_COMM_WORLD, &myid );

    while (1) {
        if (myid == 0) {
            printf( "Enter the number of intervals: (0 quits) " );
            scanf( "%d", &n );
        }
        /* process 0 broadcasts n to every process */
        MPI_Bcast( &n, 1, MPI_INT, 0, MPI_COMM_WORLD );
        if (n == 0)
            break;

        /* each process sums every numprocs-th midpoint-rule term */
        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double) i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* add the partial sums into pi at process 0 */
        MPI_Reduce( &mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD );
        if (myid == 0)
            printf( "pi is approximately %.16f, Error is %.16f\n",
                    pi, fabs(pi - PI25DT) );
    }

    MPI_Finalize();
    return 0;
}
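For reference, the program computes pi numerically: the loop applies the midpoint rule to the integral of 4/(1 + x^2) over [0, 1], which equals 4 arctan(1) = pi. MPI_Bcast distributes the interval count n, each process handles every numprocs-th interval, and MPI_Reduce sums the partial results mypi into pi at process 0.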


Follow-up

• Asynchronous Communication

• Modularity

• Data types + Heterogeneity

• Buffering Issues + Quality of Service (QoS)

• MPI Implementations: MPICH2 :: OpenMPI

• Parallel Program Structure for Project 18

