CS4402 – Parallel Computing
Lecture 2
MPI – Getting Started.
MPI – Point to Point Communication.
What is MPI? M P I = Message Passing Interface
An Interface Specification: MPI is a specification for the developers and users of message passing libraries. By itself, it is NOT a library - but rather the specification of what such a library should be.
Simply stated, the goal of the Message Passing Interface is to provide a widely used standard for writing message passing programs.
The interface attempts to be: practical, portable, efficient, flexible.
Interface specifications have been defined for C/C++ and Fortran; unofficial bindings also exist for languages such as Java.
Some History: MPI resulted from the efforts of numerous individuals and groups over a two-year period between 1992 and 1994.
1980s - early 1990s: Recognition of the need for a standard arose.
April, 1992: The basic features essential to a standard message passing interface were discussed, and a working group established to continue the standardization process. Preliminary draft proposal developed subsequently.
November 1992: MPI draft proposal (MPI1) from ORNL presented. Group adopts procedures and organization to form the MPI Forum.
November 1993: Supercomputing 93 conference - draft MPI standard presented.
May 1994: Final version of MPI 1 was released.
1996-1998: Developed MPI2.
Programming Model: SPMD
MPI lends itself to most (if not all) distributed memory parallel programming models.
Distributed Memory: Originally, MPI was targeted for distributed memory systems.
Shared Memory: As shared memory systems became more popular, MPI implementations for these platforms appeared.
Hybrid: MPI is now used on just about any common parallel architecture including massively parallel machines, SMP clusters, workstation clusters and heterogeneous networks.
All parallelism is explicit: the programmer is responsible for correctly identifying parallelism and implementing parallel algorithms using MPI constructs.
The number of tasks dedicated to run a parallel program is static. New tasks cannot be dynamically spawned during run time. (MPI-2 addresses this issue).
C Coding – Recalling Some Facts
Structure of a C program:
1. Include all headers
2. Declare all functions
3. Define all functions, including main
Simple Facts:
1. Declarations take the first part of a block
2. Same syntax for statements
3. Various important headers: stdio.h, stdlib.h, etc.
MPI - Getting Started
MPI Header: Required for all programs/routines which make MPI library calls.
#include "mpi.h"
Sometimes we have to include other MPI-related headers as well.
MPI Functions:
Format: rc = MPI_Xxxxx(parameter, ...)
Example: rc = MPI_Bsend(&buf, count, type, dest, tag, comm)
Error code: returned as rc; equal to MPI_SUCCESS if the call succeeded.
General MPI Program Structure:
- include declarations (MPI header)
- the main function:
  - initialise the MPI environment
  - get the MPI basic elements: size, rank, etc.
  - do the parallel work:
    - acquire the local data for processor rank
    - perform the computation on the data
  - terminate the MPI environment
- some other functions
MPI Programs
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // the parallel computation of processor rank
    // get the data from somewhere
    // process the data for processor rank

    MPI_Finalize();
    return 0;
}
Environment Management Routines
Initialize/terminate the MPI environment, find information about it, etc.
MPI_Init Initializes the MPI execution environment. MPI_Init (&argc, &argv), where argc and argv are the arguments of main().
MPI_Abort Terminates all MPI processes associated with the communicator. MPI_Abort (comm, errorcode)
MPI_Wtime Returns the elapsed wall-clock time in seconds on the calling processor. MPI_Wtime ()
MPI_Finalize Terminates the MPI execution environment. MPI_Finalize ()
MPI_Comm_size Determines the number of processes in a communicator. MPI_Comm_size (comm, &size)
MPI_Comm_rank Determines the rank of the calling process within the communicator. MPI_Comm_rank (comm, &rank)
MPI_Get_processor_name Returns the processor name. MPI_Get_processor_name (&name, &resultlength)
Communicators: MPI_COMM_WORLD
Communicators: sets of processes that communicate with each other.
MPI routines require a communicator.
MPI_COMM_WORLD is the default communicator, containing all processes.
Within a communicator, each process has a rank.
Hello World
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("called on %s\n", processor_name);
    printf("Hello World from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
[sabin@cuc100 hello]$ ls
hellos  hellos.c  hellos.o  Makefile
[sabin@cuc100 hello]$ mpirun -np 4 hellos
called on cuc100.ucc.ie
called on cuc104.ucc.ie
called on cuc106.ucc.ie
called on cuc108.ucc.ie
Hello world from process 0 of 4
Hello world from process 2 of 4
Hello world from process 1 of 4
Hello world from process 3 of 4
Simple Structures – All Processors Work
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // describe the processing to be done by processor rank
    // 1. identify the local data by using rank
    // 2. process the local data

    MPI_Finalize();
    return 0;
}
Case Study: Count Prime Numbers
Some facts about the lab program:
The first n odd numbers 2*i+1 are tested, for i = 0, 1, 2, ..., n-1.
With size processors, each gets n/size numbers to test.
Block partition of the odd numbers onto processors:
Proc 0: 0, 1, 2, ..., n/size-1
Proc 1: n/size, n/size+1, ..., 2*n/size-1
Proc 2: 2*n/size, 2*n/size+1, ..., 3*n/size-1
Proc rank gets: rank*(n/size), ..., (rank+1)*(n/size)-1
Count Primes
#include <stdio.h>
#include "mpi.h"

int isPrime(int m);   /* trial-division test, defined elsewhere */

int main(int argc, char* argv[])
{
    int rank, size, i, count = 0;
    int n = 1000;     /* example: number of odd numbers to test */
    double time;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    time = MPI_Wtime();
    for (i = rank * (n / size); i < (rank + 1) * (n / size); i++)
        if (isPrime(2 * i + 1)) count++;
    time = MPI_Wtime() - time;
    printf("Processor %d finds %d primes in %lf\n", rank, count, time);

    MPI_Finalize();
    return 0;
}
LOCAL DATA
Cyclic partition of the odd numbers onto processors:
Proc 0: 0, size, 2*size, ...
Proc 1: 1, size+1, 2*size+1, ...
Proc rank gets: rank, rank+size, rank+2*size, ...

for(i = rank; i < n; i += size)
    if (isPrime(2*i+1)) count++;
Simple Structures – One Processor Works
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // describe the processing to be done by processor 0
    }

    MPI_Finalize();
    return 0;
}
SERIAL PART
P2P Communication
MPI P2P operations involve message passing between two different MPI processes.
The sender calls MPI_Send and the receiver calls MPI_Recv.
The code should look like:

if (rank == sender || rank == receiver) {
    if (rank == sender) MPI_Send(...);
    else if (rank == receiver) MPI_Recv(...);
}
Types of P2P Communication
Different types of send and receive routines exist for different purposes:
blocking send / blocking receive; synchronous send; non-blocking send / non-blocking receive; buffered send; combined send/receive; "ready" send.
Blocking: the call returns only after the operation has completed successfully.
Synchronous: blocking plus handshaking with the receiver.
Non-Blocking: the call returns immediately but must be backed up by an MPI wait or test:
- non-blocking call
- do some other computation
- wait or test for the call's completion.
(Timing diagrams for Blocking, Non-Blocking and Synchronous communication omitted.)
Envelope Details
The P2P operations should have envelope details.
Blocking send: MPI_Send(buffer, count, type, dest, tag, comm)
Blocking receive: MPI_Recv(buffer, count, type, source, tag, comm, status)
Non-blocking send: MPI_Isend(buffer, count, type, dest, tag, comm, request)
Non-blocking receive: MPI_Irecv(buffer, count, type, source, tag, comm, request)

buffer - the address of the message
count - number of elements to be sent
type - the MPI datatype of the elements
dest - the destination process
source - the source process; wild card MPI_ANY_SOURCE
tag - the message tag/id; wild card MPI_ANY_TAG
comm - the communicator
status - general info about the received message
MPI Data Types
MPI_CHAR            signed char
MPI_SHORT           signed short int
MPI_INT             signed int
MPI_LONG            signed long int
MPI_UNSIGNED_CHAR   unsigned char
MPI_UNSIGNED_SHORT  unsigned short int
MPI_UNSIGNED        unsigned int
MPI_UNSIGNED_LONG   unsigned long int
MPI_FLOAT           float
MPI_DOUBLE          double
MPI_LONG_DOUBLE     long double
MPI_LOGICAL         logical (Fortran)
Basic Blocking Operations
MPI_Send – basic send; returns only after the application buffer in the sending task is free for reuse. MPI_Send (&buf,count,datatype,dest,tag,comm)
MPI_Recv – receives a message and blocks until the requested data is available. MPI_Recv (&buf,count,datatype,source,tag,comm,&status)
MPI_Ssend – synchronous blocking send. MPI_Ssend (&buf,count,datatype,dest,tag,comm)
MPI_Bsend – buffered blocking send. MPI_Bsend (&buf,count,datatype,dest,tag,comm)
MPI_Rsend – blocking ready send. MPI_Rsend (&buf,count,datatype,dest,tag,comm)
All of these send variants are matched on the receiving side by the same MPI_Recv.
Basic Non-Blocking Operations
MPI_Isend – immediate send; should be followed by MPI_Wait or MPI_Test. MPI_Isend (&buf,count,datatype,dest,tag,comm,&request)
MPI_Irecv – immediate receive. MPI_Irecv (&buf,count,datatype,source,tag,comm,&request)
MPI_Issend – immediate synchronous send; MPI_Wait() or MPI_Test() indicates when the destination process has received the message. MPI_Issend (&buf,count,datatype,dest,tag,comm,&request)
MPI_Ibsend – non-blocking buffered send. MPI_Ibsend (&buf,count,datatype,dest,tag,comm,&request)
MPI_Irsend – non-blocking ready send. MPI_Irsend (&buf,count,datatype,dest,tag,comm,&request)
MPI_Test – checks the status of a specified non-blocking send or receive operation.
Simple Ping-Pong Example
Ping-Pong computation works with the following elements:
- Only two processors are involved; the rest are idle.
- The first processor (rank 0) does:
  1. Prepare the message.
  2. Send the message to the second processor.
  3. Receive the message back from the second processor.
- The second processor (rank 1) does:
  1. Prepare the message.
  2. Receive the message from the first processor.
  3. Send the message to the first processor.
// MPI program to ping-pong between Processor 0 and Processor 1
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int numtasks, rank, dest, source, rc, count, tag = 1;
    char inmsg, outmsg;
    MPI_Status Stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        dest = source = 1;
        outmsg = 'x';
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    } else if (rank == 1) {
        dest = source = 0;
        outmsg = 'y';
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }

    rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
    printf("Task %d: Received %d char(s) from task %d with tag %d\n",
           rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG);
    MPI_Finalize();
    return 0;
}
[sabin@cuc100 pingpong]$ make
/usr/local/mpich/bin/mpicc -c pingpong.c
/usr/local/mpich/bin/mpicc -o pingpong pingpong.o -lm
[sabin@cuc100 pingpong]$ mpirun -np 2 pingpong
Task 0 received the char x
Task 0: Received 1 char(s) from task 1 with tag 1
Task 1 received the char y
Task 1: Received 1 char(s) from task 0 with tag 1
All-to-Root as P2P Communication
All-to-root computation involves:
- Every processor sends its message to the root.
- If the processor is the root (Processor 0), then:
  - for size times:
    - receive the message from Processor source.
What is the overall execution time?
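One way to answer this, under the usual latency-bandwidth cost model (the startup time t_s and per-word transfer time t_w are modelling assumptions, not quantities from the slides): the root receives the messages one after another, so for messages of m words,

```latex
T_{\text{all-to-root}} \approx size \cdot (t_s + t_w \, m)
```

i.e. linear in the number of processes; the receive loop runs size times, since the root also sends a message to itself.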
#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char** argv)
{
    int rank;                 /* rank of process */
    int size;                 /* number of processes */
    int source;               /* rank of the sender */
    int tag = 50;
    char message[100];        /* buffer for the message */
    MPI_Status status;        /* return status for receive */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sprintf(message, "Greetings from process %d!", rank);
    /* use strlen(message)+1 to include '\0' */
    MPI_Send(message, strlen(message)+1, MPI_CHAR, 0, tag, MPI_COMM_WORLD);

    if (rank == 0) {
        for (source = 0; source < size; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    MPI_Finalize();
    return 0;
}
[sabin@cuc100 fireP0]$ make
/usr/local/mpich/bin/mpicc -c fire.c
/usr/local/mpich/bin/mpicc -o fire fire.o -lm
[sabin@cuc100 fireP0]$ mpirun -np 5 fire
Here it is Process 0
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
Greetings from process 4!
Ring Communication
How can every processor come to know a value from every other processor?
How many values does each processor have to learn?
How to achieve this? A circular process:
Each processor repeats, for size times:
- Send its current value to the right neighbour.
- Receive a value from the left neighbour.
- Store or process the received value.
(Diagrams: six processors in a ring hold the values a, b, c, d, e, f; after one exchange step each processor also holds the value received from its left neighbour, e.g. the processor holding a now has a, f.)
#include <stdio.h>
#include "mpi.h"
#define tag 100

int main(int argc, char *argv[])
{
    int rank, size;
    int right, left;
    int ibuff, obuff, sum, i;
    MPI_Status recv_status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    right = rank + 1; if (right == size) right = 0;
    left = rank - 1;  if (left == -1) left = size - 1;

    sum = 0;
    obuff = rank;

    for (i = 0; i < size; i++) {
        MPI_Send(&obuff, 1, MPI_INT, right, tag, MPI_COMM_WORLD);
        MPI_Recv(&ibuff, 1, MPI_INT, left, tag, MPI_COMM_WORLD, &recv_status);
        // storebuff[(rank-i)%n] = obuff;
        sum = sum + ibuff;
        obuff = ibuff;
    }
    printf("\t Processor %d: \t Sum = %d\n", rank, sum);
    MPI_Finalize();
    return 0;
}
[sabin@cuc100 ring]$ make
/usr/local/mpich/bin/mpicc -c ring.c
/usr/local/mpich/bin/mpicc -o ring ring.o -lm
[sabin@cuc100 ring]$ mpirun -np 5 ring
Processor 0: Sum = 10
Processor 1: Sum = 10
Processor 3: Sum = 10
Processor 4: Sum = 10
Processor 2: Sum = 10
References:
1. LLNL MPI Tutorial – sections on P2P communication.
2. Wilkinson book – sections on P2P communication.