ORNL/TM-12512

Engineering Physics and Mathematics Division

Mathematical Sciences Section

THE DESIGN OF A STANDARD MESSAGE PASSING INTERFACE FOR

DISTRIBUTED MEMORY CONCURRENT COMPUTERS

David W. Walker

Mathematical Sciences Section Oak Ridge National Laboratory

P.O. Box 2008, Bldg. 6012, Oak Ridge, TN 37831-6367

Date Published: October 1993

Research was supported by the Advanced Research Projects Agency under contract DAAL03-91-C-0047, administered by the Army Research Office.

Prepared by the Oak Ridge National Laboratory

Oak Ridge, Tennessee 37831, managed by

Martin Marietta Energy Systems, Inc. for the

U.S. DEPARTMENT OF ENERGY under Contract No. DE-AC05-84OR21400


Contents

1 Introduction
2 An Overview of MPI
3 Details of MPI
  3.1 Groups, Contexts, and Communicators
    3.1.1 Process Groups
    3.1.2 Communication Contexts
    3.1.3 Communicator Objects
  3.2 Application Topologies
  3.3 Point-to-Point Communication
    3.3.1 Message Selectivity
    3.3.2 General Datatypes
    3.3.3 Communication Completion
    3.3.4 Persistent Communication Objects
  3.4 Collective Communication
    3.4.1 Collective Data Movement Routines
    3.4.2 Global Computation Routines
4 Summary
5 References

THE DESIGN OF A STANDARD MESSAGE PASSING INTERFACE FOR

DISTRIBUTED MEMORY CONCURRENT COMPUTERS

David W. Walker

Abstract

This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers. The design of MPI has been a collec- tive effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies. While making use of new ideas where appropriate, the MPI standard is based largely on current practice.


1. Introduction

This paper gives an overview of MPI, a proposed standard message passing interface for dis-

tributed memory concurrent computers. The main advantages of establishing a message pass-

ing interface for such machines are portability and ease-of-use, and a standard message passing

interface is a key component in building a concurrent computing environment in which appli-

cations, software libraries, and tools can be transparently ported between different machines.

Furthermore, the definition of a message passing standard provides vendors with a clearly de-

fined set of routines that they can implement efficiently, or in some cases provide hardware or

low-level system support for, thereby enhancing scalability.

The functionality that MPI is designed to provide is based on current common practice,

and is similar to that provided by widely-used message passing systems such as Express [12],

NX/2 [13], Vertex [11], PARMACS [8,9], and P4 [10]. In addition, the flexibility and usefulness

of MPI has been broadened by incorporating ideas from more recent and innovative message

passing systems such as CHIMP [4,5], Zipcode [14,15], and the IBM External User Interface

[7]. The general design philosophy followed by MPI is that while it would be imprudent to

include new and untested features in the standard, concepts that have been tested in a research

environment should be considered for inclusion. Many of the features in MPI related to process

groups and communication contexts have been investigated within research groups for several

years, but not in commercial or production environments. However, their incorporation into

MPI is justified by the expressive power they bring to the standard.

The MPI standardization effort involves about 60 people from 40 organizations mainly from

the United States and Europe. Most of the major vendors of concurrent computers are involved

in MPI, along with researchers from universities, government laboratories, and industry. The

standardization process began with the Workshop on Standards for Message Passing in a Dis-

tributed Memory Environment, sponsored by the Center for Research on Parallel Computing,

held April 29-30, 1992, in Williamsburg, Virginia [16]. At this workshop the basic features es-

sential to a standard message passing interface were discussed, and a working group established

to continue the standardization process.

A preliminary draft proposal, known as MPI1, was put forward by Dongarra, Hempel, Hey,

and Walker in November 1992, and a revised version was completed in February 1993 [3].

MPI1 embodies the main features that were identified at the Williamsburg workshop as being

necessary in a message passing standard. This proposal was intended to initiate discussion of

standardization issues within the distributed memory concurrent computing community, and

has served as a basis for the subsequent MPI standardization process. Since MPI1 was primarily

intended to promote discussion and “get the ball rolling,” it focuses mainly on point-to-point

communications. MPI1 does not include any collective communication routines. MPI1 brought


to the forefront a number of important standardization issues, and has served as a catalyst for

subsequent progress; however, its major deficiency is that the management of resources is not

thread-safe. Although MPI1 and the MPI draft standard described in this paper have many

features in common, they are distinct proposals, with MPI1 now being largely superseded by

the MPI draft standard.

In November 1992, a meeting of the MPI working group was held in Minneapolis, at which

it was decided to place the standardization process on a more formal footing, and to generally

adopt the procedures and organization of the High Performance Fortran forum. Subcommittees

were formed for the major component areas of the standard, and an email discussion service

established for each. In addition, the goal of producing a draft MPI standard by the fall of

1993 was set. To achieve this goal the MPI working group has met every 6 weeks for two days

throughout the first 9 months of 1993, and it is intended to present the draft MPI standard at

the Supercomputing 93 conference in November 1993. These meetings and the email discussion

together constitute the MPI forum, membership of which has been open to all members of the

high performance computing community.

This paper is being written at a time when MPI is still in the process of being defined, but

when the main features have been agreed upon. The only major exception concerns communica-

tion between processes in different groups. Some syntactical details, and the language bindings

for Fortran-77 and C, have not yet been considered in depth, and so will not be discussed here.

This paper is not intended to give a definitive, or even a complete, description of MPI. While

the main design features of MPI will be described, limitations on space prevent detailed justifi-

cations for why these features were adopted. For these details the reader is referred to the MPI

specification document and the archived email discussions, which are available electronically

as described in Section 4.

2. An Overview of MPI

MPI is intended to be a standard message passing interface for applications running on MIMD

distributed memory concurrent computers. We expect MPI also to be useful in building li-

braries of mathematical software for such machines. MPI is not specifically designed for use

by parallelizing compilers. MPI does not contain any support for fault tolerance, and assumes

reliable communications. MPI is a message passing interface, not a complete parallel computing

programming environment. Thus, issues such as parallel I/O, parallel program composition,

and debugging are not addressed by MPI. In addition, MPI does not provide explicit support

for active messages or virtual communication channels, although extensions for such features

are not precluded, and may be made in the future. Finally, MPI provides no explicit sup-

port for multithreading, although one of the design goals of MPI was to ensure that it can be


implemented efficiently in a multithreaded environment.

The MPI standard does not mandate that an implementation should be interoperable with

other MPI implementations. However, MPI does provide all the datatype information needed to

allow a single MPI implementation to operate in a heterogeneous environment.

A set of routines that support point-to-point communication between pairs of processes

forms the core of MPI. Routines for sending and receiving blocking and nonblocking messages

are provided. A blocking send does not return until it is safe for the application to alter the

message buffer on the sending process without corrupting or changing the message sent. A

nonblocking send may return while the message buffer on the sending process is still volatile,

and it should not be changed until it is guaranteed that this will not corrupt the message. This

may be done by either calling a routine that blocks until the message buffer may be safely

reused, or by calling a routine that performs a nonblocking check on the message status. A

blocking receive suspends execution on the receiving process until the incoming message has

been placed in the specified application buffer. A nonblocking receive may return before the

message has been received into the specified application buffer, and a subsequent call must be

made to ensure that this has occurred before the application uses the data in the message.

In MPI a message may be sent in one of three communication modes. The communication

mode specifies the conditions under which the sending of a message may be initiated, or when

it completes. In ready mode a message may be sent only if a corresponding receive has been

initiated. In standard mode a message may be sent regardless of whether a corresponding

receive has been initiated. Finally, MPI includes a synchronous mode which is the same as the

standard mode, except that the send operation will not complete until a corresponding receive

has been initiated on the destination process.

There are, therefore, 6 types of send operation and 2 types of receive, as shown in Figure

1. In addition, routines are provided that send to one process while receiving from another.

Different versions are provided for when the send and receive buffers are distinct, and for when

they are the same. The send/receive operation is blocking, so does not return until the send

buffer is ready for reuse, and the incoming message has been received. The two send/receive

routines bring the total number of point-to-point message passing routines up to 10.
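To illustrate these semantics, the sketch below posts a nonblocking receive, performs a blocking standard send, and then waits for the receive to complete, using the routine names of Figure 1 and the argument lists of Figure 4. Since the C binding had not been finalized at the time of writing, the handle, status, and communicator types, the wildcard MPI_ANY_SOURCE, and the argument order of the nonblocking calls are assumptions made for this example only.

    /* Sketch only: types, wildcard, and nonblocking argument order are assumed;
       the blocking send arguments follow Figure 4. */
    double outbuf[100], inbuf[100];
    int dest = 1, tag = 99;
    mpi_comm comm;        /* communicator obtained elsewhere (see Section 3.1) */
    mpi_handle request;   /* identifies the pending nonblocking operation      */
    mpi_status status;    /* filled in when the receive completes              */

    /* Post a nonblocking receive from any source, then do a blocking standard
       send; the send returns once outbuf may safely be reused. */
    mpi_irecv(inbuf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, tag, comm, &request);
    mpi_send(outbuf, 100, MPI_DOUBLE, dest, tag, comm);

    /* Suspend until the posted receive has delivered a message into inbuf. */
    mpi_wait(&request, &status);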

3. Details of MPI

In this section we discuss the MPI routines in more detail. Since the point-to-point and col-

lective communication routines depend heavily on the approach taken to groups and contexts,

and to a lesser extent on process topologies, we shall discuss groups, contexts, and topologies

first. These three related areas have generated much discussion within the MPI forum, and

a consensus has emerged only in the last few weeks. To some extent this difficulty in arriv-

ing at a consensus arises because different commonly-used message passing interfaces generally
handle groups, contexts, and topologies differently, and offer varying levels of support. The
differing requirements in these three areas within the parallel computing community have also
contributed to the diversity of views.

    SEND          Blocking     Nonblocking
    Standard      mpi_send     mpi_isend
    Ready         mpi_rsend    mpi_irsend
    Synchronous   mpi_ssend    mpi_issend

    RECEIVE       Blocking     Nonblocking
    Standard      mpi_recv     mpi_irecv

Figure 1: Classification and names of the point-to-point send and receive routines.

3.1. Groups, Contexts, and Communicators

Although it is now agreed within the MPI forum that groups and contexts should be bound

together into abstract communicator objects, as described in Section 3.1.3, the precise details

have yet to be worked out, particularly in the case of communicators for communication between

groups. Thus, in this subsection we will give an overview of groups, contexts, and communica-

tors, without going into specific details that may subsequently change. In particular, we will

not discuss communication between processes in different groups as at the time of writing the

precise details are still under discussion.

3.1.1. Process Groups

The prevailing view within the MPI forum is that a process group is an ordered collection of

processes, and each process is uniquely identified by its rank within the ordering. For a group

of n processes the ranks run from 0 to n - 1. This definition of groups closely conforms to

current practice.

Process groups can be used in two important ways. First, they can be used to specify

which processes are involved in a collective communication operation, such as a broadcast.

Second, they can be used to introduce task parallelism into an application, so that different

groups perform different tasks. If this is done by loading different executable codes into each

group, then we refer to this as MIMD task parallelism. Alternatively, if each group executes a

different conditional branch within the same executable code, then we refer to this as SPMD

task parallelism (also known as control parallelism). Although MPI does not provide mechanisms

for loading executable codes onto processors, nor for creating processes and assigning them to


processors, each process may execute its own distinct code. However, it is expected that many

initial MPI implementations will adopt a static process model, so that, as far as the application

is concerned, a fixed number of processes exist from program initiation to completion, each

running the same SPMD code.

Although the MPI process model is static, process groups are dynamic in the sense that they

can be created and destroyed, and each process can belong to several groups simultaneously.

However, the membership of a group cannot be changed asynchronously. For one or more pro-

cesses to join or leave a group, a new group must be created which requires the synchronization

of all processes in the group so formed. In MPI a group is an opaque object referenced by means

of a handle. MPI provides routines for creating new groups by listing the ranks (within a spec-

ified parent group) of the processes making up the new group, or by partitioning an existing

group using a key. The group partitioning routine is also passed an index, the value of which

determines the rank of the process in the new group. This also provides a way of permuting the

ranks within a group, if all processes in the group use the same value for the key, and set the

index equal to the desired new rank. Additional routines give the rank of the calling process

within a given group, test whether the calling process is in a given group, perform a barrier

synchronization with a group, and inquire about the size and membership of a group. Other

routines concerned with groups may be included in the final MPI draft.
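A minimal sketch of how these group facilities might be used is given below. The routine names are hypothetical stand-ins, since the draft had not yet fixed them; only the behavior (creation from a list of ranks, partitioning by key and index, and rank and size inquiry) comes from the description above.

    /* Hypothetical routine names; the facilities are those described in the text. */
    mpi_group parent, evens, newgrp;
    int ranks[3] = {0, 2, 4};
    int my_rank, size;
    int key = 0;    /* processes supplying the same key join the same new group */
    int index;      /* relative index value determines rank in the new group    */

    /* New group formed from the listed ranks of the parent group. */
    mpi_group_create(parent, 3, ranks, &evens);

    /* Rank of the calling process within the new group, and the group size. */
    mpi_group_rank(evens, &my_rank);
    mpi_group_size(evens, &size);

    /* Partition the parent group using the key and index. */
    index = my_rank;
    mpi_group_partition(parent, key, index, &newgrp);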

3.1.2. Communication Contexts

Communication contexts, first used in the Zipcode communication system [14,15], promote

software modularity by allowing the construction of independent communication streams be-

tween processes, thereby ensuring that messages sent in one phase of an application are not

incorrectly intercepted by another phase. Communication contexts are particularly important

in allowing libraries that make message passing calls to be used safely within an application.

The point here is that the application developer has no way of knowing if the tag, group, and

rank completely disambiguate the message traffic of different libraries and the rest of the appli-

cation. Context provides an additional criterion for message selection, and hence permits the

construction of independent tag spaces.

If communication contexts are not used there are two ways in which a call to a library

routine can lead to unintended behavior. In the first case the processes enter a library routine

synchronously when a send has been initiated for which the matching receive is not posted until

after the library call. In this case the message may be incorrectly received in the library routine.

The second possibility arises when different processes enter a library routine asynchronously,

as shown in the example in Figure 2, resulting in nondeterministic behavior. If the program

behaves correctly processes 0 and 1 each receive a message from process 2, using a wildcarded


[Figure 2 diagram: message traffic between Process 0, Process 1, and Process 2; see caption below.]

Figure 2: Use of contexts. Time increases down the page. Numbers in parentheses indicate the process to which data are being sent or received. The gray shaded area represents the library routine call. In this case the program behaves as intended. Note that the second message sent by process 2 is received by process 0, and that the message sent by process 0 is received by process 2.

[Figure 3 diagram: the same message traffic with the second send from Process 2 delayed; see caption below.]

Figure 3: Unintended behavior of program. In this case the message from process 2 to process 0 is never received, and deadlock results.

selection criterion to indicate that they are prepared to receive a message from any process. The

three processes then pass data around in a ring within the library routine. If communication

contexts are not used this program may intermittently fail. Suppose we delay the sending of

the second message sent by process 2, for example, by inserting some computation, as shown

in Figure 3. In this case the wildcarded receive in process 0 is satisfied by a message sent

from process 1, rather than from process 2, and deadlock results. By supplying a different

communication context to the library routine we can ensure that the program is executed

correctly, regardless of when the processes enter the library routine.
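The shape of that fix is sketched below. The communicator-creation routine, the library entry point, and the buffers are placeholders, since the communicator interface was still being settled (Section 3.1.3); the point is only that the library is handed a communicator carrying a context of its own.

    /* Hypothetical creation routine: lib_comm has the same process group as
       user_comm but a new, private context reserved for the library. */
    mpi_comm user_comm, lib_comm;
    mpi_comm_copy(user_comm, &lib_comm);

    /* Application traffic is scoped by user_comm ... */
    mpi_send(appbuf, n, MPI_DOUBLE, 1, apptag, user_comm);

    /* ... while the library does all of its communication, including wildcarded
       receives, within lib_comm, so the interception of Figure 3 cannot occur. */
    library_routine(data, lib_comm);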


3.1.3. Communicator Objects

The “scope” of a communication operation is specified by the communication context used,

and the group, or groups, involved. In a collective communication, or in a point-to-point

communication between members of the same group, only one group needs to be specified, and

the source and destination processes are given by their rank within this group. In a point-to-

point communication between processes in different groups, two groups must be specified to

define the scope. In this case the source and destination processes are given by their ranks

within their respective groups. In MPI abstract opaque objects called “communicators” are

used to define the scope of a communication operation. In intragroup communication involving

members of the same group a communicator can be regarded as binding together a context and

a group. The creation of intergroup communicators for communicating between processes in

different groups is still under discussion within the MPI Forum, and so will not be discussed

here.

3.2. Application Topologies

In many applications the processes are arranged with a particular topology, such as a two-

or three-dimensional grid. MPI provides support for general application topologies that are

specified by a graph in which processes that communicate a significant amount are connected

by an arc. If the application topology is an n-dimensional Cartesian grid then this generality

is not needed, so as a convenience MPI provides explicit support for such topologies. For a

Cartesian grid, periodic or nonperiodic boundary conditions may apply in any specified grid

dimension. In MPI a group either has a Cartesian or graph topology, or no topology.

In MPI, application topologies are supported by an initialization routine, MPI_GRAPH or
MPI_CART, that specifies the topology of a given group, a function MPI_INQ_RANK that de-
termines the rank given a location in the topology associated with a group, and the inverse
function MPI_INQ_LOC that determines where a process is in the topology. In addition, the rou-
tine MPI_INQ_MAP returns the topology associated with a given group, and for a group with a
Cartesian topology, the routine MPI_INQ_CART gives the size and periodicity of the topology.

In addition to removing from the user the burden of having to write code to translate

between process identifier, as specified by group and rank, and location in the topology, MPI

also:

1. allows knowledge of the application topology to be exploited in order to efficiently assign

processes to physical processors,

2. provides a routine MPI_PARTC for partitioning a Cartesian grid into hyperplane groups

by removing a specified set of dimensions,


3. provides support for shifting data along a specified dimension of a Cartesian grid.

By dividing a Cartesian grid into hyperplane groups it is possible to perform collective commu-

nication operations within these groups. In particular, if all but one dimension is removed a set

of one-dimensional subgroups is formed, and it is possible, for example, to perform a multicast

in the corresponding direction.

Support for shift operations is provided by a routine, MPI_SHIFT_1D, that returns the ranks

of the processes that a process must send data to, and receive data from, when participating

in the shift. Once the source and destination process are known for each process, the shift

is performed by calling the routine MPI_SENDRECV that allows each process to send to one

process while receiving from another. In a circular shift each process sends data to the process

whose location in the given dimension is obtained by adding a specified integer (which may be

negative) to its own location, modulo the number of processes in that dimension. In an end-off

shift each process determines the rank of its destination process by adding a specified integer

to its own rank, but if this exceeds the number of processes in the given dimension, or is less

than zero, then no data are sent. If the Cartesian grid is periodic in the dimension in which

the shift is done, then MPI_SHIFT_1D returns source and destination processes appropriate for

a circular shift. Otherwise MPI_SHIFT_1D returns source and destination processes appropriate

for an end-off shift.
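A circular shift built from these routines might look like the sketch below. Only the routine names MPI_CART, MPI_SHIFT_1D, and MPI_SENDRECV come from the text; their argument lists had not been published in final form, so the parameter order used here is an assumption for illustration.

    /* Assumed argument lists; a 4 x 4 Cartesian grid, periodic in both dimensions. */
    int dims[2] = {4, 4}, periodic[2] = {1, 1};
    int src, dst, tag = 0, n = 100;
    double sendbuf[100], recvbuf[100];
    mpi_status status;

    mpi_cart(comm, 2, dims, periodic);        /* attach the topology to the group */

    /* Ranks to exchange with for a shift of +1 along dimension 0. */
    mpi_shift_1d(comm, 0, +1, &src, &dst);

    /* Each process sends its block to dst while receiving the matching block
       from src; this variant uses distinct send and receive buffers. */
    mpi_sendrecv(sendbuf, n, MPI_DOUBLE, dst, tag,
                 recvbuf, n, MPI_DOUBLE, src, tag, comm, &status);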

3.3. Point-to-Point Communication

3.3.1. Message Selectivity

In MPI a process involved in a communication operation is identified by group and rank within

that group. Thus,

Process ID ≡ (group, rank)

In point-to-point communication, messages may be considered labeled by communication con-

text and message tag within that context. Thus,

Message ID ≡ (context, tag)

When sending or receiving a message the process and message identifiers must be specified. The

group and context, which define the scope of the communication operation, are specified by

means of a communicator object in the argument list of the send and receive routines. The rank

and tag also appear in the argument list. A message sent in one scope cannot be received

in a different scope, so the communicator objects specified by the send and receive routines

must match. The group and context components of a communicator may not be wildcarded.

Within a given scope, message selectivity is by rank and tag. Either, or both, of these may be

wildcarded by a receiving process to indicate that the corresponding selection criterion is to be


MPI_SEND( IN  start_of_buffer,
          IN  number_of_items,
          IN  datatype_of_items,
          IN  destination_rank,
          IN  tag,
          IN  communicator )

MPI_RECV( OUT start_of_buffer,
          IN  max_number_of_items,
          IN  datatype_of_items,
          IN  source_rank,
          IN  tag,
          IN  communicator,
          OUT return_status_object )

Figure 4: Argument lists for the blocking send and receive routines

ignored. The argument lists for the blocking send and receive routines are shown in Figure 4. In Figure 4, the last argument to MPI_RECV is a handle to a return status object. This object
may be passed to an inquiry routine to determine the length of the message, or the actual source

rank and/or message tag if wildcards have been used. The argument lists for the nonblocking

send and receives are very similar, except that each returns a handle to an object that identifies

the communication operation. This object is used subsequently to check for completion of the

operation. In addition, the nonblocking receive does not return a return status object. Instead

the return status object is returned by the routine that confirms completion of the receive
operation.
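For concreteness, a receive with both selection criteria wildcarded might look like the sketch below. The MPI_RECV argument list follows Figure 4; the wildcard constants and the status-inquiry routine are placeholder names, since the draft had not fixed them.

    /* Placeholder wildcard and inquiry names; argument list as in Figure 4. */
    double buf[1024];
    mpi_status status;
    int actual_source, actual_tag, actual_length;

    mpi_recv(buf, 1024, MPI_DOUBLE, MPI_ANY_RANK, MPI_ANY_TAG, comm, &status);

    /* Query the returned status object for the actual source rank, tag, and
       message length once the receive has completed. */
    mpi_status_inquire(status, &actual_source, &actual_tag, &actual_length);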

3.3.2. General Datatypes

All point-to-point message passing routines in MPI take as an argument the datatype of the

data communicated. In the simplest case this will be a primitive datatype, such as an integer

or floating point number. However, MPI also supports more general datatypes, and thereby
supports the communication of array sections and structures involving combinations of primitive

datatypes.

A general datatype is a sequence of pairs of primitive datatypes and integer byte displace-

ments. Thus,

    Datatype = { (type_0, disp_0), (type_1, disp_1), ..., (type_{n-1}, disp_{n-1}) }

Together with a base address, a datatype specifies a communication buffer. General datatypes

are built up hierarchically from simpler components. There are four basic constructors for

datatypes, namely the contiguous, vector, indexed, and structure constructors. We will now

discuss each of these in turn.

The contiguous constructor creates a new datatype from repetitions of a specified old

datatype. This requires us to specify the old datatype and the number of repetitions, n.

For example, if the old datatype is oldtype = { (double, 0), (char, 8) } and n = 3, then the

new datatype would be,

{ (double, 0), (char, 8), (double, 16), (char, 24), (double, 32), (char, 40) }

It should be noted how each repeated unit in the new datatype is aligned with a double word

boundary. This alignment is dictated by the appearance of a double in the old datatype, so

that the extent of the old datatype is taken as 16 bytes, rather than 9 bytes.

The vector constructor builds a new datatype by replicating an old datatype in blocks at

fixed offsets. The new datatype consists of count blocks, each of which is a repetition of

blocklen items of some specified old datatype. The starts of successive blocks are offset by

stride items of the old datatype. Thus, if count = 2, blocklen = 3, and stride = 4 then

the new datatype would be,

{ (double, 0), (char, 8), (double, 16), (char, 24), (double, 32), (char, 40),

(double, 64), (char, 72), (double, 80), (char, 88), (double, 96), (char, 104) }

Here the offset between the two blocks is 64 bytes, which is the stride multiplied by the extent

of the old datatype.

The indexed constructor is a generalization of the vector constructor in which each block has

a different size and offset. The sizes and offsets are given by the entries in two integer arrays,

B and I. The new datatype consists of count blocks, and the ith block is of length B[i] items
of the specified old datatype. The offset of the start of the ith block is I[i] items of the old
datatype. Thus, if count = 2, B = {3, 1}, and I = {64, 0}, then the new datatype would be,

{ (double, 64), (char, 72), (double, 80), (char, 88), (double, 96), (char, 104),

(double, 0), (char, 8) }

The structure constructor is the most general of the datatype constructors. This constructor

generalizes the indexed constructor by allowing each block to be of a different datatype. Thus,


[Figure 5 diagram: a one-dimensional process domain with its left and right edges marked.]

Figure 5: Particle migration in a one-dimensional code. The left and right edges of a process domain are shown. We shall consider just the migration of particles across the righthand boundary.

in addition to specifying the number of blocks, count, and the block length and offset arrays, B

and I, we must also give the datatype of the replicated unit in each block. Let us assume this is

specified in an array T. The length of the ith block is B[i] items of type T[i], and the offset of
the start of the ith block is I[i] bytes. Thus, if count = 3, T = {MPI_FLOAT, oldtype, MPI_CHAR},
I = {0, 16, 26}, and B = {2, 1, 3}, then the new datatype would be,

    { (float, 0), (float, 4), (double, 16), (char, 24), (char, 26), (char, 27), (char, 28) }

In addition to the constructors described above, there is a variant of the vector constructor

in which the stride is given in bytes instead of the number of items. There is also a variant of

the indexed constructor in which the block offsets are given in bytes.
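As a small illustration, the sketch below uses the vector constructor to describe one column of a row-major N x N array of doubles, so that the column can be sent in a single call. The constructor name MPI_type_vector is assumed by analogy with the MPI_type_indexed and MPI_type_struct calls of Figure 6; only the count, block length, and stride semantics come from the description above.

    /* Assumed constructor name: N blocks of one double each, with successive
       blocks offset by a stride of N items of the old datatype. */
    #define N 8
    double a[N][N];
    MPI_datatype coltype;

    MPI_type_vector(N, 1, N, MPI_double, &coltype);
    MPI_type_commit(coltype);

    /* Send column 2: the offsets in coltype are taken relative to &a[0][2]. */
    MPI_send(&a[0][2], 1, coltype, dest, tag, comm);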

To better understand the use of general datatypes, consider the example of an appli-

cation in which particles move on a one-dimensional domain. We assume that each process is

responsible for a different section of this domain. In each time step particles may move from

the subdomain of one process to that of another, and so the data for such particles must be

communicated between processes. We shall just consider here the task of migrating particles

across the righthand boundary of a process, as shown in Figure 5. The particle data are stored

in an array of structures, with each entry in this array consisting of the particle position,

x, velocity, v, and type, k:

struct Pstruct { double x; double v; int k; };

The C code for migrating particles across the righthand boundary is shown in Figure 6.

In Figure 6 the code in the first box creates a datatype, Ptype, that represents the Pstruct

structure for a single particle. This datatype is,

Ptype = { (double, 0), (double, 8), (int, 16) }

In the second code box the particles that have crossed the righthand boundary are identified,


struct Pstruct particle[1000];
MPI_datatype Ptype, Ztype;
MPI_datatype Stype[3] = {MPI_double, MPI_double, MPI_int};
int Sblock[3] = {1, 1, 1};
int Sindex[3];
int Pindex[100];
int Pblock[100];

Sindex[0] = 0;
Sindex[1] = sizeof(double);
Sindex[2] = 2*sizeof(double);
MPI_type_struct(3, Stype, Sindex, Sblock, &Ptype);

j = 0;
for (i = 0; i < 1000; i++)
    if (particle[i].x > right_edge) { Pindex[j] = i; Pblock[j] = 1; j++; }

MPI_type_indexed(j, Ptype, Pindex, Pblock, &Ztype);

MPI_type_commit(Ztype);
MPI_send(particle, 1, Ztype, dest, tag, comm);

Figure 6: Fragment of C code for migrating particles across the righthand process boundary.

and their index in the particle array is stored in Pindex. It is assumed that no more than

100 particles cross the boundary. The call to MPI_type_indexed uses an indexed constructor

to create a new datatype, Ztype, that references all the migrating particles. Before sending the

data, the Ztype datatype must be committed. This is done to allow the system to use a different

internal representation for Ztype, and to optimize the communication operation. Committing

a datatype is most likely to be advantageous when reusing a datatype many times, which is not

the case in this example. Finally, the migrating particles are sent to their destination process,

dest, by a call to MPI_send. The offsets in the Ztype datatype are interpreted relative to the
address of the start of the particle array.

3.3.3. Communication Completion

Following a call to a nonblocking send or receive routine there are a number of ways in which the

handle returned by the call can be used to check the completion status of the communication

operation, or to suspend further execution until the operation is complete. MPI_WAIT does not
return until the communication operation referred to by the input handle is complete. MPI_TEST

does not wait until the operation identified by the input handle is complete, but instead returns

a logical variable that is TRUE if the operation is complete, and FALSE otherwise. If the input

handle refers to a receive operation, then MPI_WAIT and MPI_TEST both return a handle to a


return status object. This handle can subsequently be passed to a query routine to determine

the actual source, tag, and length of the message received.

An additional two routines exist for waiting for the completion of any or all of the handles

in a list of handles. Similarly, there are variants of the test routine that check if all, or at least

one, of the communication operations identified by a list of handles is complete.

3.3.4. Persistent Communication Objects

MPI also provides a set of routines for creating communication objects that completely describe

a send or receive operation by binding together all the parameters of the operation. A handle

to the communication object so formed is returned, and may subsequently be passed to the

routine MPI_START to actually initiate the communication. The MPI_WAIT routine, or a similar

completion routine, must be called to ensure completion of the operation, as discussed in Section

3.3.3.

Persistent communication objects may be used to optimize communication performance,

particularly when the same communication pattern is repeated many times in an application.

For example, if a send routine is called within a loop, performance may be improved by creating

a communication object that describes the parameters of the send prior to entering the loop,

and then calling MPI_START inside the loop to send the data on each pass through the loop.

There are four routines for creating communication objects: three for send operations,

corresponding to the standard, ready, and synchronous modes, and one for receive operations.

A persistent communication object must be deallocated when no longer needed.
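A sketch of this usage pattern follows. Only MPI_START and MPI_WAIT are named in the text; the creation and deallocation routines shown here are hypothetical stand-ins, and the send parameters follow the Figure 4 argument list.

    /* Hypothetical create/free names; mpi_start and mpi_wait are from the text. */
    mpi_handle sendobj;
    int step, nsteps = 100;

    /* Bind all of the send parameters together once, before the loop. */
    mpi_create_send(buf, n, MPI_DOUBLE, dest, tag, comm, &sendobj);

    for (step = 0; step < nsteps; step++) {
        compute(buf, n);             /* refill the buffer for this time step */
        mpi_start(sendobj);          /* initiate the pre-described send      */
        mpi_wait(&sendobj, NULL);    /* ensure completion before reusing buf */
    }

    mpi_free(&sendobj);              /* deallocate the persistent object     */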

3.4. Collective Communication

Collective communication routines provide for coordinated communication among a group of

processes [1,2]. The process group is given by the communicator object that is input to the

routine. The MPI collective communication routines have been designed so that their syntax

and semantics are consistent with those of the point-to-point routines. The collective com-

munication routines may, but do not have to be, implemented using the MPI point-to-point

routines. Collective communication routines do not have message tag arguments, though an

implementation in terms of the point-to-point routines may need to make use of tags. A col-

lective communication routine must be called by all members of the group with consistent

arguments. As soon as a process has completed its role in the collective communication it

may continue with other tasks. Thus, a collective communication is not necessarily a barrier

synchronization for the group. MPI does not include nonblocking forms of the collective com-

munication routines. MPI collective communication routines are divided into two broad classes:

data movement routines, and global computation routines.


3.4.1. Collective Data Movement Routines

There are 3 basic types of collective data movement routine: broadcast, scatter, and gather.

There are two versions of each of these three routines: in the one-all case data are communicated

between one process and all others; in the all-all case data are communicated between each

process and all others. Figure 7 shows the one-all and all-all versions of the broadcast, scatter,

and gather routines for a group of six processes.

The all-all broadcast, and both varieties of the scatter and gather routines, involve each

process sending distinct data to each process, and/or receiving distinct data from each process.

In these routines each process may send to and/or receive from each other process a different

number of data items, but the send and receive datatypes must be consistent. To illustrate this

point consider the following example in which process 0 gathers data from processes 1 and 2.

Suppose the receive datatype in process 0, and the send datatypes in processes 1 and 2 are as

follows,

In process 0: recvtype = { (int, 0), (float, 4) }

In process 1: sendtype = { (int, 0), (float, 4), (int, 96), (float, 100), (int, 32), (float, 36) }

In process 2: sendtype = { (int, 16), (float, 20), (int, 48), (float, 52) }

Such a situation could arise in a C program in which an indexed datatype constructor has been

applied to an array of structures, each element of which consists of an integer and a floating-

point number. Although the datatypes are different in each process, they are type consistent,

since each consists of repetitions of an integer followed by a float.

The one-all broadcast routine broadcasts data from one process to all other processes in the

group. The all-all broadcast broadcasts data from each process to all others, and on completion

each has received the same data. Thus, for the all-all broadcast each process ends up with the

same output data, which is the concatenation of the input data of all processes, in rank order.

The one-all scatter routine sends distinct data from one process to all processes in the group.

This is also known as “one-to-all personalized communication”. In the all-all scatter routine

each process scatters distinct data to all processes in the group, so the processes receive different

data from each process. This is also known as “all-to-all personalized communication”.

The communication patterns in the gather routines are the same as in the scatter routines,

except that the direction of flow of data is reversed. In the one-all gather routine one process

(the root) receives data from every process in the group. The root process receives the

concatenation of the input buffers of all processes, in rank order. There is no separate all-all

gather routine since this would just be identical to the all-all scatter routine, so there are 5

basic data movement routines.


[Figure 7 diagram: panels for the one-all broadcast, all-all broadcast, one-all scatter, all-all scatter, and one-all gather.]

Figure 7: One-all and all-all versions of the broadcast, scatter, and gather routines for a group of six processes. In each case, each row of boxes represents data locations in one process. Thus, in the one-all broadcast, initially just the first process contains the data A0, but after the broadcast all processes contain it.


In addition, MPI provides versions of all these 5 routines, except the one-all broadcast, in

which the send and receive datatypes are type consistent as discussed above, but in which each

process is allocated a fixed-size portion of the communication buffer. These bring the total

number of data movement routines to 9.

3.4.2. Global Computation Routines

There are two basic global computation routines in MPI: reduce and scan. The reduce and scan

routines both require the specification of an input function. One version is provided in which

the user selects the function from a predefined list; in the second version the user supplies (a

pointer to) a function that is associative and commutative; in the third version the user supplies

(a pointer to) a function that is associative, but not necessarily commutative. In addition, there

are three variants of the reduction routines. In one variant the reduced results are returned to

a single specified process; in the second variant the reduced results are returned to all processes

involved; and, in the third variant the reduced results are scattered across the processes involved.

This latter variant is a generalization of the fold routine described in Chapter 21 of [6]. Thus,

there are 12 global computation routines, and a total of 21 collective communication routines

(or 22 if we include the routine for performing a barrier synchronization over a process group).

The reduce routines combine the values provided in the input buffer of each process using

a specified function. Thus, if D_i is the data in the process with rank i in the group, and ⊕ is
the combining function, then the following quantity is evaluated,

    D_0 ⊕ D_1 ⊕ ... ⊕ D_{n-1}

where n is the size of the group. Common reduction operations are the evaluation of the

maximum, minimum, or sum of a set of values distributed across a group of processes.

The scan routines perform a parallel prefix with respect to an associative reduction operation

on data distributed across a specified group. On completion the output buffer of the process

with rank i contains the result of combining the values from the processes with ranks 0, 1, ..., i.

It should be noted that segmented scans can be performed by first creating distinct sub-

groups for each segment.
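As a sketch of how these routines might be invoked, the fragment below computes a global sum returned to all processes (the second reduction variant above) and then the corresponding parallel prefix. The routine names mpi_reduce_all and mpi_scan, the predefined operation MPI_SUM, and the argument order are assumptions for illustration only; the text names the variants but not their bindings.

    /* Assumed bindings: each process contributes one double. */
    double local, global, prefix;

    local = compute_partial_result();

    /* Every process receives the sum over the whole group identified by comm. */
    mpi_reduce_all(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);

    /* Parallel prefix: on return, the process with rank i holds the sum of the
       local values from the processes with ranks 0, 1, ..., i. */
    mpi_scan(&local, &prefix, 1, MPI_DOUBLE, MPI_SUM, comm);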

4. Summary

This paper has given an overview of the main features of MPI, but has not described the

detailed syntax of the MPI routines, or discussed language binding issues. These will be fully


discussed in the MPI specification document, a draft of which is expected to be available by the

Supercomputing 93 conference in November 1993.

The design of MPI has been a cooperative effort involving about 60 people. Much of the

discussion has been by electronic mail, and has been archived, along with copies of the MPI

draft and other key documents. Copies of the archives and documents may be obtained from
netlib. For details of what is available, and how to get it, please send the message “send index
from mpi” to netlib@ornl.gov.

Acknowledgments

Many people have contributed to MPI, so it is not possible to acknowledge them all individ-

ually. However, many of the ideas presented in this paper are due to the MPI subcommittee

chairs: Scott Berryman, James Cownie, Jack Dongarra, Al Geist, William Gropp, Rolf Hempel,

Steve Huss-Lederman, Anthony Skjellum, Marc Snir, and Steven Zenith. Lyndon Clarke, Bob

Knighten, Rik Littlefield, and Rusty Lusk have also made important contributions, as has also

Steve Otto, the editor of the MPI specification document.

5. References

[1] V. Bala, J. Bruck, R. Cypher, P. Elustondo, A. Ho, C.-T. Ho, S. Kipnis, and Marc Snir. CCL:

A portable and tunable collective communication library for scalable parallel computers.

Technical report, IBM T. J. Watson Research Center, 1993. Preprint.

[2] J. Bruck, R. Cypher, P. Elustondo, A. Ho, C.-T. Ho, S. Kipnis, and Marc Snir. CCL:

A portable and tunable collective communication library for scalable parallel computers.

Technical report, IBM Almaden Research Center, 1993. Preprint.

[3] J. J. Dongarra, R. Hempel, A. J. G. Hey, and D. W. Walker. A proposal for a user-

level, message passing interface in a distributed memory environment. Technical Report

TM-12231, Oak Ridge National Laboratory, February 1993.

[4] Edinburgh Parallel Computing Centre, University of Edinburgh. CHIMP Concepts, June

1991.

[5] Edinburgh Parallel Computing Centre, University of Edinburgh. CHIMP Version 1.0 Interface, May 1992.

[6] G. C. Fox, M. A. Johnson, G. A. Lyzenga, S. W. Otto, J. K. Salmon, and D. W. Walker.

Solving Problems on Concurrent Processors, volume 1. Prentice Hall, Englewood Cliffs,

N.J., 1988.


[7] D. Frye, R. Bryant, H. Ho, R. Lawrence, and M. Snir. An external user interface for
scalable parallel systems. Technical report, IBM, May 1992.

[8] R. Hempel. The ANL/GMD macros (PARMACS) in Fortran for portable parallel program-

ming using the message passing programming model - users’ guide and reference manual.

Technical report, GMD, Postfach 1316, D-5205 Sankt Augustin 1, Germany, November

1991.

[9] R. Hempel, H.-C. Hoppe, and A. Supalov. A proposal for a PARMACS library interface.

Technical report, GMD, Postfach 1316, D-5205 Sankt Augustin 1, Germany, October 1992.

[10] Ewing Lusk, Ross Overbeek, et al. Portable Programs for Parallel Processors. Holt,

Rinehart and Winston, Inc., 1987.

[11] nCUBE Corporation. nCUBE 2 Programmers Guide, r2.0, December 1990.

[12] Parasoft Corporation. Express Version 1.0: A Communication Environment for Parallel

Computers, 1988.

[13] Paul Pierce. The NX/2 operating system. In Proceedings of the Third Conference on
Hypercube Concurrent Computers and Applications, pages 384-390. ACM Press, 1988.

[14] A. Skjellum and A. Leung. Zipcode: a portable multicomputer communication library

atop the reactive kernel. In D. W. Walker and Q. F. Stout, editors, Proceedings of the

Fifth Distributed Memory Concurrent Computing Conference, pages 767-776. IEEE Press,

1990.

[15] A. Skjellum, S. Smith, C. Still, A. Leung, and M. Morari. The Zipcode message passing

system. Technical report, Lawrence Livermore National Laboratory, September 1992.

[16] D. Walker. Standards for message passing in a distributed memory environment. Technical
Report TM-12147, Oak Ridge National Laboratory, August 1992.


ORNL/TM-12512

INTERNAL DISTRIBUTION

1. B. R. Appleton
2. J. Choi
3-4. T. S. Darland
5. E. F. D'Azevedo
6. J. J. Dongarra
7. G. A. Geist
8. L. J. Gray
9. M. R. Leuze
10. E. G. Ng
11. C. E. Oliver
12. B. W. Peyton
13-17. S. A. Raby
18. C. H. Romine
19. T. H. Rowan
20-24. R. F. Sincovec
25-29. D. W. Walker
30-34. R. C. Ward
35. P. H. Worley
36. Central Research Library
37. ORNL Patent Office
38. K-25 Applied Technology Library
39. Y-12 Technical Library
40. Laboratory Records - RC
41-42. Laboratory Records Department

EXTERNAL DISTRIBUTION

43. Thomas A. Adams, NCC and OSC / NRaD, Code 733, 271 Catalina Blvd., San Diego, CA 92152-5000

44. Robert J . Allen, Daresbury Laboratory, S.E.R.C., Daresbury, Warrington WA4 4AD, United Kingdom

45. Giovanni Aloisio, Dipt. di Elettrotecnica ed Elettronica, Universita di Bari, Via Re David 200, 70125 Bari, Italy

46. Ed Anderson, Mathematical Software Group, Cray Research Incorporated, 655F Lone Oak Drive, Eagan, MN 55121

47. Mark Anderson, Rice University, Department of Computer Science, P. 0. Box 1892, Houston, TX 77251

48. Ian G . Angus, Boeing Computer Services, M/S 7L-22, P. 0. Box 24346, Seattle, WA 98124-0346

49. Marco Annaratone, Digital Equipment Corporation, 146 Main Street, ML01-5/U46, Maynard, MA 01754

50. Vasanth Bala, IBM T. J . Watson Research Center, P. 0. Box 218, Yorktown Heights, NY 10598

51. Donald M. Austin, 6196 EECS Bldg., University of Minnesota, 200 Union Street, S.E., Minneapolis, MN 55455

52. Joseph G. Baron, IBM Corporation, AWS Advanced Product Development, 11400 Burnet Road, Austin, TX 78758-3493

53. Edward H. Barsis, Computer Science and Mathematics, P. 0. Box 5800, Sandia National Laboratories, Albuquerque, NM 87185


54. Eric Barszcz, Mail Stop T-045, NASA Ames Research Center, Moffet Field, CA 94035

55. Eric Barton, Meiko Limited, 650 Aztec West, Bristol BS12 4SD, United Kingdom

56. A. Basu, C-DAC, 2/1 Brunton Road, Bangalore 560 025, India

57. Adam Beguelin, Carnegie Mellon University, School of Computer Science, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890

58. Siegfried Benker, Institute for Statistics and Computer Science, University of Vienna, A-1210 Vienna, Austria

59. Ed Benson, Digital Equipment Corp., 146 Main Street, ML01-5/U46, Maynard, MA 01754

60. Roger Berry, NCUBE Corporation, 4313 Prince Road, Rockville, MD 20853

61. Scott Berryman, Yale University, Computer Science Department, 51 Prospect Street, New Haven, CT 06520

62. Biondo Biondi, Stanford University, Department of Geophysics, Stanford, CA 94305

63. Robert Bjornson, Department of Computer Science, Box 2158 Yale Station, New Haven, CT 06520

64. Peter Brezany, Institute for Statistics and Computer Science, University of Vienna, A-1210 Vienna, Austria

65. Roger W. Brockett, Wang Professor of EE and CS, Division of Applied Sciences, Harvard University, Cambridge, MA 02138

66. Eric Browne, University of Cambridge, Department of Earth Sciences, Downing Street, Cambridge CB2 3EQ, United Kingdom

67. Clemens H. Cap, University of Zurich, Department of Computer Science, Win- terthurerstr. 190, CH-8057 Zurich, Switzerland

68. Trevor Carden, Parsys Ltd., Boundary House, Boston Road, London W7 2QE, United Kingdom

69. Siddhartha Chatterjee, RIACS, Mail Stop T045-1, NASA Ames Research Center, Moffett Field, CA 94035-1000

70. Hsing-bung Chen, University of Texas-Arlington, CSE Department, Box 19015, Arlington, TX 76019

71. Doreen Y. Cheng, Computer Science Corporation, NASA Ames Research Center, Mail Stop 258-6, Moffett Field, CA 94035

72. Kuo-Ning Chiang, National Center for High-Performance Computing, P.O. Box 19-136, Hsinchu, Taiwan, R.O.C.

73. Lyndon Clarke, Edinburgh Parallel Computing Centre, James Clerk Maxwell Building, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, United Kingdom


74. Robert Cohen, Department of Computer Science, Australian National University, GPO Box 4, Canberra 2601, Australia

75. Michele Colajanni, Dip. di Ingegneria Elettronica, Universita' di Roma "Tor Vergata", Via della Ricerca Scientifica, 00133 Roma, Italy

76. Jeremy Cook, Parallel Processing Laboratory, Dept. of Informatics, University of Bergen, High Technology Centre, N-5020 Bergen, Norway

77. Manuel Eduardo C. D. Correia, Centro de Informatica, Universidade do Porto (CIUP), Rua do Campo Alegre 823, 4100 Porto, Portugal

78. Jim Cownie, Meiko Limited, 650 Aztec West, Bristol BS12 4SD, United Kingdom

79. Michel Dayde, CERFACS, 42 Avenue G. Coriolis, 31057 Toulouse Cedex, France

80. David DiNucci, Computer Sciences Corporation, NASA Ames Research Center, M/S 258-6, Moffet Field, CA 94035

81. Mark Debbage, University of Southampton, Dept. of Electronics and Computer Science, Highfield, Southampton SO9 5WH, United Kingdom

82. Dominique Duval, Telmat Informatique, BP 12, Rue de l’Industrie, 68360 Soultz, France

83. Tom Eidson, Theoretical Flow Physics Branch, MIS 156, NASA Langley Research Center, Hampton, VA 23665

84. Victor Eijkhout, University of Tennessee, 107 Ayres Hall, Department of Computer Science, Knoxville, TN 37996-1301

85. Anne Elster, Cornell University, Xerox DRI, 502 Engineering and Theory Center, Ithaca, NY 14853

86. Rob Falgout, Lawrence Livermore National Lab, L-419, P. 0. Box 808, Livermore, CA 94551

87. Jim Feeney, IBM Endicott, R. D. 3, Box 224, Endicott, NY 13760

88. Edward Felten, Department of Computer Science, University of Washington, Seat- tle, WA 98195

89. Vince Fernando, NAG Limited, Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom

90. Sam Fineberg, NASA Ames Research Center, M/S 258-6, Moffett Field, CA 94035- 1000

91. Randy Fischer, 615 NW 32st Place, Gainesville, FL 32607

92. Jon Flower, Parasoft Corporation, 2500 E. Foothill Blvd., Suite 205, Pasadena, CA91107

93. David Forslund, Los Alamos National Laboratory, Advanced Computing Labora- tory, MS B287, Los Alamos, NM 87545

94. Geoffrey C. Fox, Syracuse University, Northeast Parallel Architectures Center, 111 College Place, Syracuse, NY 13244-4100


95. Lars Frellesen, Math-Tech Aps, Kildeskovsvej 67, 2820 Gentofte, DK - Denmark

96. Josef Fritscher, Computing Center, Technical University of Vienna, Wiedner Haupt- strasse 8-10, A-1040 Vienna, Austria

97. Daniel D. Frye, IBM Corporation, Dept. 49NA / MS 614, Neighborhood Road, Kingston, NY 12401

98. Kyle Gallivan, University of Illinois, CSFLD, 465 CSRL, 1308 West Main Street, Urbana, IL 61801-2307

99. J . Alan George, Vice President, Academic and Provost, Needles Hall, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1

100. Mike Gerndt, Zentralinstitut fuer Angewandte Mathematik, Forschungszentrum Juelich GmbK, Postfach 1913, D-5170 Juelich, Germany

101. Ian Glendinning, University of Southampton, Dept. of Electronics and Comp. Sci., Southampton SO9 5NH, United Kingdom

102. Gene H. Golub, Department of Computer Science, Stanford University, Stanford, CA 94305

103. Adam Greenberg, Thinking Machines Corporation, 245 First Street, Cambridge, MA 02142-1214

104. Robert Greimel, AVL List Gmbh., Department 'TSS, Kleiststrasse 48, A-8020 Graz, Austria,

105. William Gropp, Argonne National Laboratory, Mathematics and Computer Sci- ence, 9700 South Cass Avenue, MCS 221, Argonne, IL 60439-4844

106. Sanjay Gupta, ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23665-5225

107. John Gustafson, 236 Wilhelm, Ames Laboratory, Iowa State University, Ames, IA 50011

108. Fred Gustavson, IBM T. J. Watson Research Center, Room 33-260, P. O. Box 218, Yorktown Heights, NY 10598

109. Robert Halstead. Digital Equipment Corporation, Cambridge Research Lab., One Kendall Sq. Bldg. 700, Cambridge, MA 02139

110. Robert J . Harrison, Battelle Pacific Northwest Laboratory, Mail Stop K1-SO, P. 0. Box 999, Richland, WA 99352

111. Leslie Hart, NOAA/FSL, R/E/FS5, 325 Broadway, Boulder, CO 80303

112. Tom Haupt, Syracuse University, Northeast Parallel Architectures Center, 111 College Place, Syracuse, NY 13244-4100

113. Michael Heath, University of Illinois, NCSA, 4157 Beckman Institute, 405 North Mathews Avenue, Urbana, IL 61801-2300

114. Don Heller, Center for Research on Parallel Computation, Rice University, P.O. Box 1892, Houston, TX 77251


115. Rolf Hempel, GMD, Schloss Birlinghoven, Postfach 1316, D-W-5205 Sankt Augustin 1, Germany

116. Tom Henderson, NOAA/FSL, R/E/FS5, 325 Broadway, Boulder, CO 80303

117. Anthony J. G. Hey, University of Southampton, Dept. of Electronics and Comp. Sci., Southampton SO9 5NH, United Kingdom

118. Mark Hill, University of Southampton, Dept. of Electronics and Comp. Sci., Southampton SO9 5NH, United Kingdom

119. C. T. Howard Ho, IBM Almaden Research Center, K54/802,650 Harry Road, San Jose, CA 95120

120. Randy Holmes, IBM Corporation, High Performance Computing Services, 1507 LBJ Freeway, Dallas, TX 75234

121. Gary W. Howell, Florida Institute of Technology, Department of Applied Mathematics, 150 W. University Blvd., Melbourne, FL 32901

122. Chengchang Huang, 2814 Beau Jardin, Apt. 301, Lansing, MI 48910

123. Steve Huss-Lederman, Supercomputing Research Center, 17100 Science Drive, Bowie, MD 20715-4300

124. Joefon Jann, IBM T.J. Watson Research Center, P. 0. Box 218, Yorktown Heights, NY 10598

125. S. Lennart Johnsson, Thinking Machines Corporation, 245 First Street, Cam- bridge, MA 02142-1214

126. Charles Jung, IBM Kingston, 67LB/MS 614, Neighborhood Road, Kingston, NY 12401

127. Edgar T. Kdns, Michigan State University, Advanced Computing Systems Lab, Department of Computer Science, East Lansing, MI 48824

128. Malvyn H. Kalos, Cornell Theory Center, Engineering and Theory Center Bldg., Cornell University, Ithaca, NY 14853-3901

129. John Kapenga, Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008

130. Hans Kaper, Mathematics and Computer Science Division, Argonne National Lab- oratory, Bldg. 221, 9700 South Cass Avenue, Argonne, IL 60439

131. Udo Keller, PALLAS GmbH, Hermuelheimer Strasse 10, D-W5040 Bruehl, Ger- many

132. Ken Kennedy, Rice University, Department of Computer Science, P. 0. Box 1892, Houston, T X 77251

133. Ronan Keryell, Ecole Nationale Superieure des Mines de Paris, Centre de Recherche en Informatique, 35, Rue Saint-Honore, 77305 Fontainebleau Cedex, France

134. Shlomo Kipnis, IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598


135. Robert L. Knighten, Intel Corporation, Supercomputer Systems Division, 15201 NW Greenbrier Parkway, Beaverton, OR 97006

136. Charles Koelbel, Rice University, CITI/CRPC, P. 0. Box 1892, Houston, T X 77251

137. Edward Kushner, Intel Corporation, 15201 NW Greenbrier Parkway, Beaverton, OR 97006

138. Pierre Lagier, 24, Avenue de l’Europe, 78141 Velizy Villacoublay, France

139. Derryck Lamptey, National Transputer Support Centre, University of Sheffield, Sheffield, United Kingdom

140. Falk Langhammer, Parsytec Computer GmbH, Juelicher Strasse 338, D-5100 Aachen, Germany

141. Randolph Langley, Florida State University, 400 SCL, B-186, Tallahassee, FL 32306

142. Bob Leary, San Diego Supercomputer Center, P. O. Box 85608, San Diego, CA 92186-9784

143. Bruce Leasure, Kuck and Associates, Inc., 1906 Fox Drive, Champaign, IL 61820

144. James E. Leiss, Rt. 2, Box 142C, Broadway, VA 22815

145. Eric Leu, IBM Almaden Research Center, 650 Harry Road, K54/802, San Jose, CA 95123

146. David Levine, Argonne National Laboratory, MCS 221 C-216, Argonne, IL 60439

147. John Lewis, Boeing Computer Services, Mail Stop 7L-21, P. O. Box 24346, Seattle, WA 98124-0346

148. David Linden, Digital Equipment Corp., 146 Main Street, ML01-5/U46, Maynard, MA 01754

149. Rik Littlefield, Battelle Pacific Northwest Laboratory, Mail Stop K1-87, P. O. Box 999, Richland, WA 99352

150. Miron Livny, University of Wisconsin, Department of Computer Science, 1210 West Dayton Street, Madison, WI 53706

151. Rusty Lusk, Argonne National Laboratory, Mathematics and Computer Science, 9700 South Cass Avenue, MCS 221, Argonne, IL 60439-4844

152. Arthur B. Maccabe, Sandia National Labs, Dept. 1424, Albuquerque, NM 87185-5800

153. Neil MacDonald, Edinburgh Parallel Computing Centre, James Clerk Maxwell Building, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, United Kingdom

154. Peter Madams, nCUBE Corporation, 919 East Hillsdale Blvd., Foster City, CA 94404

155. Amitava Majumdar, University of Michigan, Department of Nuclear Engineering, Ann Arbor, MI 48109

156. David P. Mallon, Leeds University, School of Computer Studies, Leeds LS2 9JT, United Kingdom

157. Dan Cristian Marinescu, Computer Sciences Department, Purdue University, West Lafayette, IN 47907

158. Tim Mattson, Scientific Computing Associates, Inc., 265 Church Street, New Haven, CT 06510-7010

159. Oliver McBryan, University of Colorado at Boulder, Department of Computer Science, Campus Box 425, Boulder, CO 80309-0425

160. Robert McLay, University of Texas at Austin, Dept. ASEEM 60600, Austin, TX 78712

161. James McGraw, Lawrence Livermore National Laboratory, L-306, P. O. Box 808, Livermore, CA 94550

162. Phil McKinley, A714 Wells Hall, Michigan State University, East Lansing, MI 48824

163. Piyush Mehrotra, ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23665

164. Paul Messina, California Institute of Technology, Mail Stop 158-79, 1201 E. California Boulevard, Pasadena, CA 91125

165. Moataz Mohamed, University of Oregon, Department of Computer Science, Eugene, OR 97403

166. Neville Moray, Department of Mechanical and Industrial Engineering, University of Illinois, 1206 West Green Street, Urbana, IL 61801

167. Charles Mosher, ARCO Exploration and Production Technology, 2300 West Plano Parkway, Plano, TX 75075-8499

168. Harish Nag, Intel Corporation, M/S C04-02, 5200 Elam Young Parkway, Hillsboro, OR 97124

169. Jonathan Nash, Leeds University, School of Computer Studies, Leeds LS2 9JT, United Kingdom

170. Dan Nessett, Lawrence Livermore National Laboratory, L-60, Livermore, CA 94550

171. Lionel M. Ni, Michigan State University, Dept. of Computer Science, A714 Wells Hall, East Lansing, MI 48824-1027

172. Mike Norman, Edinburgh Parallel Computing Centre, James Clerk Maxwell Building, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, United Kingdom

173. James M. Ortega, Department of Applied Mathematics, Thornton Hall, University of Virginia, Charlottesville, VA 22901

174. Steve Otto, Oregon Graduate Institute, Department of Computer Sci. and Eng., 19600 NW von Neumann Drive, Beaverton, OR 97006-1999

175. Andrea Overman, NASA Langley Research Center, MS 125, Hampton, VA 23665

176. Peter S. Pacheco, University of San Francisco, Department of Mathematics, San Francisco, CA 94117

177. Cherri M. Pancake, Department of Computer Science, Oregon State University, Corvallis, OR 97331-3202

178. Raj Panda, IBM Corporation, Mail Code E39/4305, 11400 Burnet Rd., Austin, TX 78758

179. David Payne, Intel Corporation, Supercomputer Systems Division, 15201 NW Greenbrier Parkway, Beaverton, OR 97006

180. Arnulfo Perez, Centro de Inteligencia Artificial, ITESM, Suc. de Correos "J", C.P. 64849, Monterrey, N.L., Mexico

181. K. S. Perimayagam, Centre for Development of Advanced Computing, Pune University Campus, Pune 411 007, India

182. Matthew Peters, Parallel and Distributed Processing, IBM UK Scientific Centre, Winchester, United Kingdom

183. Garry Petrie, Intel, MS CO5-01, 5200 NE Elam Young Parkway, Hillsboro, OR 97124-6497

184. Greg Pfister, IBM Corporation, Mail Stop 9462, 11400 Burnet Road, Austin, TX 78758-3493

185. Jean-Laurent Philippe, ARCHIPEL S.A., PAE des Glaisins, 1 rue du Bulloz, F-74940 Annecy-le-Vieux, France

186. Paul Pierce, Intel Corporation, Supercomputer Systems Division, 15201 NW Greenbrier Parkway, Beaverton, OR 97006

187. Robert J. Plemmons, Departments of Mathematics and Computer Science, Box 7311, Wake Forest University, Winston-Salem, NC 27109

188. James C. T. Pool, Deputy Director, Caltech Concurrent Supercomputing Facility, MS 158-79, California Institute of Technology, Pasadena, CA 91125

189. Steve Poole, 11631 Lima, Houston, TX 77099

190. Roldan Pozo, University of Tennessee, 107 Ayres Hall, Department of Computer Science, Knoxville, TN 37996-1301

191. Angela Quealy, Sverdrup Technology, Inc., NASA Lewis Research Center Group, 2001 Aerospace Pkwy, Brook Park, OH 44142

192. Padma Raghavan, University of Illinois, NCSA, 4151 Beckman Institute, 405 North Matthews Avenue, Urbana, IL 61801

193. Sanjay Ranka, Syracuse University, Northeast Parallel Architectures Center, 111 College Place, Syracuse, NY 13244-4100

194. Robbert van Renesse, Dept. of Computer Science, 4118 Upson Hall, Cornell University, Ithaca, NY 14853

195. Werner C. Rheinboldt, Department of Mathematics and Statistics, University of Pittsburgh, Pittsburgh, PA 15260

196. Peter Rigsbee, Cray Research Incorporated, 655 Lone Oak Drive, Eagan, MN 55121

197. Guy Robinson, European Centre for Medium Range Weather Forecasting, Reading RG3 9AX, Berkshire, United Kingdom

198. Matt Rosing, ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23665-5225

199. Joel Saltz, ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23665-5225

200. Ahmed H. Sameh, CSRD, University of Illinois, 1308 West Main Street, Urbana, IL 61801-2307

201. Erich Schikuta, CITI/CRPC, Rice University, 6100 South Main, Houston, TX 77005

202. Rob Schreiber, RIACS, Mail Stop T045-1, NASA Ames Research Center, Moffett Field, CA 94022

203. David S. Scott, Intel Scientific Computers, 15201 N.W. Greenbrier Parkway, Beaverton, OR 97006

204. Eugen Schenfeld, NEC Research Institute, 4 Independence Way, Princeton, NJ 08540

205. Ricardo A. Schmutzer, Pontificia Universidad Catolica de Chile, Department of Computer Science, Las Torcazas 212, Las Condes - Santiago, Chile

206. Mark Sears, Division 1424, Sandia National Laboratories, P. O. Box 5800, Albuquerque, NM 87185-5800

207. Ambuj Singh, UC Santa Barbara, Department of Computer Science, Santa Barbara, CA 93106

208. Chuck Simmons, 500 Oracle Parkway, Box 659414, Redwood Shores, CA 94065

209. Anthony Skjellum, Mississippi State University, Department of Computer Science, Drawer CS, Mississippi State, MS 39762-5623

210. Steven G. Smith, Lawrence Livermore National Lab, L-419, P. O. Box 808, Livermore, CA 94550

211. Marc Snir, IBM T. J. Watson Research Center, P. O. Box 218, Room 28-226, Yorktown Heights, NY 10598

212. Karl Solchenbach, PALLAS GmbH, Hermuelheimer Strasse 10, D-5040 Bruehl, Germany

213. Charles H. Still, Lawrence Livermore National Lab, L-416, P. O. Box 808, Livermore, CA 94550

214. Alain Stroessel, Institut Francais du Petrole, Parallel Processing Group, BP 311, 92506 Rueil Malmaison, France

215. Vaidy Sunderam, Emory University, Dept. of Math and Computer Science, Atlanta, GA 30322

216. Mike Surridge, Univ. of Southampton Parallel Applications Centre, 2 Venture Road, Chilworth Research Centre, Southampton SO1 7NP, United Kingdom

217. Alan Sussman, University of Maryland, Computer Science Department, A. V. Williams Building, College Park, MD 20742

218. Paul N. Swartztrauber, National Center for Atmospheric Research, P. O. Box 3000, Boulder, CO 80307

219. Clemens-August Thole, GMD-I1.T, Schloss Birlinghoven, D-5205 Sankt Augustin 1, Germany

220. Bob Tomlinson, Los Alamos National Laboratory, Group C-8, MS B-272, Los Alamos, NM 87545

221. Anne Trefethen, Engineering and Theory Center, Cornell University, Ithaca, NY 14853

222. Christian Tricot, ARCHIPEL S.A., PAE des Glaisins, 1 rue du Bulloz, F-74940 Annecy-le-Vieux, France

223. Anna Tsao, Supercomputing Research Center, 17100 Science Drive, Bowie, MD 20715-4300

224. Lew Tucker, Thinking Machines Corporation, 245 First Street, Cambridge, MA 02142-1214

225. Robert van de Geijn, University of Texas, Department of Computer Sciences, TAI 2.124, Austin, TX 78712

226. Robert G. Voigt, National Science Foundation, Room 417, 1800 G Street, N.W., Washington, DC 20550

227. Linton Ward, 11400 Burnet Rd, Austin, TX 78758

228. Dick Weaver, IBM M77/E365, 555 Bailey Ave, P. O. Box 49023, San Jose, CA 95161-9023

229. Tammy Welcome, Lawrence Livermore National Lab, Massively Parallel Computing Initiative, L-416, P. O. Box 808, Livermore, CA 94550

230. Jim West, IBM Corporation, MC 5600, 3700 Bay Area Blvd., Houston, TX 77058

231. Stephen R. Wheat, Dept. 1424, Sandia National Labs, Albuquerque, NM 87185-5800

232. Mary F. Wheeler, Rice University, Department of Mathematical Sciences, P. O. Box 1892, Houston, TX 77251

233. Andrew B. White, Computing Division, Los Alamos National Laboratory, P. O. Box 1663, MS-265, Los Alamos, NM 87545

234. Joel Williamson, Convex Computer Corporation, 3000 Waterview Parkway, Richardson, TX 75083-3851

235. Steve Zenith, Kuck and Associates, Inc., 1906 Fox Drive, Champaign, IL 61820-7334

236. Mohammad Zubair, NASA Langley Research Center, Mail Stop 132C, Hampton, VA 23665

237. Office of Assistant Manager for Energy Research and Development, U.S. Department of Energy, Oak Ridge Operations Office, P. O. Box 2001, Oak Ridge, TN 37831-8600

238-239. Office of Scientific & Technical Information, P. O. Box 62, Oak Ridge, TN 37831