MPI-3 Datatypes, Topologies and Collectives
Matthias [email protected]
Zuse Institute Berlin
2017-10-20, HLRN Parallel Programming Workshop Fall 2017
The Message Passing Interface (MPI) Standard

MPI-3.0 (2012-09-21)
• major update to the MPI standard
• extensions to collective operations (topologies, non-blocking)
• extensions to one-sided operations
• . . .
⇒ http://mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf

MPI-3.1 (2015-06-04)
• minor update to the MPI standard
• corrections/clarifications
• portable MPI_Aint manipulation
• nonblocking collective I/O
• . . .
⇒ http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
Audience Survey
• Who uses MPI?
• . . . who plans to?
• Who uses MPI only (i.e. one rank per core)?
• . . . MPI+X?
• . . . which X?
• Who uses Derived Data Types (DDTs)?
• Who uses one-sided communication?
• Who uses non-blocking collectives?
• Who uses topologies and neighborhood collectives?
Writing Modern and Portable MPI
• 6 MPI functions would be sufficient . . .
  • MPI_Init(), MPI_Finalize(), MPI_Comm_rank(), MPI_Comm_size(), MPI_Send(), MPI_Recv()
• . . . there are roughly 250
• they provide abstraction, convenience, asynchrony . . .
⇒ room for general optimisation within MPI implementations

General Advice:
• prefer declarative over imperative
  • specify what should be done . . .
  • . . . let MPI implementations figure out the how
⇒ allows for portable performance
• chicken-and-egg problem with new features:
  • implementors wait for users to test and complain before optimising
  • users wait for fast implementations before adapting codes
In this talk
• Derived Data Types (DDTs)
• Nonblocking Collectives
• Topologies and Neighborhood Collectives
MPI Datatypes
• Basic and Derived Data Types (DDTs)
• declare arbitrary data layouts using composable constructors
• avoid using MPI_BYTE and manual packing/unpacking

Definition: General Datatype
• an opaque object that specifies two things:
  • a sequence of basic datatypes
  • a sequence of integer (byte) displacements
• displacements are not required to be positive, distinct, or in order ⇒ arbitrary layout
• can be used in all send and receive operations
MPI Basic Datatypes: Fortran

MPI datatype           Fortran type
MPI_INTEGER            INTEGER
MPI_REAL               REAL
MPI_DOUBLE_PRECISION   DOUBLE PRECISION
MPI_COMPLEX            COMPLEX
MPI_LOGICAL            LOGICAL
MPI_CHARACTER          CHARACTER(1)
MPI_BYTE
MPI_PACKED
MPI Basic Datatypes: C/C++

MPI datatype             C/C++ type
MPI_CHAR                 char
MPI_INT                  signed int
MPI_UINT64_T, . . .      uint64_t, . . .
MPI_FLOAT                float
MPI_DOUBLE               double
MPI_C_BOOL               _Bool
MPI_CXX_BOOL             bool
MPI_C_FLOAT_COMPLEX      float _Complex
MPI_C_DOUBLE_COMPLEX     double _Complex
MPI_CXX_FLOAT_COMPLEX    std::complex<float>
MPI_CXX_DOUBLE_COMPLEX   std::complex<double>
MPI Special Datatypes
MPI datatype   C type       Fortran type
MPI_AINT       MPI_Aint     INTEGER (KIND=MPI_ADDRESS_KIND)
MPI_OFFSET     MPI_Offset   INTEGER (KIND=MPI_OFFSET_KIND)
MPI_COUNT      MPI_Count    INTEGER (KIND=MPI_COUNT_KIND)
MPI_BYTE
MPI_PACKED
MPI_DATATYPE_NULL
MPI Datatypes: Concepts

Type map
• Typemap = {(type_0, disp_0), . . . , (type_{n-1}, disp_{n-1})}
• e.g. {(int, 0)} for MPI_INT
• no need to match between sender/receiver

Type signature
• Typesig = {type_0, . . . , type_{n-1}}
• e.g. {int} for MPI_INT
• must match between sender/receiver

Communication Buffer
• type map + base address (buffer argument)
• the i-th entry is at buffer + disp_i and has type type_i

Message
• n values of the types from the signature
MPI Derived Datatypes: Construction
Declare derived types via:
• MPI_Type_contiguous()
• MPI_Type_vector(), MPI_Type_create_hvector()
• MPI_Type_indexed(), MPI_Type_create_hindexed()
• MPI_Type_create_indexed_block(), MPI_Type_create_hindexed_block()
• MPI_Type_create_struct()
• MPI_Type_create_subarray(), MPI_Type_create_darray()

// commit type (required before it can be used in communication)
int MPI_Type_commit(MPI_Datatype *datatype)
// query signature size, i.e. message size in byte
int MPI_Type_size(MPI_Datatype datatype, int *size)
// delete type
int MPI_Type_free(MPI_Datatype *datatype)
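The typical lifecycle is construct, commit, use, free. A minimal sketch (not from the original slides; assumes a 4-byte int): a contiguous type of two MPI_INT values has the typemap {(int, 0), (int, 4)}, and MPI_Type_size() reports its 8-byte signature size.

// minimal sketch: derived-type lifecycle (construct, commit, use, free)
MPI_Datatype two_ints;
MPI_Type_contiguous(2, MPI_INT, &two_ints);   // typemap {(int, 0), (int, 4)} for 4-byte int
MPI_Type_commit(&two_ints);                   // required before use in communication

int size;
MPI_Type_size(two_ints, &size);               // size == 2 * sizeof(int), the signature size in byte

int buf[2] = {1, 2};
MPI_Send(buf, 1, two_ints, 1, 0, MPI_COMM_WORLD);   // send both ints to rank 1, tag 0

MPI_Type_free(&two_ints);                     // handle becomes MPI_DATATYPE_NULL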
MPI Derived Datatypes: Constructors

// count times oldtype
int MPI_Type_contiguous(int count,
                        MPI_Datatype oldtype,
                        MPI_Datatype *newtype)

// count blocklength-sized blocks of oldtype with gaps of size stride
int MPI_Type_vector(int count,
                    int blocklength,
                    int stride,              // in oldtype elements
                    MPI_Datatype oldtype,
                    MPI_Datatype *newtype)

// 'h' means heterogeneous, stride in byte
int MPI_Type_create_hvector(int count,
                            int blocklength,
                            MPI_Aint stride,     // in byte
                            MPI_Datatype oldtype,
                            MPI_Datatype *newtype)
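A short usage sketch (not from the original slides; the matrix dimensions are assumptions): MPI_Type_vector() is the natural fit for strided data such as a column of a row-major matrix.

// sketch: describe one column of a row-major 4 x 5 int matrix
int matrix[4][5];

MPI_Datatype column_type;
// 4 blocks of 1 element each, block starts 5 elements apart
MPI_Type_vector(4, 1, 5, MPI_INT, &column_type);
MPI_Type_commit(&column_type);

// send column 2: pass the address of its first element, the datatype encodes the stride
MPI_Send(&matrix[0][2], 1, column_type, 1, 0, MPI_COMM_WORLD);

MPI_Type_free(&column_type);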
MPI Derived Datatypes: Constructors

// arbitrary memory layout of blocks of oldtype
int MPI_Type_indexed(
    int count,
    const int array_of_blocklengths[],
    const int array_of_displacements[],   // in oldtype elements
    MPI_Datatype oldtype,
    MPI_Datatype *newtype
)

// like indexed, but with different types (displacements in byte)
int MPI_Type_create_struct(
    int count,
    const int array_of_blocklengths[],
    const MPI_Aint array_of_displacements[],
    const MPI_Datatype array_of_types[],
    MPI_Datatype *newtype
)
MPI Derived Datatypes: Example

// C struct
struct elem { int a; float b; };
struct elem data[4] = {{1, 1.0f}, {2, 2.0f}, {3, 3.0f}, {4, 4.0f}};

// corresponding MPI type
MPI_Datatype elem_type;
// 2 blocks of 1 element each, the second displaced by the size of the first
// (for structs with padding, use offsetof() and MPI_Type_create_resized())
int          blocklengths[2]  = {1, 1};
MPI_Aint     displacements[2] = {0, sizeof(int)};
MPI_Datatype types[2]         = {MPI_INT, MPI_FLOAT};
MPI_Type_create_struct(2, blocklengths, displacements, types, &elem_type);
MPI_Type_commit(&elem_type);

MPI_Datatype elem_vec_type;
// 2 times 1 elem with stride of 1
MPI_Type_vector(2, 1, 1, elem_type, &elem_vec_type);
MPI_Type_commit(&elem_vec_type);

// send 1 elem_vec to rank 1 with tag 0
MPI_Send(data, 1, elem_vec_type, 1, 0, MPI_COMM_WORLD);
MPI Derived Datatypes: Example
• avoid manual copying/rearranging/packing
• do data transformations during transfer
• let the MPI implementation perform things in an efficient way for the specific target hardware
⇒ convenience and portability
MPI Datatypes: Portable Address Handling
// use instead of the address operator '&'
int MPI_Get_address(const void *location,
                    MPI_Aint *address)
// use for address arithmetic
MPI_Aint MPI_Aint_add(MPI_Aint base, MPI_Aint disp)
MPI_Aint MPI_Aint_diff(MPI_Aint addr1, MPI_Aint addr2)

Displacement Arguments:
• can be
  • relative to an initial buffer argument
  • absolute, i.e. relative to MPI_BOTTOM
• use absolute displacements when a type is composed of multiple variables/arrays
⇒ set the buffer argument to MPI_BOTTOM
MPI Derived Datatypes Example: Absolute Displacement

int MPI_Type_create_hindexed(
    int count,
    const int array_of_blocklengths[],
    const MPI_Aint array_of_displacements[],   // in byte
    MPI_Datatype oldtype, MPI_Datatype *newtype);

// some non-contiguously allocated data
int array1[10];
/* some other allocations */
int array2[10];

// use absolute displacements
MPI_Aint addr1, addr2;
MPI_Get_address(array1, &addr1);
MPI_Get_address(array2, &addr2);
int int_size;
MPI_Type_size(MPI_INT, &int_size);

MPI_Datatype new_type;
int      blocklengths[2]  = {5, 5};                               // two blocks of size 5
MPI_Aint displacements[2] = {addr1,                               // first 5 elements of array1
                             MPI_Aint_add(addr2, 5 * int_size)};  // last 5 elements of array2
MPI_Type_create_hindexed(2, blocklengths, displacements, MPI_INT, &new_type);
MPI_Type_commit(&new_type);

// send as communication buffer relative to MPI_BOTTOM
MPI_Send(MPI_BOTTOM, 1, new_type, ...);
Collective Communication in MPI

Collectives:
• MPI_Barrier(...)
• MPI_Bcast(...)
• MPI_Gather(...)
• MPI_Scatter(...)
• MPI_Allgather(...)
• MPI_Alltoall(...)
• . . .

New in MPI-3.x:
• nonblocking collectives
• topologies
• neighborhood collectives
http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf#143
Collective Communication in MPI: Barrier
int MPI_Barrier(MPI_Comm comm);
int MPI_Ibarrier(MPI_Comm comm, MPI_Request *request);

// Example: nonblocking barrier
MPI_Request req;
MPI_Ibarrier(MPI_COMM_WORLD, &req);   // send notifications to all other processes

/* do something else */

MPI_Wait(&req, MPI_STATUS_IGNORE);    // blocking sync on notifications from other processes
Collective Communication in MPI: Broadcast

int MPI_Bcast(void *buffer, int count,
              MPI_Datatype datatype, int root, MPI_Comm comm)

int MPI_Ibcast(void *buffer, int count,
               MPI_Datatype datatype, int root, MPI_Comm comm,
               MPI_Request *request)
Collective Communication in MPI: Scatter/Gather
• MPI_Gather, MPI_Gatherv, MPI_Igather, MPI_Igatherv
• MPI_Scatter, MPI_Scatterv, MPI_Iscatter, MPI_Iscatterv

int MPI_Igather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm, MPI_Request *request);
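A small usage sketch (not from the original slides; rank is assumed to come from MPI_Comm_rank(), and the fixed receive-buffer size is an assumption): a nonblocking gather lets every rank keep computing while the root collects one int from each process.

// sketch: root (rank 0) gathers one int from every rank, overlapping with local work
int my_value = rank;
int gathered[64];                    // root's receive buffer; assumed communicator size <= 64

MPI_Request req;
MPI_Igather(&my_value, 1, MPI_INT,
            gathered, 1, MPI_INT,
            0, MPI_COMM_WORLD, &req);

/* overlap: local work that does not touch my_value or gathered */

MPI_Wait(&req, MPI_STATUS_IGNORE);   // buffers may only be reused/read after completion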
Collective Communication in MPI: All-gather
• MPI_Allgather, MPI_Allgatherv, MPI_Iallgather, MPI_Iallgatherv

int MPI_Iallgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                   void *recvbuf, int recvcount, MPI_Datatype recvtype,
                   MPI_Comm comm, MPI_Request *request)
Collective Communication in MPI: All-to-All
• MPI_Alltoall, MPI_Alltoallv, MPI_Alltoallw, MPI_Ialltoall, MPI_Ialltoallv, MPI_Ialltoallw

int MPI_Ialltoall(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm, MPI_Request *request);
Non-blocking Collectives: Semantics
Execution
• calls return immediately, completion is deferred
• sync via MPI_Test() and MPI_Wait()
• no MPI_Cancel()
• out-of-order completion

Matching
• no tags
• in-order matching (per communicator)
• no matching between blocking and non-blocking collectives
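A short sketch of these rules (not from the original slides; bcast_buf, count and comm are assumed to exist): since nonblocking collectives carry no tags, all ranks must start them in the same order on a given communicator, but they may complete in any order, e.g. via MPI_Waitall().

// all ranks must issue the collectives in the same order on comm (in-order matching)
MPI_Request reqs[2];
MPI_Ibcast(bcast_buf, count, MPI_INT, 0, comm, &reqs[0]);
MPI_Ibarrier(comm, &reqs[1]);

/* independent computation */

// completion may happen in any order
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);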
MPI and Topologies
Physical Topology
• the physical connection graph between the compute nodes of a system

Virtual Topology
• the communication graph between the MPI ranks within a communicator
• optional attribute of MPI (intra-)communicators
• application-specific information
⇒ user-provided via MPI calls
• needed for neighborhood collectives
• allows the MPI implementation to map processes onto hardware
• could also be used for topology-aware job scheduling
⇒ facilitates optimal mapping of the virtual onto the physical topology
Topology Types and Constructors
MPI supports three topology types:

• Graph
  ⇒ does not scale, don't use
• Distributed Graph
  • arbitrary, directed graph
  • MPI_Dist_graph_create_adjacent(...)
  • MPI_Dist_graph_create(...)
• Cartesian (convenience)
  • n-dimensional Cartesian grid
  • bidirectional edges between neighbors
  • MPI_Cart_create(...)
Topology Constructors: MPI_DIST_GRAPH_CREATE_ADJACENT
• build an arbitrary, directed graph
• each rank specifies its adjacent nodes/edges
⇒ allows for every communication pattern

int MPI_Dist_graph_create_adjacent(
    MPI_Comm comm_old,
    int indegree,
    const int sources[],
    const int sourceweights[],
    int outdegree,
    const int destinations[],
    const int destweights[],
    MPI_Info info,
    int reorder,                  // reorder ranks
    MPI_Comm *comm_dist_graph     // resulting communicator
)
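A usage sketch (not from the original slides; rank and size are assumed to come from MPI_Comm_rank()/MPI_Comm_size()): each rank of a bidirectional ring declares its left and right neighbors as both sources and destinations.

// sketch: bidirectional ring over all ranks of MPI_COMM_WORLD
int left  = (rank - 1 + size) % size;
int right = (rank + 1) % size;
int neighbors[2] = {left, right};

MPI_Comm ring_comm;
MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                               2, neighbors, MPI_UNWEIGHTED,   // incoming edges
                               2, neighbors, MPI_UNWEIGHTED,   // outgoing edges
                               MPI_INFO_NULL,
                               1,                              // allow rank reordering
                               &ring_comm);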
Topology Constructors: MPI_DIST_GRAPH_CREATE
• same result as MPI_Dist_graph_create_adjacent()
• every rank can specify arbitrary edges of the topology

int MPI_Dist_graph_create(
    MPI_Comm comm_old,
    int n,
    const int sources[],
    const int degrees[],
    const int destinations[],
    const int weights[],
    MPI_Info info,
    int reorder,                  // reorder ranks
    MPI_Comm *comm_dist_graph     // resulting communicator
)
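As a sketch of the difference to the adjacent variant (not from the original slides; assumes at least three ranks in MPI_COMM_WORLD and rank from MPI_Comm_rank()): a single rank may contribute the whole edge list while the others contribute none.

// sketch: rank 0 specifies the edges of a chain 0 -> 1 -> 2, all other ranks contribute n = 0
int n               = (rank == 0) ? 2 : 0;
int sources[2]      = {0, 1};     // edge origins
int degrees[2]      = {1, 1};     // one outgoing edge per listed source
int destinations[2] = {1, 2};     // 0 -> 1, 1 -> 2

MPI_Comm graph_comm;
MPI_Dist_graph_create(MPI_COMM_WORLD, n,
                      sources, degrees, destinations, MPI_UNWEIGHTED,
                      MPI_INFO_NULL, 1 /* reorder */, &graph_comm);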
Topology Constructors: MPI_CART_CREATE
• create n-dim. Cartesian grid topology
int MPI_Cart_create(
    MPI_Comm comm_old,
    int ndims,
    const int dims[],
    const int periods[],   // periodic boundaries
    int reorder,           // reorder ranks
    MPI_Comm *comm_cart    // resulting communicator
)
Topology Constructors: MPI_DIMS_CREATE
• convenience function to shape an n-dim. grid for a given number of ranks

int MPI_Dims_create(
    int nnodes,   // number of ranks
    int ndims,
    int dims[]    // in/out: non-zero values won't be touched
)
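The two constructors are typically combined, as in this sketch (not from the original slides; size is assumed to come from MPI_Comm_size()): MPI_Dims_create() factors the process count into a balanced 2-D grid, which MPI_Cart_create() then turns into a Cartesian communicator.

// sketch: balanced 2-D decomposition of 'size' ranks
int dims[2] = {0, 0};               // 0 = let MPI_Dims_create() choose
MPI_Dims_create(size, 2, dims);

int periods[2] = {1, 1};            // periodic in both dimensions
MPI_Comm cart_comm;
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1 /* reorder */, &cart_comm);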
Neighborhood Collectives
• sparse nearest-neighbor communication
• based on the communicator topology
• same calling/matching rules as for other collectives

Calls:
• MPI_Neighbor_allgather, MPI_Neighbor_allgatherv
• MPI_Ineighbor_allgather, MPI_Ineighbor_allgatherv
• MPI_Neighbor_alltoall, MPI_Neighbor_alltoallv, MPI_Neighbor_alltoallw
• MPI_Ineighbor_alltoall, MPI_Ineighbor_alltoallv, MPI_Ineighbor_alltoallw
• MPI_Dist_graph_neighbors_count, MPI_Dist_graph_neighbors
Neighborhood Collectives: MPI_[I]NEIGHBOR_ALLGATHER[V]
• receive distinct data from every source neighbor
• send the same data to every destination neighbor

int MPI_Ineighbor_allgatherv(
    const void *sendbuf,
    int sendcount,
    MPI_Datatype sendtype,
    void *recvbuf,
    const int recvcounts[],
    const int displs[],
    MPI_Datatype recvtype,
    MPI_Comm comm,
    MPI_Request *request
)
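A sketch of the blocking variant on a Cartesian communicator (not from the original slides; cart_comm is assumed to be a 2-D communicator from MPI_Cart_create()): every rank sends its own value to all neighbors and receives one value per neighbor, in the neighbor order defined by the topology (per dimension: negative direction first, then positive).

// sketch: exchange one double with each of the 4 neighbors of a 2-D Cartesian grid
double my_value = 0.0;   // local contribution
double halo[4];          // one entry per neighbor
MPI_Neighbor_allgather(&my_value, 1, MPI_DOUBLE,
                       halo,      1, MPI_DOUBLE,
                       cart_comm);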
Neighborhood Collectives: MPI_[I]NEIGHBOR_ALLTOALL[V|W]
• receive distinct data from every source neighbor
• send distinct data to every destination neighbor
• every edge can have distinct send/recv counts, displacements and datatypes

int MPI_Ineighbor_alltoallw(
    const void *sendbuf,
    const int sendcounts[],
    const MPI_Aint sdispls[],
    const MPI_Datatype sendtypes[],
    void *recvbuf,
    const int recvcounts[],
    const MPI_Aint rdispls[],
    const MPI_Datatype recvtypes[],
    MPI_Comm comm,
    MPI_Request *request)
Neighborhood Collectives: Example

// create topology: 3 dims, 2x2x2 processes, all dims periodic, rank reordering allowed
int dims[3]    = {2, 2, 2};
int periods[3] = {1, 1, 1};
MPI_Comm topo_comm;
MPI_Cart_create(comm, 3, dims, periods, 1, &topo_comm);

// get coords for own rank in topo_comm (reordering takes effect here!)
int coords[3];
MPI_Cart_coords(topo_comm, rank, 3, coords);
/* load data partition corresponding to coords */

while (!done) {
    // start neighbor communication, i.e. update halo regions
    MPI_Ineighbor_alltoall(/* ... */, topo_comm, &req);

    /* compute inner parts */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   // finish communication

    /* compute outer parts */
}
https://htor.inf.ethz.ch/blog/index.php/2012/02/06/mpi-3-0-is-coming-an-overview-of-new-and-old-features/
EoP