This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Martin Schulz LLNL / CASC Chair of the MPI Forum MPI Forum BOF @ ISC 2016 http://www.mpi-forum.org/ LLNL-PRES-696804

The Message Passing Interface: On the Road to MPI 4.0 & Beyond



§ MPI 3.0 ratified in September 2012
  • Available at http://www.mpi-forum.org/
  • Several major additions compared to MPI 2.2

§ MPI 3.1 ratified in June 2015
  • Inclusion of errata (mainly RMA, Fortran, MPI_T)
  • Minor updates and additions (address arithmetic and non-blocking I/O)
  • Adoption in most MPIs progressing fast

Available through HLRS -> MPI Forum Website

§ Non-blocking collectives
§ Neighborhood collectives
§ RMA enhancements
§ Shared memory support
§ MPI Tool Information Interface
§ Non-collective communicator creation
§ Fortran 2008 Bindings
§ New Datatypes
§ Large data counts
§ Matched probe
§ Nonblocking I/O

Status of MPI-3.1 Implementations

Feature | MPICH | MVAPICH | Open MPI | Cray MPI | Tianhe MPI | Intel MPI | IBM BG/Q MPI (1) | IBM PE MPICH (2) | IBM Platform | SGI MPI | Fujitsu MPI | MS MPI | MPC | NEC MPI
NBC | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | (*) | ✔ | ✔
Nbrhood collectives | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | ✔ | ✔
RMA | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | Q2’17 | ✔
Shared memory | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | * | ✔
Tools Interface | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | * | Q4’16 | ✔
Comm-create group | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | * | ✔
F08 Bindings | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✘ | ✘ | Q2’16 | ✔
New Datatypes | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔
Large Counts | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | Q2’16 | ✔
Matched Probe | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | Q2’16 | ✔
NBC I/O | ✔ | Q3’16 | ✔ | ✔ | ✘ | ✔ | ✘ | ✘ | ✘ | ✔ | ✘ | ✘ | Q4’16 | ✔

(1) Open Source but unsupported
(2) No MPI_T variables exposed
* Under development
(*) Partly done

Release dates are estimates and are subject to change at any time. “✘” indicates no publicly announced plan to implement/support that feature.

Platform-specific restrictions might apply to the supported features.

§ Some of the major initiatives discussed in the MPI Forum
  • One Sided Communication (William Gropp)
  • Point to Point Communication (Daniel Holmes)
  • MPI Sessions (Daniel Holmes)
  • Hybrid Programming (Pavan Balaji)
  • Large Counts (Jeff Hammond)
  • Short updates on activities on tools, persistence and fault tolerance

§ How to contribute to the MPI Forum?

Let’s keep this interactive. Please feel free to ask questions!

MPI RMA Update

William Gropp
www.cs.illinois.edu/~wgropp

Brief Recap: What’s New in MPI-3 RMA

• Substantial extensions to the MPI-2 RMA interface
• New window creation routines:
  – MPI_Win_allocate: MPI allocates the memory associated with the window (instead of the user passing allocated memory)
  – MPI_Win_create_dynamic: Creates a window without memory attached. User can dynamically attach and detach memory to/from the window by calling MPI_Win_attach and MPI_Win_detach
  – MPI_Win_allocate_shared: Creates a window of shared memory (within a node) that can be accessed simultaneously by direct load/store accesses as well as RMA ops
• New atomic read-modify-write operations
  – MPI_Get_accumulate
  – MPI_Fetch_and_op (simplified version of Get_accumulate)
  – MPI_Compare_and_swap

What’s new in MPI-3 RMA contd.

• A new “unified memory model” in addition to the existing memory model, which is now called “separate memory model”
• The user can query (via MPI_Win_get_attr) whether the implementation supports a unified memory model (e.g., on a cache-coherent system), and if so, the memory consistency semantics that the user must follow are greatly simplified.
• New versions of put, get, and accumulate that return an MPI_Request object (MPI_Rput, MPI_Rget, …)
• User can use any of the MPI_Test/Wait functions to check for local completion, without having to wait until the next RMA sync call

MPI-3 RMA can be implemented efficiently

• “Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided” by Robert Gerstenberger, Maciej Besta, Torsten Hoefler (SC13 Best Paper Award)
• They implemented complete MPI-3 RMA for Cray Gemini (XK5, XE6, XK7) and Aries (XC30) systems on top of lowest-level Cray APIs
• Achieved better latency, bandwidth, message rate, and application performance than Cray’s UPC and Cray’s Fortran Coarrays

[Figures: performance plots, annotated “Lower is better” and “Higher is better”]

MPI RMA is Carefully and Precisely Specified

• To work on both cache-coherent and non-cache-coherent systems
  – Even though there aren’t many non-cache-coherent systems, it is designed with the future in mind
• There even exists a formal model for MPI-3 RMA that can be used by tools and compilers for optimization, verification, etc.
  – See “Remote Memory Access Programming in MPI-3” by Hoefler, Dinan, Thakur, Barrett, Balaji, Gropp, Underwood. ACM TOPC, Volume 2 Issue 2, July 2015.
  – http://dl.acm.org/citation.cfm?doid=2798443.2780584

Some Current Issues Being Considered

• Clarifications to shared memory semantics
• Additional ways to discover shared memory in existing windows
• New assertions for passive target epochs
• Nonblocking RMA epochs


MPI ASSERTIONS (PART OF THE POINT-TO-POINT WG) Dan Holmes

Assertions as communicator INFO keys

• Three separate issues
  • #52 – remove info key propagation for communicator duplication
  • #53 – add function MPI_Comm_idup_with_info
  • #11 – allow INFO keys to specify assertions rather than just hints, plus define 4 actual INFO key assertions

Remove propagation of INFO

• Currently MPI_Comm_dup creates an exact copy of the parent communicator including INFO keys and values
• The MPI Standard is not clear on which version of an INFO key/value to propagate
  • The one passed in by the user or the one used by the MPI library?
• If INFO keys can specify assertions then propagating them is a bad idea
  • Libraries are encouraged to duplicate their input communicator
  • Libraries expect full functionality, i.e. no assertions
  • Libraries won’t obey assertions they didn’t set and don’t understand
• Removal is backwards incompatible but
  • Propagation was only introduced in MPI-3.0

Add MPI_Comm_idup_with_info

• Non-blocking duplication of a communicator
  • Rather than blocking like MPI_Comm_dup_with_info
• Uses the INFO object supplied as an argument
  • Rather than propagating the INFO from the parent communicator like MPI_Comm_idup
• Needed for purely non-blocking codes, especially libraries

Allow assertions as INFO keys

• Language added to mandate that user must comply with INFO keys that restrict user behaviour
• Allows MPI to rely on INFO keys and change its behaviour
• New optimisations are possible by limiting user demands
• MPI can remove support for unneeded features and thereby accelerate the functionality that is needed

New assertions

• Four new assertions defined:
  • mpi_assert_no_any_tag
    • If set true, MPI_ANY_TAG will not be used
  • mpi_assert_no_any_source
    • If set true, MPI_ANY_SOURCE won’t be used
  • mpi_assert_exact_length
    • If set true, all sent messages will exactly fit their receive buffers
  • mpi_assert_allow_overtaking
    • If set true, messages can overtake even if not logically concurrent

Point-to-point WG

• Fortnightly meetings, Monday 11am Central US webex
  • All welcome!
• Future business:
  • Allocating-receive, freeing-send operations
  • Further investigation of INFO key ambiguity

(Streams/channels has been moved to the Persistence WG)


MPI SESSIONS Dan Holmes

What are sessions?

• A simple handle to the MPI library
• An isolation mechanism for interactions with MPI
• An extra layer of abstraction/indirection
• A way for MPI/users to interact with underlying runtimes
  • Schedulers
  • Resource managers
• An attempt to solve some threading problems in MPI
  • Thread-safe initialisation by multiple entities (e.g. libraries)
  • Re-initialisation after finalisation
• An attempt to solve some scalability headaches in MPI
  • Implementing MPI_COMM_WORLD efficiently is hard
• An attempt to control the error behaviour of initialisation

How can sessions be used?

• Initialise a session
• Query available process “sets”
• Obtain info about a “set” (optional)
• Create an MPI_Group directly from a “set”
• Modify the MPI_Group (optional)
• Create an MPI_Communicator directly from the MPI_Group (without a parent communicator)
  • Any type, e.g. cartesian or dist_graph

[Diagram: MPI_Session, query runtime for set of processes, MPI_Group, MPI_Comm]

Why are sessions a good idea?

• Any thread/library/entity can use MPI whenever it wants
• Error handling for sessions is defined and controllable
• Initialisation and finalisation become implementation detail
• Scalability (inside MPI) should be easier to achieve
• Should complement & assist endpoints and fault tolerance

Who are sessions aimed at?

• Everyone!
• Library writers: no more reliance on main app for correct initialisation and provision of an input MPI_Communicator
• MPI developers: should be easier to implement scalability, resource management, (fault tolerance, endpoints, …)
• Application writers: MPI becomes ‘just like other libraries’

Sessions WG

• Fortnightly meetings, Monday 1pm Eastern US webex
  • All welcome!
• https://github.com/mpiwg-sessions/sessions-issues/wiki
• Future business:
  • Dynamic “sets”? Shrink/grow – user-controlled/faults?
  • Interaction with tools? Isolation causes issues?
  • Different thread support levels on different sessions?

Towards Enhanced Support for Hybrid Programming (Hybrid WG)

Pavan Balaji

Hybrid Programming Working Group Chair

[email protected]

Pavan Balaji, Argonne National Laboratory

MPI Forum Hybrid WG Goals

§ Ensure interoperability of MPI with other programming models
  – MPI+threads (pthreads, OpenMP, user-level threads)
  – MPI+CUDA, MPI+OpenCL
  – MPI+PGAS models

§ Active Proposals
  – Generalizing MPI processes
  – MPI Endpoints

ISC MPI BoF (06/21/2016)

Generalizing MPI Processes

§ The MPI-3.1 standard is, in some places, ambiguous with respect to what an “MPI Process” is
  – Is it an OS process (are global variables shared between MPI processes)?
  – Is it a thread (are global variables shared between a subset of MPI processes)?

§ Problematic for some MPI implementations such as MPC (MPI processes are implemented as threads)

§ Proposal is to clarify throughout the standard that the MPI process can be implemented as threads
  – Separate proposal to query for such information

MPI Endpoints

§ Resource sharing between MPI processes
  – System resources do not scale at the same rate as processing cores
    • Memory, network endpoints, TLB entries, …
    • Sharing is necessary
  – MPI+threads gives a method for such sharing of resources

§ Performance Concerns
  – MPI-3.1 provides a single view of the MPI stack to all threads
    • Requires all MPI objects (requests, communicators) to be shared between all threads
    • Not scalable to large number of threads
    • Inefficient when sharing of objects is not required by the user
  – MPI-3.1 does not allow a high-level language to interchangeably use OS processes or threads
    • No notion of addressing a single or a collection of threads
    • Needs to be emulated with tags or communicators

Single view of MPI objects

§ MPI-3.1 specification requirements
  – It is valid in MPI to have one thread generate a request (e.g., through MPI_IRECV) and another thread wait/test on it
  – One thread might need to make progress on another’s requests
  – Requires all objects to be maintained in a shared space
  – When a thread accesses an object, it needs to be protected through locks/atomics
    • Critical sections become expensive with hundreds of threads accessing it

§ Application behavior
  – Many (but not all) applications do not require such sharing
  – A thread that generates a request is responsible for completing it
    • MPI guarantees are safe, but unnecessary for such applications

P0 (Thread 1)                 P0 (Thread 2)                 P1
MPI_Irecv(…, comm1, &req1);   MPI_Irecv(…, comm2, &req2);   MPI_Ssend(…, comm1);
pthread_barrier();            pthread_barrier();            MPI_Ssend(…, comm2);
                              MPI_Wait(&req2, …);
pthread_barrier();            pthread_barrier();
MPI_Wait(&req1, …);

Interoperability with High-level Languages

§ In MPI-3.1, there is no notion of sending a message to a thread
  – Communication is with MPI processes – threads share all resources in the MPI process
  – You can emulate such matching with tags or communicators, but some pieces (like collectives) become harder and/or inefficient

§ Some high-level languages do not expose whether their processing entities are processes or threads
  – E.g., PGAS languages

§ When these languages are implemented on top of MPI, the language runtime might not be able to use MPI efficiently

MPI Endpoints: Proposal for MPI-4

§ Idea is to have multiple addressable communication entities within a single process
  – Instantiated in the form of multiple ranks per MPI process

§ Each rank can be associated with one or more threads

§ Lesser contention for communication on each “rank”

§ In the extreme case, we could have one rank per thread (or some ranks might be used by a single thread)

MPI Endpoints Semantics

§ Creates new MPI ranks from existing ranks in parent communicator
  • Each process in parent comm. requests a number of endpoints
  • Array of output handles, one per local rank (i.e. endpoint) in endpoints communicator
  • Endpoints have MPI process semantics (e.g. progress, matching, collectives, …)

§ Threads using endpoints behave like MPI processes
  • Provide per-thread communication state/resources
  • Allows implementation to provide process-like performance for threads

[Diagram: Parent Comm and E.P. Comm; each Parent MPI Process holds threads (M, T, T) attached to one or more ranks]

MPI_Comm_create_endpoints(MPI_Comm parent_comm, int my_num_ep, MPI_Info info, MPI_Comm out_comm_handles[])

Hybrid MPI+OpenMP Example With Endpoints

    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        int world_rank, tl;
        int max_threads = omp_get_max_threads();
        MPI_Comm ep_comm[max_threads];   /* one endpoint handle per thread */

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &tl);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        #pragma omp parallel
        {
            int nt = omp_get_num_threads();
            int tn = omp_get_thread_num();
            int ep_rank;

            #pragma omp master
            {
                /* Proposed routine; not part of MPI-3.1 */
                MPI_Comm_create_endpoints(MPI_COMM_WORLD, nt, MPI_INFO_NULL, ep_comm);
            }
            #pragma omp barrier

            /* Each thread uses its own endpoint as if it were an MPI process */
            MPI_Comm_rank(ep_comm[tn], &ep_rank);
            ... // Do work based on ‘ep_rank’
            MPI_Allreduce(..., ep_comm[tn]);
            MPI_Comm_free(&ep_comm[tn]);
        }
        MPI_Finalize();
    }

Additional Notes

§ Useful for more than just avoiding locks
  – Semantics that are “rank-specific” become more flexible
    • E.g., ordering for operations from a process
    • Ordering constraints for MPI RMA accumulate operations

§ Supplementary proposal on thread-safety requirements for endpoint communicators
  – Is each rank only accessed by a single thread or multiple threads?
  – Might get integrated into the core proposal

§ Implementation challenges being looked into
  – Simply having endpoint communicators might not be useful, if the MPI implementation has to make progress on other communicators too

More Info

§ Endpoints:
  • https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/380

§ Hybrid Working Group:
  • https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/MPI3Hybrid

LARGE COUNT WG

Jeff Hammond, Intel

Large Count WG

• Add MPI_Foo_x for relevant Foo? See http://dx.doi.org/10.1109/ExaMPI.2014.5 for details and discussion.
• Use Fortran interface for int and MPI_Count?
• Use C11 _Generic for int and MPI_Count?
• C++ provides equivalent of C11 _Generic?

Challenges/Potential

• Forum more receptive to lots of MPI_Foo_x compared to MPI 3.0.
  – MPI_Alltoallv_x solves large-disp issue.
  – Needed for most nonblocking collectives.
• Fortran interface viewed favorably.
• C11 _Generic is rather tricky w.r.t. pointers.
• C++ bindings were deleted. Odd to bring back now just to have C11. Non-standard wrapper?
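To make the C11 _Generic option and its pointer caveat more concrete, here is a small sketch that does not depend on MPI at all. Every name in it is a hypothetical stand-in: `big_count_t` plays the role of MPI_Count, and `foo_int` / `foo_x` play the roles of MPI_Foo and MPI_Foo_x.

```c
#include <assert.h>

typedef long long big_count_t;      /* hypothetical stand-in for MPI_Count */
static_assert(sizeof(big_count_t) > sizeof(int),
              "the large-count type must be wider than int");

enum { COUNT_INT = 0, COUNT_LARGE = 1 };

/* Stand-ins for MPI_Foo / MPI_Foo_x taking a scalar count. */
static int foo_int(int count)          { (void)count; return COUNT_INT; }
static int foo_x(big_count_t count)    { (void)count; return COUNT_LARGE; }

/* Stand-ins for routines taking an array of counts (as in Alltoallv). */
static int foov_int(const int *counts)       { (void)counts; return COUNT_INT; }
static int foov_x(const big_count_t *counts) { (void)counts; return COUNT_LARGE; }

/* Scalar dispatch: one association per count type is enough. */
#define foo(count) _Generic((count), \
        int:         foo_int,        \
        big_count_t: foo_x)(count)

/* Pointer dispatch: 'int *' and 'const int *' are distinct types for
 * _Generic, so each qualifier combination needs its own association. */
#define foov(counts) _Generic((counts),  \
        int *:               foov_int,   \
        const int *:         foov_int,   \
        big_count_t *:       foov_x,     \
        const big_count_t *: foov_x)(counts)
```

Calling `foo(3)` selects the int variant and `foo((big_count_t)3)` the large-count variant. Any argument type without an association, for example a plain `long` or a `volatile int *`, is a compile-time error, which is one way the pointer handling becomes awkward.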


UPDATE ON TOOL INTERFACES IN MPI Slides by Kathryn Mohror, LLNL

MPI_T interface is in good shape

• Between MPI 3.0 and MPI 3.1
  • Lots of feedback from community
    • Tools people and MPI implementors
  • Errata
    • 19 Errata to MPI 3.0
    • Good thing! People are using the interface!
  • Feature update in MPI 3.1
    • Handful of small changes
      • Quick look up of variables
      • Add a couple new return codes
      • Minor clarifications
      • Specify that some function parameters are optional

Tools Interfaces Designs

• New interface to replace PMPI
  • Known, longstanding problems with the current profiling interface
    • One tool at a time can use it
    • Forces tools to be monolithic (a single shared library)
    • The interception model is OS dependent
  • New interface
    • Callback design
    • Multiple tools can potentially attach
    • Maintain all old functionality
• New feature for event notification in MPI_T
  • Keep good ideas from PERUSE without matching MPI_T approach
  • Tool registers for interesting event and gets callback when it happens

Debugging Interfaces Designs

• Fixing some bugs in the original “blessed” MPIR document
  • Missing line numbers!
• Working on design for new API-based support for debuggers: MPIR2
  • Support non-traditional MPI implementations
    • Ranks are implemented as threads
  • Support for dynamic applications
    • Commercial applications / Ensemble applications
    • Fault tolerance
• Handle Introspection Interface
  • See inside MPI to get details about MPI Objects
    • Communicators, File Handles, etc.

I have ideas. Can I join in the fun?

• Yes! The MPI tools WG is open to everyone!
• Join the mailing list
  • http://lists.mpi-forum.org/
  • List name: mpiwg-tools
• Join our meetings
  • https://github.com/mpiwg-tools/tools-issues/wiki/Meetings
• Look at the wiki for current topics
  • https://github.com/mpiwg-tools/tools-issues/wiki

Collective Persistent Communication

MPI Forum BoF ISC 2016

WG lead: Anthony Skjellum, Auburn University

Persistent Communication

• Exists for P2P from the beginning
  – Create persistent request with MPI_*_Init
  – Start communication with MPI_Start
  – Complete as any other asynchronous operation
• Advantages
  – Optimization potential for repeated operations
• Proposal to add same concept for collectives
  – Similar concept by adding “Init” routines
  – Closely related to non-blocking collectives
  – High optimization potential

Fault Tolerance and Error Handling

MPI Forum BoF ISC 2016

Slides by Wesley Bland, Intel

User Level Failure Mitigation (ULFM)

• Support process level fault tolerance across MPI
• Provide mechanisms for detecting and recovering from failures
• Specify how MPI should react to failures
• Still being debated by the MPI Forum
  – Hope to bring up for a vote within a few months

Error Handling Clarifications

• Error Handlers
  – New error handlers could provide more information to users
  – Give more flexibility to handle specific types of errors in different ways
  – Combine the different types of error handlers into a single function (for more general use)
• Error Reporting
  – Allow MPI to decide if errors are catastrophic or not
  – Non-catastrophic errors would not cause MPI to become “undefined”
• Other Cleanup
  – Generally clean up definitions around errors throughout the standard


§ Standardization body for MPI
  • Discusses additions and new directions
  • Oversees the correctness and quality of the standard
  • Represents MPI to the community

§ Organization consists of chair, secretary, convener, steering committee, and member organizations

§ Open membership
  • Any organization is welcome to participate
  • Consists of working groups and the actual MPI forum
  • Physical meetings 4 times each year (3 in the US, one with EuroMPI/Asia)
    – Working groups meet between forum meetings (via phone)
    – Plenary/full forum work is done mostly at the physical meetings
  • Voting rights depend on attendance
    – An organization has to be present at two out of the last three meetings (incl. the current one) to be eligible to vote
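The attendance rule can be written down as a small predicate. This is only an illustration of the rule as stated on the slide; the function name and the attendance-array representation are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* attended[i] is true if the organization was present at meeting i.
 * Meetings are ordered oldest to newest; the last entry is the current
 * meeting. Returns true if the organization was present at two or more of
 * the last three meetings. */
bool eligible_to_vote(const bool *attended, size_t num_meetings)
{
    assert(attended != NULL || num_meetings == 0);

    /* Look at no more than the three most recent meetings. */
    size_t window = num_meetings < 3 ? num_meetings : 3;
    int present = 0;

    for (size_t i = num_meetings - window; i < num_meetings; i++) {
        if (attended[i])
            present++;
    }
    return present >= 2;
}
```

For example, an organization that skipped the previous meeting but attended the one before it and the current one is eligible, while one that attended only the oldest of the last three is not.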


§ Collectives & Topologies
  • Torsten Hoefler, ETH
  • Andrew Lumsdaine, Indiana

§ Fault Tolerance
  • Wesley Bland, ANL
  • Aurelien Bouteiller, UTK
  • Rich Graham, Mellanox

§ Fortran
  • Craig Rasmussen, U. of Oregon

§ Generalized Requests
  • Fab Tillier, Microsoft

§ Hybrid Models
  • Pavan Balaji, ANL

§ I/O
  • Quincey Koziol, HDF Group
  • Mohamad Chaarawi, HDF Group

§ Large count
  • Jeff Hammond, Intel

§ Persistence
  • Anthony Skjellum, U. of Alabama

§ Point to Point Comm.
  • Dan Holmes, EPCC
  • Rich Graham, Mellanox

§ Remote Memory Access
  • Bill Gropp, UIUC
  • Rajeev Thakur, ANL

§ Tools
  • Kathryn Mohror, LLNL
  • Marc-Andre Hermans, RWTH Aachen

§ New working groups
  • Added on demand
  • Support of 4 organizations at a physical MPI forum meeting

1. New items should be brought to a matching working group for discussion
   • Creation of preliminary proposal
   • Simple (grammar) changes are handled by chapter committees

2. Socializing of idea driven by the WG
   • Could include plenary presentation to gather feedback
     – Focused on concepts not details like names or formal text
   • Make proposal easily available through WG wiki
   • Important to keep overall standard in mind

3. Development of full proposal
   • Latex version that fits into the standard
   • Creation of ticket to track voting

4. MPI forum reading/voting process

§ Quorum
  • 2/3 of eligible organizations have to be present
  • 3/4 of present organizations have to vote yes
  • Goal: standardize only if there is consensus

§ Steps
  1. Reading: “Word by word” presentation to the forum
  2. First vote
  3. Second vote

§ Each step has to be at a separate physical meeting
  • Ensure people have time to think about additions
  • Avoid hasty mistakes, which are hard to fix
  • Prototypes are encouraged and helpful to convince people
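As a worked example of the quorum arithmetic, the sketch below checks both thresholds with integer arithmetic. It assumes that "2/3" and "3/4" mean "at least" those fractions, which the slide does not spell out. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Quorum: at least 2/3 of the eligible organizations are present.
 * Written as 3 * present >= 2 * eligible to avoid fractions. */
bool quorum_reached(int eligible, int present)
{
    assert(eligible > 0 && present >= 0 && present <= eligible);
    return 3 * present >= 2 * eligible;
}

/* A vote passes if quorum is reached and at least 3/4 of the organizations
 * present vote yes (assumed reading of the slide). */
bool vote_passes(int eligible, int present, int yes)
{
    assert(yes >= 0 && yes <= present);
    return quorum_reached(eligible, present) && 4 * yes >= 3 * present;
}
```

With 30 eligible organizations, 20 must be present for quorum, and with exactly 20 present at least 15 yes votes are needed. With 19 present no vote can pass, however many vote yes.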

§ MPI Forum is an open forum
  • Everyone / every organization can join
  • Voting rights depend on attendance of physical meetings

§ Major initiatives towards MPI 4.0
  • Active discussion in the respective WGs
  • Need/want community feedback

§ Get involved
  • Let us know what you or your applications need
    – [email protected]
  • Participate in WGs
    – Email list and Phone meetings
    – Each WG has its own Wiki
  • Join us at an MPI Forum F2F meeting
    – Next meetings: Edinburgh, UK (Sep.), Dallas, TX (Dec.)

http://www.mpi-forum.org/