Revoke / Incarnation #s / Matching
• Discussion around how to reclaim context IDs (resources that are part of message matching) after an MPI_Comm_revoke
• Basic problem: revoke is one-sided and can be called by multiple processes in the communicator
– There is a race between a call to revoke and the point at which all correct processes have updated their local state to revoked
– All processes must have revoked the communicator before its context ID can be reused
• Scenario:
– A communicator with correct processes A, B, and C is revoked
– A and B free the revoked communicator and create a new communicator that reuses the old context ID
– C calls revoke on the old communicator: what happens at A and B?
– Alternatively, C sends a message to A or B, which has posted an MPI_ANY_SOURCE receive. Does it match?
• Several solutions were discussed:
1. Incarnation number: an additional number attached to each context ID that becomes part of the matching
2. Group guards: check incoming messages to ensure that the sender is in the group of the communicator
3. Fault-tolerant MPI_Comm_free/create: enhance the create/free algorithms to quiesce context IDs before they are reused
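To make the first two proposals concrete, here is a minimal Python simulation of matching on a single process. It is not MPI code. `Header`, `Comm`, `reuse`, and the `incarnation` field are hypothetical names invented for illustration. The sketch assumes every member of the new communicator agrees on the incarnation number (for example, as a by-product of the creation algorithm) and ignores wraparound of a finite-width header field. It replays the scenario above: A and B reuse context ID 7, and a late message or revoke from C still carries incarnation 0.

```python
from dataclasses import dataclass

ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE


@dataclass(frozen=True)
class Header:
    """Hypothetical envelope carried by every message, including revoke notices."""
    context_id: int   # reusable matching resource
    incarnation: int  # bumped each time the context ID is reused
    src: int          # world rank of the sender
    tag: int = 0


@dataclass
class Comm:
    """Hypothetical per-process view of one communicator."""
    context_id: int
    incarnation: int
    group: frozenset  # world ranks of the members
    revoked: bool = False

    def same_incarnation(self, h: Header) -> bool:
        # Solution 1: the incarnation number is part of the match.
        return h.context_id == self.context_id and h.incarnation == self.incarnation

    def group_guard(self, h: Header) -> bool:
        # Solution 2: drop traffic from senders outside this communicator's group.
        return h.src in self.group

    def matches(self, source: int, tag: int, h: Header) -> bool:
        """Would a receive posted here with (source, tag) match message h?"""
        if not (self.same_incarnation(h) and self.group_guard(h)):
            return False
        return source in (ANY_SOURCE, h.src) and tag == h.tag

    def deliver_revoke(self, h: Header) -> None:
        # A late revoke aimed at an older incarnation must not revoke this one.
        if self.same_incarnation(h):
            self.revoked = True


def reuse(old: Comm, group) -> Comm:
    """Free `old` and create a new communicator on the same context ID."""
    return Comm(old.context_id, old.incarnation + 1, frozenset(group))


# Scenario from the slide: A and B reuse context ID 7 after it was revoked.
A, B, C = 0, 1, 2
old = Comm(context_id=7, incarnation=0, group=frozenset({A, B, C}), revoked=True)
new = reuse(old, {A, B})
stale = Header(context_id=7, incarnation=0, src=C)  # late message or revoke from C

print(new.matches(ANY_SOURCE, 0, stale))  # False: wrong incarnation, and C is not in the group
new.deliver_revoke(stale)
print(new.revoked)                        # False: the stale revoke is ignored
```

The group guard alone would not catch C's stale traffic if the new communicator happened to include C again (for example `reuse(old, {A, B, C})`). The incarnation check still rejects it in that case, at the cost of an extra field in every match.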
RMA Semantics
• Pavan raised a concern about the semantics of RMA window memory after a failure, specifically for shared memory windows
• It may be impossible to guarantee that only the window locations that were updated become invalid
• Suggestion: weaken the semantics so that the entire window becomes undefined
• Requires further discussion
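One plausible reading of the concern, stated here as an assumption rather than taken from the slide, is as follows. In a shared-memory window, processes can update memory with direct loads and stores that the MPI library never observes. After a failure, the library therefore cannot know every location the failed process touched. The hypothetical Python model below (`SharedWindow`, `put`, `store`, and `missed` are invented names) contrasts two approaches: invalidating only the locations the library recorded, and invalidating the whole window.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SharedWindow:
    """Hypothetical model of one shared-memory window; offsets are element indices."""
    size: int
    # origin rank -> offsets written through RMA calls, which the library can record
    seen_by_library: Dict[int, Set[int]] = field(default_factory=dict)
    # origin rank -> offsets written by any means (ground truth, unknowable in practice)
    actually_written: Dict[int, Set[int]] = field(default_factory=dict)

    def put(self, origin: int, offset: int) -> None:
        # An RMA call such as a put: visible to the library.
        for log in (self.seen_by_library, self.actually_written):
            log.setdefault(origin, set()).add(offset)

    def store(self, origin: int, offset: int) -> None:
        # A direct store into shared memory: bypasses the library entirely.
        self.actually_written.setdefault(origin, set()).add(offset)

    def undefined_after_failure(self, failed: int, whole_window: bool) -> Set[int]:
        if whole_window:
            return set(range(self.size))  # weakened semantics: everything is undefined
        return set(self.seen_by_library.get(failed, set()))  # "only updated locations"

    def missed(self, failed: int, whole_window: bool) -> Set[int]:
        """Locations the failed process wrote that were NOT declared undefined."""
        written = self.actually_written.get(failed, set())
        return written - self.undefined_after_failure(failed, whole_window)


win = SharedWindow(size=8)
win.put(origin=1, offset=2)    # tracked update
win.store(origin=1, offset=5)  # untracked load/store update
# Process 1 then fails.
print(sorted(win.missed(1, whole_window=False)))  # [5]
print(sorted(win.missed(1, whole_window=True)))   # []
```

The whole-window rule is trivially safe but discards data that may still be valid. The precise rule is safe only if every write is observable, which this model assumes does not hold for shared-memory windows.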
Shared Memory
• What happens if a process that shares memory with other processes fails while another process has posted messages that use its shared memory?
– Yes, this is an implementation issue, but is it possible to do anything?