Argonne National Laboratory + University of Chicago
Process Manager Update – May 6
• The Process Manager component (PM)
• The Process Manager implementation (MPD2)
• Issues generated for other components by process management
The Process Manager Component
• Added limits to interface definition
  • Example on next slide
  • Not implemented yet in terms of parsing and passing on to MPD
• Dynamic jobs (MPI_Comm_spawn)
  • Current interface allows the process manager to be given a list of nodes and a number of processes to start, independently
  • Process manager implementation can then use unused nodes to start spawned processes (or not)
  • MPI_UNIVERSE_SIZE allows MPI job to get a hint about how many processes can usefully be spawned
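The relationship between the node list, the initial process count, and the spawn hint can be sketched as follows. This is only an illustration under stated assumptions: the function name `plan_job`, the node names, and the one-process-per-node, round-robin placement are hypothetical and are not taken from MPD.

```python
def plan_job(nodes, nprocs):
    """Place nprocs initial processes on the given nodes and report what is left.

    Assumptions (not from the slides): one process per node, round-robin
    placement when there are more processes than nodes.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    # Rank -> node for the processes started at job launch.
    placement = {rank: nodes[rank % len(nodes)] for rank in range(nprocs)}
    used = set(placement.values())
    # Nodes handed to the process manager but not used by the initial launch.
    spare = [node for node in nodes if node not in used]
    # A value a process manager could report as the MPI_UNIVERSE_SIZE hint:
    # the processes already running plus those that could still be spawned.
    universe_size_hint = nprocs + len(spare)
    return placement, spare, universe_size_hint


placement, spare, hint = plan_job(["n0", "n1", "n2", "n3"], 2)
print(placement, spare, hint)
```

With four nodes and two initial processes, two nodes remain for MPI_Comm_spawn, so an application reading the hint could spawn up to `hint - nprocs` further processes.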
Limits Specification Example
    <create-process-group totalprocs='2'>
      <process-spec range='0' cwd='/home/rbutler/mpd2' exec='infloop'>
        <arg idx='1' value="hello"> </arg>
        <arg idx='2' value="from 0"> </arg>
        <limit type='cpu' value="2"/>
      </process-spec>
      <process-spec range='1' cwd='/home/rbutler/mpd2' exec='infloop'>
        <arg idx='1' value="hello"> </arg>
        <arg idx='2' value="from 1"> </arg>
        <env name='foo' value="bar"> </env>
        <limit type='cpu' value="3"/>
      </process-spec>
      <host-spec> magpie </host-spec>
    </create-process-group>
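Since parsing of the limits is described as not yet implemented, the following is a minimal sketch of how such a specification might be read, using Python's standard xml.etree.ElementTree module. The function name `parse_process_group` and the returned dictionary layout are hypothetical; treating limit values as integers is also an assumption, and `range` is kept as a string because its syntax is not defined here.

```python
import xml.etree.ElementTree as ET

SPEC = """
<create-process-group totalprocs='2'>
  <process-spec range='0' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 0"> </arg>
    <limit type='cpu' value="2"/>
  </process-spec>
  <process-spec range='1' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 1"> </arg>
    <env name='foo' value="bar"> </env>
    <limit type='cpu' value="3"/>
  </process-spec>
  <host-spec> magpie </host-spec>
</create-process-group>
"""


def parse_process_group(xml_text):
    """Turn a create-process-group document into plain Python data."""
    root = ET.fromstring(xml_text.strip())
    specs = []
    for ps in root.findall("process-spec"):
        # Order arguments by their idx attribute rather than document order.
        args = sorted(ps.findall("arg"), key=lambda a: int(a.get("idx")))
        specs.append({
            "range": ps.get("range"),  # kept as text; range syntax is not defined here
            "cwd": ps.get("cwd"),
            "exec": ps.get("exec"),
            "args": [a.get("value") for a in args],
            "env": {e.get("name"): e.get("value") for e in ps.findall("env")},
            # Assumption: limit values are integers.
            "limits": {l.get("type"): int(l.get("value")) for l in ps.findall("limit")},
        })
    hosts = (root.findtext("host-spec") or "").split()
    return {"totalprocs": int(root.get("totalprocs")), "specs": specs, "hosts": hosts}


group = parse_process_group(SPEC)
for spec in group["specs"]:
    print(spec["range"], spec["exec"], spec["args"], spec["limits"])
```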
The Process Manager Implementation
• Improvements to MPD resulting from production use on Chiba
  • Mostly in recovering from errors and crashes by applications
• Support for limits (those supported in setrlimit)
• Improvements in configuring and building along with MPICH2
• Support for MPI_Comm_spawn through PMI interface to MPICH2 application
• Interactive debugging via mpigdb
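As an illustration of what supporting setrlimit-style limits involves, here is a minimal sketch that applies limits in a child process before it runs the user program, using Python's standard `resource` and `subprocess` modules (Unix only). It is not MPD's code. The helper names and the `LIMIT_TYPES` table are hypothetical; only the `cpu` type name appears in the example specification, and the other names are assumptions.

```python
import resource
import subprocess
import sys

# Mapping from limit type names to setrlimit resources.
# Only 'cpu' appears in the example specification; the rest are assumed names.
LIMIT_TYPES = {
    "cpu": resource.RLIMIT_CPU,
    "core": resource.RLIMIT_CORE,
    "fsize": resource.RLIMIT_FSIZE,
    "nofile": resource.RLIMIT_NOFILE,
}


def make_limit_setter(limits):
    """Return a function that applies the given soft limits in the child."""
    def apply_limits():
        for name, value in limits.items():
            res = LIMIT_TYPES[name]
            _, hard = resource.getrlimit(res)
            # Never ask for a soft limit above the existing hard limit.
            soft = value if hard == resource.RLIM_INFINITY else min(value, hard)
            resource.setrlimit(res, (soft, hard))
    return apply_limits


def run_with_limits(argv, limits, cwd=None):
    """Start argv with limits applied between fork and exec."""
    return subprocess.run(
        argv,
        cwd=cwd,
        preexec_fn=make_limit_setter(limits),
        capture_output=True,
        text=True,
    )


# Probe program: report the soft CPU limit the child actually sees.
probe = [sys.executable, "-c",
         "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"]
result = run_with_limits(probe, {"cpu": 2})
print(result.stdout.strip())
```

Only the soft limit is lowered in this sketch, so the hard limit inherited from the parent stays in place.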
Coercing gdb Into Functioning as a Primitive Parallel Debugger
• Key is control of stdin, stdout, stderr by MPD, through mpigdb
  • Replaces mpiexec or mpirun on interactive command line
  • Usable through SSS process manager component
• Stdout, stderr collected in tree, labeled by rank, and merged for scalability
    (0-9) (gdb) p x
    (0-2): $1 = 3.4
    (3): $1 = 3.8
    (4-9): $1 = 4.1
• Stdin can be broadcast to all or to a subset of processes
  • z 3 (to send input to process 3 only)
  • Same for interrupts
• Can run under debugger control, interrupt and query hung processes, parallel attach to running parallel job
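The rank labeling shown in the gdb transcript can be sketched as below. This is a flat, single-process illustration of the result only; MPD performs the merge across a tree of processes, which is not modeled here. The function names, and the comma-separated form used for non-contiguous ranks, are assumptions.

```python
def format_ranks(ranks):
    """Compress a list of ranks into a label such as '0-2' or '0,3-5'."""
    ranks = sorted(ranks)
    spans = []
    start = prev = ranks[0]
    for rank in ranks[1:]:
        if rank == prev + 1:
            prev = rank
            continue
        spans.append((start, prev))
        start = prev = rank
    spans.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)


def merge_output(lines_by_rank):
    """Group ranks that produced identical output and label each group."""
    groups = {}
    for rank in sorted(lines_by_rank):
        groups.setdefault(lines_by_rank[rank], []).append(rank)
    # Present groups in order of their lowest rank.
    ordered = sorted(groups.items(), key=lambda item: item[1][0])
    return [f"({format_ranks(ranks)}): {text}" for text, ranks in ordered]


# Reproduce the 'p x' example: ranks 0-2, 3, and 4-9 print different values.
output = {rank: "$1 = 3.4" for rank in range(0, 3)}
output[3] = "$1 = 3.8"
output.update({rank: "$1 = 4.1" for rank in range(4, 10)})
for line in merge_output(output):
    print(line)
```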
Issues Generated For Other Components
• Job steps
  • Option 1: QM handles (preferred)
    • Process manager starts process groups directly
    • Need public definition of user interface to QM
  • Option 2: PM implementation handles PBS-like scripts from QM
    • A bit weird: mpirun in a PBS script is trapped by extra layer (MPISH) because the “real” mpirun is a call to MPD itself
    • In use on Chiba
• QM interface for requesting allocation of some number of nodes but starting up on different number of nodes, particularly for option 1.
• QM interface for requesting dynamic rebuilds
• Limits in QM interface?
Tale of Two Queue Manager Implementations
[Figure: two queue managers, QM1 and QM2, driven by qsub1 and qsub2, both talking through the QM Interface (XML) to the PM component, which sits on the underlying process manager (MPD).]
• qsub1 and qsub2: different XML syntax
• QM Interface (XML) to PM: same XML syntax; different content
  • totalprocs=1, exec=myscript; myscript contains: mpirun -np 64 cpi (mpirun intercepted)
  • totalprocs=64, exec=cpi (mpirun is interactive interface to underlying process manager)
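To make "same XML syntax; different content" concrete, here is a small sketch that builds the two requests a process manager might receive from the two queue managers. It reuses the create-process-group and process-spec element names from the limits example; the function name `build_request` is hypothetical, and every attribute other than totalprocs and exec is left out because the figure does not give it.

```python
import xml.etree.ElementTree as ET


def build_request(totalprocs, executable):
    """Build a minimal create-process-group request as an XML string."""
    root = ET.Element("create-process-group", {"totalprocs": str(totalprocs)})
    ET.SubElement(root, "process-spec", {"exec": executable})
    return ET.tostring(root, encoding="unicode")


# Script-style submission: one process runs the script, and the mpirun
# inside the script is intercepted to start the 64 MPI processes.
script_request = build_request(1, "myscript")

# Direct submission: the process manager starts all 64 processes itself.
direct_request = build_request(64, "cpi")

print(script_request)
print(direct_request)
```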