Argonne National Laboratory + University of Chicago
Process Manager Update – May 6
• The Process Manager component (PM)
• The Process Manager implementation (MPD2)
• Issues generated for other components by process
management
The Process Manager Component
• Added limits to interface definition
  • Example on next page
  • Not implemented yet in terms of parsing and passing on to MPD
• Dynamic jobs (MPI_Comm_spawn)
  • Current interface allows the process manager to be given a list of nodes and a number of processes to start, independently.
  • Process manager implementation can then use unused nodes to start spawned processes (or not)
  • MPI_UNIVERSE_SIZE allows MPI job to get hint about how many processes can usefully be spawned
Limits Specification Example

<create-process-group totalprocs='2'>
  <process-spec range='0' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 0"> </arg>
    <limit type='cpu' value="2"/>
  </process-spec>
  <process-spec range='1' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 1"> </arg>
    <env name='foo' value="bar"> </env>
    <limit type='cpu' value="3"/>
  </process-spec>
  <host-spec> magpie </host-spec>
</create-process-group>
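The parsing step is listed as not yet implemented, so the following is only a minimal sketch of how a request like the one above might be read, using Python's standard xml.etree.ElementTree. The function name parse_request and the shape of the returned dictionary are assumptions made for illustration; they are not part of the PM interface definition or of MPD.

```python
import xml.etree.ElementTree as ET

# The request from the slide, reproduced as a string for the sketch.
REQUEST = """
<create-process-group totalprocs='2'>
  <process-spec range='0' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 0"> </arg>
    <limit type='cpu' value="2"/>
  </process-spec>
  <process-spec range='1' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 1"> </arg>
    <env name='foo' value="bar"> </env>
    <limit type='cpu' value="3"/>
  </process-spec>
  <host-spec> magpie </host-spec>
</create-process-group>
"""


def parse_request(xml_text):
    """Hypothetical helper: turn a create-process-group request into a dict."""
    root = ET.fromstring(xml_text)
    specs = []
    for spec in root.findall("process-spec"):
        # Order arguments by their idx attribute rather than document order.
        args = sorted(spec.findall("arg"), key=lambda a: int(a.get("idx")))
        specs.append({
            "range": spec.get("range"),
            "cwd": spec.get("cwd"),
            "exec": spec.get("exec"),
            "args": [a.get("value") for a in args],
            "env": {e.get("name"): e.get("value") for e in spec.findall("env")},
            "limits": {l.get("type"): int(l.get("value"))
                       for l in spec.findall("limit")},
        })
    # Assumption: host-spec holds whitespace-separated host names.
    hosts = []
    for host_spec in root.findall("host-spec"):
        hosts.extend((host_spec.text or "").split())
    return {
        "totalprocs": int(root.get("totalprocs")),
        "specs": specs,
        "hosts": hosts,
    }
```

Calling parse_request(REQUEST) yields one entry per process-spec, each carrying its own limits, which is the information a process manager would need to pass on when starting each rank.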
The Process Manager Implementation
• Improvements to MPD resulting from production use on Chiba
  • Mostly in recovering from errors and crashes by applications
• Support for limits (those supported in setrlimit)
• Improvements in configuring and building along with MPICH2
• Support for MPI_Comm_spawn through PMI interface to MPICH2 application
• Interactive debugging via mpigdb
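As an illustration of the limits bullet above, here is a minimal sketch of applying setrlimit-style limits to a child process before it starts, using Python's standard resource and subprocess modules (Unix only). This is not MPD's code. The names LIMIT_TYPES, apply_limits, and run_with_limits are hypothetical, and only the 'cpu' limit type appears in the slides; the other entries in the mapping are assumptions.

```python
import resource
import subprocess

# Assumed mapping from a limit 'type' name to a setrlimit resource.
# Only 'cpu' is shown in the slides; 'core' and 'nofile' are hypothetical.
LIMIT_TYPES = {
    "cpu": resource.RLIMIT_CPU,
    "core": resource.RLIMIT_CORE,
    "nofile": resource.RLIMIT_NOFILE,
}


def apply_limits(limits):
    """Lower the soft limit for each named resource in the current process."""
    for name, value in limits.items():
        res = LIMIT_TYPES[name]
        _soft, hard = resource.getrlimit(res)
        # Never ask for more than the existing hard limit allows.
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)
        resource.setrlimit(res, (value, hard))


def run_with_limits(argv, limits):
    """Start argv with the limits applied in the child, just before exec."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: apply_limits(limits),
        capture_output=True,
        text=True,
    )
```

For example, run_with_limits(["./infloop", "hello"], {"cpu": 2}) would start the program with a two-second CPU soft limit; the limits are set in the child only, so the launching process is unaffected.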
Coercing gdb Into Functioning as a Primitive Parallel Debugger
• Key is control of stdin, stdout, stderr by MPD, through mpigdb
  • Replaces mpiexec or mpirun on interactive command line
  • Usable through SSS process manager component
• Stdout, stderr collected in tree, labeled by rank, and merged for scalability
    (0-9) (gdb) p x
    (0-2): $1 = 3.4
    (3): $1 = 3.8
    (4-9): $1 = 4.1
• Stdin can be broadcast to all or to a subset of processes
  • z 3 (to send input to process 3 only)
  • Same for interrupts
• Can run under debugger control, interrupt and query hung processes, parallel attach to running parallel job
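The rank-labeled merging of output shown above can be sketched as follows. This is a simplified, flat version written for illustration: MPD collects output through a tree, whereas this sketch merges a single dictionary of per-rank output in one place. The function names and the comma-separated form used for non-contiguous ranks are assumptions, not mpigdb's actual behavior.

```python
from collections import defaultdict


def compress_ranks(ranks):
    """Render a set of ranks as ranges, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ranks = sorted(ranks)
    parts = []
    start = prev = ranks[0]
    for rank in ranks[1:]:
        if rank == prev + 1:
            prev = rank
            continue
        # The run ended; emit it and begin a new one.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = rank
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def merge_by_rank(outputs):
    """Group identical output lines and label each group with its ranks.

    outputs maps rank -> the line of text that rank produced.
    """
    groups = defaultdict(list)
    for rank, text in outputs.items():
        groups[text].append(rank)
    # Present groups in order of the lowest rank they contain.
    ordered = sorted(groups.items(), key=lambda item: min(item[1]))
    return [f"({compress_ranks(ranks)}): {text}" for text, ranks in ordered]
```

With ten ranks where ranks 0 to 2 print "$1 = 3.4", rank 3 prints "$1 = 3.8", and ranks 4 to 9 print "$1 = 4.1", merge_by_rank returns three labeled lines instead of ten, which is what keeps the display usable as the process count grows.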
Issues Generated For Other Components
• Job steps
  • Option 1: QM handles (preferred)
    • Process manager starts process groups directly
    • Need public definition of user interface to QM
  • Option 2: PM implementation handles PBS-like scripts from QM
    • A bit weird: mpirun in a PBS script is trapped by extra layer (MPISH) because the “real” mpirun is a call to MPD itself
    • In use on Chiba
• QM interface for requesting allocation of some number of nodes but starting up on a different number of nodes, particularly for option 1.
• QM interface for requesting dynamic rebuilds
• Limits in QM interface?
Tale of Two Queue Manager Implementations
[Diagram: qsub1 and qsub2 submit to QM1 and QM2 (different XML syntax). Each QM talks to the PM through the QM Interface (XML) (same XML syntax; different content). The PM sits on the underlying process manager (MPD).]
• One request: totalprocs=1, exec=myscript, where myscript contains: mpirun -np 64 cpi (mpirun intercepted)
• Other request: totalprocs=64, exec=cpi (mpirun is interactive interface to underlying process manager)
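To make "same XML syntax; different content" concrete, here is a small sketch that builds the two requests named on this slide with Python's standard xml.etree.ElementTree. The element and attribute names follow the limits example earlier in the deck. The helper make_request is hypothetical, and the range value '0-63' is an assumption, since the slides only show single-rank ranges.

```python
import xml.etree.ElementTree as ET


def make_request(totalprocs, exec_name, rank_range):
    """Hypothetical helper: build a create-process-group request string."""
    root = ET.Element("create-process-group", {"totalprocs": str(totalprocs)})
    ET.SubElement(root, "process-spec",
                  {"range": rank_range, "exec": exec_name})
    return ET.tostring(root, encoding="unicode")


# Script-style request: the PM starts one process, a script that itself
# calls "mpirun -np 64 cpi"; that mpirun then has to be intercepted.
script_request = make_request(1, "myscript", "0")

# Direct request: the PM is asked to start all 64 cpi processes itself.
# The multi-rank range syntax '0-63' is assumed, not taken from the slides.
direct_request = make_request(64, "cpi", "0-63")
```

Both strings use the same elements and attributes; only totalprocs and exec differ, which is why the process manager component can serve either queue manager without a second interface.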