Process Manager Update – May 6

Argonne National Laboratory + University of Chicago

• The Process Manager component (PM)

• The Process Manager implementation (MPD2)

• Issues generated for other components by process management

The Process Manager Component

• Added limits to interface definition
  • Example on next page
  • Not implemented yet in terms of parsing and passing on to MPD
• Dynamic jobs (MPI_Comm_spawn)
  • Current interface allows the process manager to be given a list of nodes and a number of processes to start, independently.
  • Process manager implementation can then use unused nodes to start spawned processes (or not)
  • MPI_UNIVERSE_SIZE allows MPI job to get hint about how many processes can usefully be spawned
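The separation between the node list and the process count can be illustrated with a small Python sketch. It is not taken from MPD: the helper name `plan_job`, the node names, and the one-process-per-node placement are all assumptions made for illustration.

```python
def plan_job(nodes, nprocs):
    """Hypothetical helper: place nprocs initial processes, one per node.

    Assumes one process per node, which the slides do not specify.
    Returns (placement, spare_nodes, universe_size).
    """
    if nprocs > len(nodes):
        raise ValueError("more processes requested than nodes supplied")
    # The first nprocs nodes run the initial processes.
    placement = {rank: nodes[rank] for rank in range(nprocs)}
    # Unused nodes remain available for processes started by MPI_Comm_spawn.
    spare_nodes = nodes[nprocs:]
    # A value that an MPI_UNIVERSE_SIZE hint could carry under this assumption.
    universe_size = len(nodes)
    return placement, spare_nodes, universe_size


placement, spare, universe = plan_job(["n0", "n1", "n2", "n3"], 2)
print(placement)
print(spare)
print(universe - len(placement))  # processes that can still usefully be spawned
```

An application would typically compare the universe size with the size of MPI_COMM_WORLD to decide how many processes to spawn.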

Limits Specification Example

<create-process-group totalprocs='2'>
  <process-spec range='0' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 0"> </arg>
    <limit type='cpu' value="2"/>
  </process-spec>
  <process-spec range='1' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 1"> </arg>
    <env name='foo' value="bar"> </env>
    <limit type='cpu' value="3"/>
  </process-spec>
  <host-spec>
    magpie
  </host-spec>
</create-process-group>
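Since parsing of the limits is described as not yet implemented, the following is only a sketch of how such a request could be read with Python's standard `xml.etree.ElementTree`. The function name `parse_request` and the dictionary layout are assumptions; the element and attribute names come from the example above.

```python
import xml.etree.ElementTree as ET

REQUEST = """
<create-process-group totalprocs='2'>
  <process-spec range='0' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 0"> </arg>
    <limit type='cpu' value="2"/>
  </process-spec>
  <process-spec range='1' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 1"> </arg>
    <env name='foo' value="bar"> </env>
    <limit type='cpu' value="3"/>
  </process-spec>
  <host-spec>
    magpie
  </host-spec>
</create-process-group>
"""


def parse_request(xml_text):
    """Hypothetical parser: return (totalprocs, process specs, hosts)."""
    root = ET.fromstring(xml_text)
    specs = []
    for spec in root.findall("process-spec"):
        # Order arguments by their idx attribute rather than document order.
        args = sorted(spec.findall("arg"), key=lambda a: int(a.get("idx")))
        specs.append({
            "range": spec.get("range"),
            "cwd": spec.get("cwd"),
            "exec": spec.get("exec"),
            "args": [a.get("value") for a in args],
            "env": {e.get("name"): e.get("value") for e in spec.findall("env")},
            "limits": {l.get("type"): int(l.get("value"))
                       for l in spec.findall("limit")},
        })
    # Host names are whitespace-separated text inside <host-spec>.
    hosts = root.findtext("host-spec", default="").split()
    return int(root.get("totalprocs")), specs, hosts


totalprocs, specs, hosts = parse_request(REQUEST)
for s in specs:
    print(s["range"], s["exec"], s["args"], s["limits"])
```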

The Process Manager Implementation

• Improvements to MPD resulting from production use on Chiba
  • Mostly in recovering from errors and crashes by applications

• Support for limits (those supported in setrlimit)

• Improvements in configuring and building along with MPICH2

• Support for MPI_Comm_spawn through PMI interface to MPICH2 application

• Interactive debugging via mpigdb
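As a rough illustration of what "limits supported in setrlimit" can mean in practice, the sketch below applies a CPU limit to a child process using Python's standard `resource` module (Unix only). It is not MPD code: the `LIMIT_TYPES` mapping and the function names are assumptions, and only the 'cpu' limit type appears in the slides.

```python
import resource
import subprocess
import sys

# Assumed mapping from the interface's limit type to a setrlimit resource.
# Only 'cpu' appears in the slides; the mapping itself is hypothetical.
LIMIT_TYPES = {"cpu": resource.RLIMIT_CPU}


def make_limit_setter(limits):
    """Return a function that applies the limits in the child before exec."""
    def apply_limits():
        for name, value in limits.items():
            res = LIMIT_TYPES[name]
            _, hard = resource.getrlimit(res)
            # The soft limit may not exceed the existing hard limit.
            if hard != resource.RLIM_INFINITY:
                value = min(value, hard)
            resource.setrlimit(res, (value, hard))
    return apply_limits


def run_with_limits(argv, limits):
    """Start argv with the given limits and collect its output."""
    return subprocess.run(argv, preexec_fn=make_limit_setter(limits),
                          capture_output=True, text=True)


# The child simply reports the soft CPU limit it was started with.
child = [sys.executable, "-c",
         "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"]
result = run_with_limits(child, {"cpu": 2})
print(result.stdout.strip())
```

Only the soft limit is lowered here, so the hard limit inherited from the parent is left unchanged.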

Coercing gdb Into Functioning as a Primitive Parallel Debugger

• Key is control of stdin, stdout, stderr by MPD, through mpigdb
  • Replaces mpiexec or mpirun on interactive command line
  • Usable through SSS process manager component

• Stdout, stderr collected in tree, labeled by rank, and merged for scalability

    (0-9) (gdb) p x
    (0-2): $1 = 3.4
    (3): $1 = 3.8
    (4-9): $1 = 4.1

• Stdin can be broadcast to all or to a subset of processes
  • z 3 (to send input to process 3 only)
  • Same for interrupts

• Can run under debugger control, interrupt and query hung processes, parallel attach to running parallel job
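The rank labeling shown in the gdb session can be approximated with a short Python sketch. This is a flat, single-process illustration only; it does not reproduce the tree-based collection that MPD performs, and the function names `rank_ranges` and `merge_output` are hypothetical.

```python
from itertools import groupby


def rank_ranges(ranks):
    """Compress ranks into 'a-b' or 'a' strings, e.g. [0, 1, 2, 5] -> ['0-2', '5']."""
    ranks = sorted(ranks)
    out = []
    # Consecutive ranks share the same (rank - position) difference.
    for _, grp in groupby(enumerate(ranks), key=lambda p: p[1] - p[0]):
        grp = [r for _, r in grp]
        out.append(str(grp[0]) if len(grp) == 1 else f"{grp[0]}-{grp[-1]}")
    return out


def merge_output(lines_by_rank):
    """Group identical lines and label each with the ranks that produced it."""
    by_text = {}
    for rank, text in lines_by_rank.items():
        by_text.setdefault(text, []).append(rank)
    merged = []
    # Emit groups in order of the lowest rank they contain.
    for text, ranks in sorted(by_text.items(), key=lambda kv: min(kv[1])):
        merged.append(f"({','.join(rank_ranges(ranks))}): {text}")
    return merged


# Reproduce the values from the example session.
output = {r: "$1 = 3.4" for r in range(0, 3)}
output[3] = "$1 = 3.8"
output.update({r: "$1 = 4.1" for r in range(4, 10)})
for line in merge_output(output):
    print(line)
```

Merging identical lines keeps the amount of output roughly proportional to the number of distinct answers rather than the number of processes.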

Issues Generated For Other Components

• Job steps
  • Option 1: QM handles (preferred)
    • Process manager starts process groups directly
    • Need public definition of user interface to QM
  • Option 2: PM implementation handles PBS-like scripts from QM
    • A bit weird: mpirun in a PBS script is trapped by extra layer (MPISH) because the “real” mpirun is a call to MPD itself
    • In use on Chiba

• QM interface for requesting allocation of some number of nodes but starting up on different number of nodes, particularly for option 1.

• QM interface for requesting dynamic rebuilds

• Limits in QM interface?

Tale of Two Queue Manager Implementations

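[Diagram: qsub1 and qsub2 feed QM1 and QM2, which both talk to the PM through the QM Interface (XML); the underlying process manager (MPD) sits below the PM. Labels recovered from the figure:]

• qsub1, qsub2: Different XML syntax
• QM Interface (XML), from each QM to the PM: Same XML syntax; different content
  • totalprocs=1, exec=myscript
    • myscript contains: mpirun -np 64 cpi
    • (mpirun intercepted)
  • totalprocs=64, exec=cpi
• (mpirun is interactive interface to underlying process manager)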
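The "same XML syntax; different content" point can be made concrete with a sketch that builds both requests using Python's standard `xml.etree.ElementTree`. The element and attribute names follow the limits example earlier in the deck; the helper `build_request` and the range strings '0' and '0-63' are assumptions, since the slides show only single-rank ranges.

```python
import xml.etree.ElementTree as ET


def build_request(totalprocs, exec_name, rank_range):
    """Hypothetical helper: build a create-process-group request string."""
    root = ET.Element("create-process-group", {"totalprocs": str(totalprocs)})
    # rank_range syntax such as '0-63' is assumed, not confirmed by the slides.
    ET.SubElement(root, "process-spec",
                  {"range": rank_range, "exec": exec_name})
    return ET.tostring(root, encoding="unicode")


# One QM asks for a single process running the user's script, which itself
# calls mpirun (intercepted); the other asks for all 64 processes directly.
script_request = build_request(1, "myscript", "0")
direct_request = build_request(64, "cpi", "0-63")
print(script_request)
print(direct_request)
```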