REI – Recipe Execution Infrastructure

Jens Knudstrup/2005-02-08

Purpose of REI

Main Objectives of REI

- Provide the services of a parallel Batch Queue System.

- Make it easy to control and monitor complicated batches with job synchronization.

- Make it possible to distribute tasks (processing load) over a cluster of CPUs/nodes.

Not Provided in the Present Implementation

- Services for distributing data within the cluster to the nodes doing the processing (data sharing/distribution is done via a common storage area/file server).

- Services for resource management and advertising.

- Services for explicit load balancing (optimized job distribution).

- Special features for GRID applications.

Main Features

Main Features of REI

- Implemented in C++ (in-house implementation from scratch).

- Uses RDBMS for information sharing and task synchronization.

- Execution of shell commands or native execution of CPL Recipes (no generic interfacing to shared object files).

- Pworker task execution daemon provided – can take three roles:

  - Process Master Commands – Master Pworker.

  - Process Standard Commands – Standard Pworker.

  - Process Master and Standard Commands.

- Command line utilities provided to add/remove/monitor commands and to control Pworkers.

- API provided for implementing Master Command Libraries (also referred to as Recipe Planners) and Standard Command Libraries.

Command Line Interface

Interaction with REI

- Command line interface provided:

- addcmd: Add a Master Command to the Master Command Queue (handles ABs and SOFs, which are not part of the core of REI).

- cmdstat: Query the status of all commands or a specific command. ‘Tail’ feature provided.

- rmcmd: Remove information for one command or all commands from the Command Queues (clean up).

- pworker: The Pworker daemon.

- stopworker: Stop one specific Pworker or all Pworkers running.

- listworkers: List Pworkers running in the system.

- rmworker: Remove a Pworker (make it exit) or all Pworkers.

- The commands are not part of the core REI system, but should be seen as convenience features. They are based on the REI libraries.

- Commands can also be added to the DB directly via the REI libraries, i.e., the operation of REI can be controlled and monitored programmatically.

Command Lifecycle

Command States

- Each command submitted has 1 of 7 states indicating its current status:

[Figure: list of the 7 command states; not recoverable from the extracted text]

Command Transitions

Interprocess Synchronization

Interprocess Synchronization/Information Sharing

- Pworkers synchronize themselves via the DB.

- DB also used for exchanging information between processes in the system:

- Tables:

- pworker_registry: Information about Pworkers in the system (ID, node, Master and/or Standard Commands, …).

- pworker_master_command_queue: Contains information for the Master Commands waiting to be executed, under execution, and executed.

- pworker_master_sequencer: Contains information about Master Commands that are BLOCKED.

- pworker_command_queue: Standard Commands waiting to be executed, under execution, and executed.

- pworker_command_sequencer: Used to sequence Standard Commands.

- pworker_log: Log messages from Pworker processes.
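To illustrate how workers might synchronize through a shared database, here is a minimal sketch using Python's sqlite3 module with an in-memory database. Only the table name pworker_command_queue comes from the slide. The column names, the state labels (PENDING, EXECUTING) and the functions make_queue, add_command and claim_next are hypothetical, since the slides show neither the actual schema nor the REI C++ library calls.

```python
import sqlite3

def make_queue(conn):
    # Hypothetical schema: the real REI table layout is not shown in the slides.
    conn.execute(
        "CREATE TABLE pworker_command_queue ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'PENDING',"
        " worker_id TEXT)"
    )

def add_command(conn, name):
    # Queue a command; it starts in the (assumed) PENDING state.
    conn.execute("INSERT INTO pworker_command_queue (name) VALUES (?)", (name,))
    conn.commit()

def claim_next(conn, worker_id):
    """Claim the oldest pending command for worker_id, or return None."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id, name FROM pworker_command_queue"
            " WHERE state = 'PENDING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE pworker_command_queue"
            " SET state = 'EXECUTING', worker_id = ?"
            " WHERE id = ? AND state = 'PENDING'",
            (worker_id, row[0]),
        )
        # rowcount is 0 if another worker claimed the row in the meantime.
        return row[1] if cur.rowcount == 1 else None
```

The guard `AND state = 'PENDING'` in the UPDATE is what keeps two workers from both claiming the same command: whichever update runs second changes no rows and the caller simply tries again.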

OmegaCam Demo Science Reduction Cascade/1

OmegaCam Science Demo Cascade – Example

- Used adapted WFI frames (8 extensions).

- Provided:

  - OCAM REI Recipe Planner Plug-In to schedule tasks for the recipes (a general Recipe Planner was made for all Recipes).

  - REI Standard Command Library Plug-Ins to do FITS file splitting and joining.

  - Cascade Scheduler Script to submit Master Commands and to create the SOFs needed.

- 6 Recipes executed during the cascade (6 Master Commands issued to REI).

- Total number of commands scheduled within REI for the cascade: ~100.

- Total number of intermediate/temporary and final data products: ~200.

- Number of SOFs involved: 10.

OmegaCam Demo Science Reduction Cascade/2

Setting up Cascade – Example:

$ addcmd -name ocam_reduce_sci_W_2005-02-08T16:29:05 -bg -waitfor ocam_reduce_std_W_2005-02-08T16:29:05 -recipe ocam_reduce_sci /data/ocam/sof/ocam_reduce_sci_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_reduce_sci_W_2005-02-08T16:29:05

$ addcmd -name ocam_reduce_std_W_2005-02-08T16:29:05 -bg -waitfor ocam_mflat_W_2005-02-08T16:29:05 -trigger ocam_reduce_std_W_2005-02-08T16:29:05 -recipe ocam_reduce_std /raid/data/ocam/sof/ocam_reduce_std_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_reduce_std_W_2005-02-08T16:29:05

$ addcmd -name ocam_mflat_W_2005-02-08T16:29:05 -bg -waitfor ocam_mtwilight_W_2005-02-08T16:29:05 -trigger ocam_mflat_W_2005-02-08T16:29:05 -recipe ocam_mflat /raid/data/ocam/sof/ocam_mflat_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_mflat_W_2005-02-08T16:29:05
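The three submissions above appear to chain commands by name: a command given -waitfor X stays blocked until some command submitted with -trigger X has completed. That reading is an interpretation of the examples, not a documented REI rule. Under that assumption, the resulting execution order can be sketched in Python as follows. The function execution_order, the dictionary layout, the shortened command names (timestamp suffix dropped) and the ocam_mtwilight entry are all illustrative; the slide does not show how the first stage was submitted.

```python
def execution_order(commands):
    """Return command names in an order that respects waitfor/trigger links.

    commands: list of dicts with 'name' and optional 'waitfor' / 'trigger'.
    A command becomes runnable once the name it waits for has been triggered.
    """
    triggered = set()
    pending = list(commands)
    order = []
    while pending:
        runnable = [c for c in pending
                    if c.get("waitfor") is None or c["waitfor"] in triggered]
        if not runnable:
            # Nothing can make progress: a waitfor name is never triggered.
            raise ValueError("unsatisfiable waitfor: "
                             + ", ".join(c["name"] for c in pending))
        for cmd in runnable:
            order.append(cmd["name"])
            if cmd.get("trigger") is not None:
                triggered.add(cmd["trigger"])
            pending.remove(cmd)
    return order

cascade = [
    {"name": "ocam_reduce_sci", "waitfor": "ocam_reduce_std"},
    {"name": "ocam_reduce_std", "waitfor": "ocam_mflat",
     "trigger": "ocam_reduce_std"},
    {"name": "ocam_mflat", "waitfor": "ocam_mtwilight",
     "trigger": "ocam_mflat"},
    # Assumed first stage; the slide does not show this submission.
    {"name": "ocam_mtwilight", "trigger": "ocam_mtwilight"},
]
print(execution_order(cascade))
```

Note that submission order and execution order are independent here: the science reduction is submitted first but runs last, because each command only starts once its waitfor name has been triggered.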

Task Synchronization

[Figure: task synchronization diagram. Box labels, in order: Master; Split ×4; BIAS ×8; Join; Master; Split ×4; DOME ×8; Join; Compl.]
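The split, per-extension processing and join pattern named on this slide can be sketched in Python with concurrent.futures. This is only an analogy under stated assumptions: threads in one process stand in for Pworkers on separate nodes, and the functions split, process, join and run_frame are hypothetical placeholders, not REI or CPL calls.

```python
from concurrent.futures import ThreadPoolExecutor

def split(frame):
    # Stand-in for FITS splitting: one work item per extension.
    return list(frame["extensions"])

def process(extension):
    # Stand-in for a recipe run on a single extension.
    return extension * 2

def join(parts):
    # Stand-in for FITS joining: reassemble results in extension order.
    return {"extensions": list(parts)}

def run_frame(frame, workers=4):
    parts = split(frame)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so join sees the
        # extensions in their original sequence.
        results = pool.map(process, parts)
        return join(results)

# An 8-extension frame, matching the extension count quoted for the demo.
print(run_frame({"extensions": list(range(8))}))
```

The join step cannot start until every per-extension task has finished, which is the synchronization point the diagram's Join boxes represent.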

Command Scheduling

[Figure: command scheduling diagram. Labels: Frame A, Frame B; Split ×2; Join ×2; Recipe ×4.]

DFO Cascading

Controlling REI – DFO Environment

- Already used in operation by DFO for some time.

- DFO uses REI to control scheduling of a UNIX shell script, which itself controls the execution of the recipes (internally calling esorex).

- DFO uses parallelism at frame level; there is no parallelism within the processing of each frame.

- REI is used as a queue system: jobs are submitted, and the scheduling and execution of the jobs are carried out by REI.

- Example addcmd in DFO environment:

$ addcmd -name SINFO.2004-08-21T20:25:28.895_tpl.ab -bg -trigger mflat_SINFO.2004-08-21T20:25:28.895_tpl.ab -exe processAB -a SINFO.2004-08-21T20:25:28.895_tpl.ab

$ addcmd -name SINFO.2004-08-21T19:55:07.961_tpl.ab -bg -trigger mwave_SINFO.2004-08-21T19:55:07.961_tpl.ab -waitfor mflat_SINFO.2004-08-21T20:25:28.895_tpl.ab -exe processAB -a SINFO.2004-08-21T19:55:07.961_tpl.ab

Using REI

How to Integrate a Pipeline in REI (Simplified …)

- Decide how to execute the recipes:

  1. Native way in the form of CPL Recipes.

  2. Invoke the recipe library methods/functions from within Standard Commands.

  3. Execute via jacket scripts/applications encapsulating the recipe.

- Define the necessary/desirable level of parallelism.

- Define execution plans for the various cascades.

- Implement a Recipe Planner, if necessary, to do the internal coordination of the command scheduling (+ producing data for the Standard Commands).

- Implement a Standard Command Library with special commands, which should execute internally within the REI environment (if required).

- Implement external control scripts to submit Master Commands, defining dependencies and providing data for the command execution if necessary.

- Decide the architecture of the processing cluster (number of Master Pworkers, Pworkers, CPUs, nodes, amount of memory per CPU, …).

- Start up Pworkers, defining their proper role + referring to the Command Plug-in Libraries provided (if any) and/or possible CPL Recipe Plug-in Libraries.
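As a rough illustration of the "external control scripts" step, the following Python sketch builds addcmd argument lists for a linear cascade. It uses only the options that appear in the OmegaCam example (-name, -bg, -waitfor, -trigger, -recipe, -out), in the same order, and it only constructs the argument lists without running anything. The function build_cascade and its parameters are hypothetical, and whether addcmd accepts these options in every combination produced here has not been verified.

```python
def build_cascade(stages, stamp, sof_dir, out_dir):
    """Build one addcmd argument list per stage of a linear cascade.

    stages is ordered from first to last recipe. Each stage waits for the
    previous one; every stage except the last triggers under its own name,
    mirroring the pattern in the OmegaCam slide example.
    """
    commands = []
    previous = None
    for index, recipe in enumerate(stages):
        name = f"{recipe}_{stamp}"
        argv = ["addcmd", "-name", name, "-bg"]
        if previous is not None:
            argv += ["-waitfor", previous]
        if index < len(stages) - 1:
            argv += ["-trigger", name]
        argv += ["-recipe", recipe, f"{sof_dir}/{name}.sof",
                 "-out", f"{out_dir}/{name}"]
        commands.append(argv)
        previous = name
    return commands

for argv in build_cascade(
        ["ocam_mflat", "ocam_reduce_std", "ocam_reduce_sci"],
        "W_2005-02-08T16:29:05",
        "/raid/data/ocam/sof",
        "/raid/data/ocam/products"):
    print(" ".join(argv))
```

Keeping the argument lists as Python lists (rather than one shell string) would let a real script pass them to subprocess.run without shell quoting concerns.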