BOSS: a tool for batch job monitoring and book-keeping

Claudio Grandi (INFN Bologna)

CHEP'03 Conference, San Diego, March 27th 2003




BOSS: “Batch Object Submission System”

- Is a tool for job monitoring and book-keeping
- Allows users to handle job-specific information
- Is not a job scheduler, but can be interfaced with most schedulers:
  - LSF (CERN, INFN)
  - PBS (Bristol, Caltech, UFL, Imperial College, INFN)
  - FBSNG (Fermilab)
  - Condor (INFN, U.Wisconsin)
- Has been designed to work on computing farms
- Is compatible with use on a WAN, but is not (yet) robust against network failures


Basic BOSS components

- boss executable: the BOSS interface to the user
- MySQL database: where BOSS stores job information
- jobExecutor executable: the BOSS wrapper around the user job
- dbUpdator executable: the process that writes to the database while the job is running

The local scheduler may be a “Grid” scheduler.


Basic flow

- Accepts job submission from users
- Stores info about the job in a DB
- Builds a wrapper around the job (jobExecutor)
- Sends the wrapper to the local scheduler
- The wrapper sends info about the job to the DB

[Diagram. Elements: boss submit, boss query and boss kill commands; BOSS; BOSS DB; local scheduler; farm nodes; wrapper.]


User-defined information

User registers a job type:
- Schema for the information to be monitored
  - A new table is created in the BOSS database with a defined structure
- Algorithms to retrieve the information from the job
  - The user programs (filters) are stored in the database as blobs

User submits jobs:
- One or more job types can be specified for the job
- A new entry is created for the job in the database tables
- The filters are extracted from the database and made available to the running job
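The registration and submission steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses the standard-library sqlite3 module as a stand-in for the MySQL database, and the JOBTYPE table, its columns, and the functions register_job_type and submit_job are hypothetical names, not the actual BOSS schema or API.

```python
import sqlite3


def register_job_type(db, name, schema, filter_source):
    """Create one table per job type and store its filter program as a blob.

    The table and column names come from trusted code here; a real tool
    would have to validate them before building SQL from them.
    """
    columns = ", ".join(f"{col} {sqltype}" for col, sqltype in schema.items())
    db.execute(f"CREATE TABLE {name} (JOBID INTEGER, {columns})")
    db.execute(
        "INSERT INTO JOBTYPE (NAME, FILTER) VALUES (?, ?)",
        (name, filter_source.encode()),
    )


def submit_job(db, job_id, job_types):
    """Add a row for the job in each job-type table and fetch the filters."""
    filters = {}
    for name in job_types:
        db.execute(f"INSERT INTO {name} (JOBID) VALUES (?)", (job_id,))
        row = db.execute(
            "SELECT FILTER FROM JOBTYPE WHERE NAME = ?", (name,)
        ).fetchone()
        filters[name] = row[0].decode()
    return filters


db = sqlite3.connect(":memory:")
# Hypothetical bookkeeping table that maps a job type to its filter blob.
db.execute("CREATE TABLE JOBTYPE (NAME TEXT PRIMARY KEY, FILTER BLOB)")

register_job_type(db, "test", {"COUNTER": "INTEGER"}, "#!/usr/bin/perl\n# filter body\n")
filters = submit_job(db, 1234, ["test"])
print(filters["test"])
print(db.execute("SELECT JOBID, COUNTER FROM test").fetchall())
```

The monitored column starts out empty for a newly submitted job; in this sketch it would be filled in later, once the filter has produced values.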


The job interface to BOSS

User job:

    #!/usr/bin/perl
    $i = 0;
    while($i<3){
      sleep(1);
      $i++;
      print "counter $i\n";
    }

User job stdout:

    counter 1
    counter 2
    counter 3

Filter (run by the BOSS jobExecutor on the job's stdout):

    #!/usr/bin/perl
    while(<STDIN>){
      if($_=~/.*counter\s+(\d+).*/){
        print "COUNTER=$1\n";
      }
    }

Filter output:

    COUNTER=1
    COUNTER=2
    COUNTER=3

Journal (read by the BOSS dbUpdator, which updates the JOBID and COUNTER columns of the "test" table in the BOSS DB):

    1234 JOB T_START xxx
    1234 JOB …… ……
    1234 test counter 1
    1234 test counter 2
    1234 test counter 3
    1234 JOB …… ……
    1234 JOB T_STOP yyy

- The job interfaces to BOSS are its standard input, output and error streams
- The user-defined algorithms are filters that read stdin/out/err and write key=value pairs
- The keys are the user-defined schema variables
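As a rough illustration of the filter idea, the following Python sketch mirrors the Perl filter from the slide and turns its key=value pairs into journal-style lines. The function names counter_filter and journal_entries are hypothetical, and the journal line layout (job id, job type, lower-cased key, value) is an assumption read off the example journal, not a documented BOSS format.

```python
import re

# Same pattern as the Perl filter on the slide: "counter" followed by digits.
COUNTER_RE = re.compile(r"counter\s+(\d+)")


def counter_filter(lines):
    """Yield one KEY=value string for each line that reports a counter."""
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:
            yield f"COUNTER={match.group(1)}"


def journal_entries(job_id, job_type, pairs):
    """Format KEY=value pairs as '<job id> <job type> <key> <value>' lines.

    Lower-casing the key is an assumption taken from the example journal.
    """
    for pair in pairs:
        key, value = pair.split("=", 1)
        yield f"{job_id} {job_type} {key.lower()} {value}"


job_stdout = ["counter 1", "some unrelated output", "counter 2", "counter 3"]
pairs = list(counter_filter(job_stdout))
journal = list(journal_entries(1234, "test", pairs))
print(pairs)
print(journal)
```

Lines that do not match the pattern are simply ignored, so the job can print anything else it likes without affecting the monitored values.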


Runtime data flow

[Diagram. Elements: STDIN, STDOUT, STDERR, LOG, USER, OUT pipe, ERR pipe, tee processes, tee pipes, RunTime Filter pipes, jobExecutor, Journal, dbUpdator, BOSS DB. Legend: standard input or output; standard error; other I/O streams; user supplied or returned to the user; temporary processes and files; BOSS processes and files.]
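The tee stages in the diagram suggest that each output stream is copied both to the user's log and to a filter. A minimal Python sketch of that pattern follows; it is not the jobExecutor implementation, the function name run_and_tee is hypothetical, and it handles only stdout to stay short.

```python
import io
import subprocess
import sys


def run_and_tee(command, log, on_line):
    """Run command, copy each stdout line to log, and hand it to on_line."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            log.write(line)               # copy kept for the user
            on_line(line.rstrip("\n"))    # copy passed on to a filter
    # Leaving the with-block waits for the child, so returncode is set.
    return proc.returncode


log = io.StringIO()   # stands in for the user's log file
seen = []             # stands in for the input of a filter
job = [sys.executable, "-c", "for i in range(1, 4): print('counter', i)"]
status = run_and_tee(job, log, seen.append)
print(status, seen)
```

Reading the pipe line by line keeps the two copies in step while the job is still running, which is what makes monitoring during execution possible.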


Queries

Standard queries: get job status and user-defined quantities

    % boss q -all -specific -type test
    ID  S_USR   EXECUTABLE  ST  EXE_HOST    START TIME      STOP TIME       comment   counter
    1   grandi  test.pl 15  E   pccms10.bo  14:30:00 06/06  14:30:16 06/06  ...STOP   15
    2   grandi  test.pl 15  R   pccms10.bo  14:30:02 06/06  --------------  START...  13

Advanced queries: use SQL to query job info (standard + user defined). The output is suitable for parsing by a script:

    % boss SQL -query "select JOB.ID,EXEC,counter from JOB,test WHERE JOB.ID=test.JOBID"
    3,4,23,9
    ID  EXEC                   counter
    1   test.pl                15
    2   test.pl                13

The first output line is the information line: the number of fields, followed by the width of the 1st field through the width of the nth field. The second line is the header line.
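A script could use the information line to split the remaining lines into fields. The Python sketch below assumes that every field is padded to exactly the width announced in the information line, with no extra separator; that padding rule is an assumption, since the slide gives only the widths. The names parse_boss_sql_output and pad are hypothetical helpers.

```python
def parse_boss_sql_output(text):
    """Parse script-friendly query output into a list of dicts."""
    lines = text.splitlines()
    # Information line: number of fields, then one width per field.
    info = [int(n) for n in lines[0].split(",")]
    n_fields, widths = info[0], info[1:]
    if len(widths) != n_fields:
        raise ValueError("information line does not match its field count")

    def split_fixed(line):
        fields, start = [], 0
        for width in widths:
            fields.append(line[start:start + width].strip())
            start += width
        return fields

    header = split_fixed(lines[1])
    return [dict(zip(header, split_fixed(line))) for line in lines[2:] if line.strip()]


def pad(values, widths=(4, 23, 9)):
    """Build one fixed-width line for the sample (assumed layout)."""
    return "".join(str(v).ljust(w) for v, w in zip(values, widths))


sample = "\n".join([
    "3,4,23,9",
    pad(["ID", "EXEC", "counter"]),
    pad([1, "test.pl", 15]),
    pad([2, "test.pl", 13]),
])
rows = parse_boss_sql_output(sample)
print(rows)
```

Splitting by width rather than by whitespace keeps values that contain spaces in a single field.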


Interface to the scheduler

User registers a scheduler:
- Scripts for job submission, deletion and query
- The scripts are stored in the database as blobs
- The fork scheduler is already registered

User submits/deletes/queries jobs:
- The scheduler can be specified for the submission
- The boss executable fetches the scripts from the database and uses them as the interface to the scheduler

Job submission via ClassAd file is supported:
- BOSS manages the keys it understands and passes the others to the submission script
- User-defined keys are possible!
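The key handling described above can be pictured as a simple partition of the submission keys. In the Python sketch below, everything specific is an assumption: the set of keys in BOSS_KEYS is invented for illustration, the parser accepts only a simplified `key = value;` form rather than the full ClassAd language, and parse_classad and split_keys are hypothetical names.

```python
# Hypothetical set of keys the tool itself would consume.
BOSS_KEYS = {"executable", "stdin", "stdout", "stderr", "jobtype"}


def parse_classad(text):
    """Parse a simplified 'key = value;' list into a dict (not full ClassAd)."""
    ad = {}
    for statement in text.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        key, _, value = statement.partition("=")
        ad[key.strip().lower()] = value.strip().strip('"')
    return ad


def split_keys(ad, known=BOSS_KEYS):
    """Separate the keys handled locally from those passed to the script."""
    handled = {k: v for k, v in ad.items() if k in known}
    passed_on = {k: v for k, v in ad.items() if k not in known}
    return handled, passed_on


text = 'executable = "test.pl"; jobtype = "test"; queue = "short";'
handled, passed_on = split_keys(parse_classad(text))
print(handled)
print(passed_on)
```

Passing unknown keys through untouched is what lets scheduler-specific or user-defined options reach the submission script without the tool having to know about them.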


BOSS as a grid-tool

[Diagram. Elements: boss submit, boss query, boss kill and boss registerScheduler commands; BOSS DB; local BOSS gateway; GRID scheduler; gatekeepers; farm nodes.]

- Tested on the European DataGrid testbed
- Interface scripts included in the BOSS distribution
- See talk by P.Capiluppi
- dbUpdator uses native MySQL calls
- Proof of concept using R-GMA (from EDG-WP3) as BOSS transport layer (H.Nebrensky, Brunel Univ.)


BOSS and R-GMA

[Diagram. Sites: User Interface, Computing Element, Worker Node, separated by a firewall. Elements: boss executable; BOSS DB; R-GMA Receiver servlets; R-GMA Registry; R-GMA Producer servlets; R-GMA-enabled dbUpdator; BOSS journal; user output; Input/Output Sandbox; EDG WP1 + GRAM. Labels: jobExecutor starts user job; subscribe; lookup.]


Current use of BOSS

CMS 2002 productions:
- about 500,000 jobs running in about 20 regional centers
- complete book-keeping of every single job

CMS/EDG stress test (Nov.-Dec. 2002):
- about 10,000 jobs submitted by 4 user interfaces on the European DataGrid testbed
- allowed validation of jobs for which the output sandbox was lost due to EDG internals

R-GMA demo at EDG review (Feb. 2002):
- proof of concept


BOSS data analysis

boss2root (by D.Bonacorsi):
- Produces ROOT trees from BOSS MySQL tables
- Used to analyze the data of the CMS/EDG stress test:
  - complete classification of problems
  - graphical representation of results


Summary

- BOSS is a tool that allows real-time monitoring and book-keeping of batch jobs
- User-defined information is archived for different job types
- Has been used by CMS for 2002 official productions
- Has been used during the CMS/EDG stress test in a grid environment
- Is a general tool: nothing CMS or even HEP specific

Web site: http://www.bo.infn.it/cms/computing/BOSS/
