Job Life Cycle Management Libraries for CMS Workflow Management Projects. Stuart Wakefield on behalf of CMS DMWM group. Thanks to Frank van Lingen for the slides.


Page 1: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Job Life Cycle Management Libraries for CMS Workflow Management Projects

Stuart Wakefield on behalf of CMS DMWM group

Thanks to Frank van Lingen for the slides

Page 2: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Motivation

• Converge on cross-project common components
  – Uniform usage
  – Lower maintenance
• Prevent repetitive functionality implementation
• Address performance bottlenecks (e.g. database issues)
• Provide developers with sufficient tools such that they can focus on the (physics) domain-specific part of their development

Page 3: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Architecture

• Common low-level / API layer (WMCore)
  – Grid/storage interaction: LCG, OSG, ARC etc.
  – CMS services: authentication, databases, site info…
• Event-driven components (WMAgent)
  – Generic component harness
  – Common library of components

[Figure: architecture layers. Labels: WMCore (common libraries); WMAgent; specialised WMAgent implementations (T0, ProdAgent, CRAB)]

Page 4: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Structure of an Agent

[Figure: structure of an agent. Surviving label: "Component specific"]

Page 5: Job Life Cycle Management Libraries for CMS Workflow Management Projects

CMS Workflows: 3* layers


*Tier0 does not have a request layer

Page 6: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Job Life Cycle Management

• Different components based on WMCore handle the various states of a job
  – Create, submit, track, etc.
  – The components involved with a job depend on its state
• There may be multiple types of jobs
  – Components need to differentiate between job types
• Components can interact with third-party services
  – Site DB, site submission, mass storage, etc.
• An application (e.g. CRAB, T0, Production) is a collection of components managing the life cycle
  – Not necessarily the same components
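A minimal sketch of the idea that each job type has its own life cycle, and that the next step depends on both the job type and the current state. The job type keys, the state names and the strictly linear ordering are illustrative assumptions, not the WMCore data model.

```python
# Hypothetical life cycles: one ordered list of states per job type.
# Real life cycles are not necessarily linear (errors, retries, parallel states).
LIFE_CYCLES = {
    "type_1": ["Create", "Submit", "Track", "RegisterDBS", "RegisterPhedex", "Cleanup"],
    "type_n": ["Create", "Submit", "Track", "Cleanup"],
}


def next_state(job_type, state):
    """Return the state that follows `state` for this job type, or None at the end."""
    states = LIFE_CYCLES[job_type]
    index = states.index(state)
    return states[index + 1] if index + 1 < len(states) else None


def run_life_cycle(job_type):
    """Walk a job of the given type through all of its states, in order."""
    state = LIFE_CYCLES[job_type][0]
    visited = []
    while state is not None:
        visited.append(state)
        state = next_state(job_type, state)
    return visited
```

An application would then be a particular choice of life cycles plus the components that implement each state.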

Page 7: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Life cycles of job (types)

[Figure: job types and their states, and the components representing each state (operation).
 Job Type 1 states: Create, Submit, Track, Register DBS, Register Phedex, Cleanup.
 Job Type n states: Create, Submit, Track, Cleanup.
 Components: Job Creator, Job Submitter, Job Tracker, Cleanup.
 Messages: CreateJob, SubmitJob, TrackJob, JobSuccess.
 Annotations: synchronization between parallel states; communication through messages.]

Simplified example! Many more states (Error, Queued, Retry…).

Page 8: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Overview & Example components

WMCore provides common components without being context/project specific (e.g. CRAB, T0, Production).

Some components work in sequence on jobs, others in parallel.

[Figure: example components Create, Submit, Track, Error Handling, Register, Merge and Cleanup (marked sequential or parallel). Other labels: Harness, MsgService, Trigger, Database, WMBS, FwkJobReport, ThreadPool, JobSpec, Site, Job Report.]

Page 9: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Msg Service: delivery of asynchronous messages

[Figure: core msg metadata (e.g. subscriptions) plus the message tables buffer_in → msg_queue → buffer_out]

Prevent single inserts into and deletes from a large table. Buffer tables are purged/filled when a certain size is reached.

But: there is still a problem when one component is ‘dead’ or ‘stuck’ while others have messages going through buffer_in → msg_queue → buffer_out. Messages for the dead component accumulate in msg_queue.

Solution (or option): give each component its own buffer_in, msg_queue and buffer_out.

Page 10: Job Life Cycle Management Libraries for CMS Workflow Management Projects

[Figure: core msg metadata (e.g. subscriptions) with per-component tables Msg_queue_component1 … Msg_queue_component<x>]

• Messages are distributed over more tables (prevents large tables)
• Softens the impact of a ‘dead’ component
• Use table name pre/post fixing to prevent table name clashes

The current transport implementation is based on inserting a message in a database. This transport mechanism can be replaced, but the rest of the persistent backend (~90%), including the buffering outlined here, can still be used to store the messages and to ensure no messages are lost. An example of such a transport layer is Twisted (http://twistedmatrix.com/trac/).
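A sketch of per-component queue tables with a name prefix, again using sqlite3. The prefix "ms_", the column layout and the helper names are hypothetical; only the idea of one queue table per component and pre/post-fixed table names comes from the slide.

```python
import re
import sqlite3


def queue_table(component, prefix="ms_"):
    """Build the per-component table name; the prefix guards against name clashes."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not re.fullmatch(r"[A-Za-z0-9_]+", component):
        raise ValueError(f"unsafe component name: {component!r}")
    return f"{prefix}msg_queue_{component}"


def subscribe(conn, component):
    """Create the queue table for a component if it does not exist yet."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {queue_table(component)} "
        "(msg_type TEXT, payload TEXT)"
    )


def send(conn, component, msg_type, payload):
    """Queue a message for exactly one component."""
    conn.execute(
        f"INSERT INTO {queue_table(component)} VALUES (?, ?)", (msg_type, payload)
    )


def pending(conn, component):
    """Number of undelivered messages for one component."""
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {queue_table(component)}"
    ).fetchone()
    return count
```

With this layout, a backlog for a stuck component grows only in its own table and does not slow down queries for the others.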

Page 11: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Other Core Services/Libraries

• (Persistent) Threadpool
• Worker threads
  – Long running threads within a component
• Trigger
  – Synchronization of components
• Database connection management
  – Through SQLAlchemy
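One way to picture the Trigger service: several components each set a flag for a job, and an action fires once all expected flags are present. The class below is an in-memory sketch under that assumption; the class name, method names and flag names are hypothetical, and the real Trigger is presumably database-backed.

```python
import threading


class Trigger:
    """Fire `action(job_id)` once every expected flag has been set for a job."""

    def __init__(self, flags, action):
        self._expected = frozenset(flags)
        self._action = action
        self._seen = {}  # job id -> set of flags received so far
        self._lock = threading.Lock()

    def set_flag(self, job_id, flag):
        """Record a flag; return True if this call completed the set."""
        if flag not in self._expected:
            raise KeyError(flag)
        with self._lock:
            seen = self._seen.setdefault(job_id, set())
            seen.add(flag)
            complete = seen == self._expected
            if complete:
                del self._seen[job_id]
        if complete:
            # Run the action outside the lock so it cannot block other jobs.
            self._action(job_id)
        return complete
```

For example, a cleanup action could wait until both registration steps have reported for a job.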

Page 12: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Other Core Services/Libraries

• Web development (HTTPFrontend)
  – Facilitating development of web-based components based on CherryPy
• WMBS data model
  – Managing the relation between workflow, job and data products

Provide developers with sufficient tools such that they can focus on the (physics) domain-specific part of their development

Page 13: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Workflow Management Bookkeeping System (WMBS)

• Provide a generalized processing framework
• Current system designed for production not processing
• Subscription = workflow + fileset
• Automate as much as possible
  – Jobs created when new data in fileset available
  – Create subscriptions when new fileset produced, i.e. new runs taken
• Workflow defines how jobs created from data

[Figure: WMBS data model. Entities: File Set, Workflow, Job, Output Files, File Details (input files). Relations carry "*" multiplicity markers; one relation is labelled "subscriptions".]
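The relation "subscription = workflow + fileset" and the rule "jobs are created when new data is available in the fileset" can be sketched as follows. The class layout, the `files_per_job` splitting rule and the `acquired` bookkeeping are assumptions for illustration, not the actual WMBS schema.

```python
from dataclasses import dataclass, field


@dataclass
class Fileset:
    name: str
    files: list = field(default_factory=list)


@dataclass
class Workflow:
    name: str
    files_per_job: int = 2  # hypothetical splitting rule

    def split(self, files):
        """Define how jobs are created from data: group files into jobs."""
        n = self.files_per_job
        return [files[i:i + n] for i in range(0, len(files), n)]


@dataclass
class Subscription:
    workflow: Workflow
    fileset: Fileset
    acquired: set = field(default_factory=set)  # files already given to jobs

    def create_jobs(self):
        """Create jobs only for files that appeared since the last call."""
        new_files = [f for f in self.fileset.files if f not in self.acquired]
        self.acquired.update(new_files)
        return self.workflow.split(new_files)
```

Calling `create_jobs()` periodically picks up files added to the fileset without reprocessing earlier ones.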

Page 14: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Development

• Small team + tight schedule
• Use “Sprints” to make rapid progress
• Emphasize code style, quality, testing etc.
• Periodically produce test reports
  – Test on MySQL, SQLite and Oracle (not all developers have easy access to all architectures)
  – Name and shame developers with failures
  – Determine author from CVS

Page 15: Job Life Cycle Management Libraries for CMS Workflow Management Projects

[Figure: test cycle]

• Run test_generate
• Edit generated files (e.g. change output log files, and mapping from developer to modules)
• Run test_code
• Run test_style
• Repeat (e.g. daily/weekly)
• Periodically update the test template files (e.g. once per month)

Files shown in the figure: test_style, conf_test_mysql.py, conf_test_oracle.py, failures1.rep, failures2_mysql.rep, failures2_oracle.rep, failures3_mysql.rep, failures3_oracle.rep, CVS log file.

Page 16: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Skeleton Code Generation

• Existing components parsed to generate stubs for new style components
• Authors then fill in the blanks (Handlers etc.), or
• Rewrite as necessary
• New (skeleton) components can be generated from a simple specification
• Heavy lifting taken care of, leaving the author to concentrate on the task at hand
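Generating a skeleton component from a simple specification could look roughly like this. The specification format (`name` plus a list of `messages`), the templates and the generated class layout are hypothetical; the slides do not show the real generator.

```python
from string import Template

# Template for one handler stub; the author fills in the body later.
HANDLER_TEMPLATE = Template('''\
    def handle_$message(self, payload):
        """Handle the $message message."""
        raise NotImplementedError("fill in $message handling")
''')

# Template for the component class that holds the handler stubs.
COMPONENT_TEMPLATE = Template('''\
class $name:
    """Skeleton for the $name component (generated)."""

$handlers''')


def generate_component(spec):
    """Return Python source for a skeleton component described by `spec`."""
    handlers = "\n".join(
        HANDLER_TEMPLATE.substitute(message=message) for message in spec["messages"]
    )
    return COMPONENT_TEMPLATE.substitute(name=spec["name"], handlers=handlers)
```

The returned string is ordinary Python source that can be written to a file and edited by the component author.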

Page 17: Job Life Cycle Management Libraries for CMS Workflow Management Projects

(Workflow) Code Generation

• Workflow can be visualized
  – Components & messages

Defines a Trigger for component synchronization:

    synchronizer = {'ID': 'JobPostProcess',
                    'action': 'PA.Core.Trigger.PrepareCleanup'}

Defines a handler in a workflow which acts on messageIn messages and produces messageOut messages. Threading means handling of messages is threaded:

    handler = {'messageIn': 'SubmitJob',
               'messageOut': 'TrackJob|JobSubmitFailed',
               'component': 'JobSubmitter',
               'threading': 'yes',
               'createSynchronizer': 'JobPostProcess'}
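Handler specifications of this shape carry enough information to derive the component/message graph that a visualization would draw. The sketch below assumes that "|" separates alternative output messages and adds a second, hypothetical JobTracker handler so that one edge has a consumer; the function names are illustrative.

```python
# First entry follows the slide; the JobTracker entry is a made-up example.
HANDLERS = [
    {"messageIn": "SubmitJob", "messageOut": "TrackJob|JobSubmitFailed",
     "component": "JobSubmitter", "threading": "yes"},
    {"messageIn": "TrackJob", "messageOut": "JobSuccess",
     "component": "JobTracker", "threading": "no"},
]


def build_routes(handlers):
    """Map each incoming message name to the component that handles it."""
    return {handler["messageIn"]: handler["component"] for handler in handlers}


def message_edges(handlers):
    """Return (producer, message, consumer) edges; consumer is None if unhandled."""
    routes = build_routes(handlers)
    edges = []
    for handler in handlers:
        # Assumption: '|' separates the alternative messages a handler may emit.
        for message in handler["messageOut"].split("|"):
            edges.append((handler["component"], message, routes.get(message)))
    return edges
```

Edges whose consumer is `None` point at messages that no listed handler picks up, which is a useful check before generating workflow code.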

Page 18: Job Life Cycle Management Libraries for CMS Workflow Management Projects

Conclusion

• CMS distributed projects are moving to a common codebase.
  – Library functionality (grid interaction etc.).
  – Common component functionality.
• Taking the opportunity to refactor a lot of the existing code and improve testing etc.
• Provide common data processing functionality.
• Aggressive schedule but aiming for reduced maintenance cost for the future