Bookkeeping Tutorial

Page 1: Bookkeeping Tutorial

Bookkeeping Tutorial

Page 2: Bookkeeping Tutorial

Bookkeeping & Monitoring Tutorial 2

Bookkeeping content

• Contains records of all “jobs” and all “files” that are created by production jobs
• Job:
    • In fact technically a “step” in a workflow
        • E.g. “Gauss step”, “Brunel step”…
    • For real RAW data: the “job” is in fact a DAQ run
    • Has input files (except runs and Gauss)
    • Has output files
        • Note that files may not be kept (i.e. may have no replica)
        • All files are registered in order to keep the full history
    • Has metadata
        • Location, production number, application, CPUTime, etc…
• Files:
    • Always defined as output of a “job”
    • Files are defined by an LFN (Logical File Name)
    • Contain metadata
        • Number of events, size, event type, etc…
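The job/file relationship described on this slide can be pictured with a small in-memory model. This is only an illustrative sketch: the class names `Job` and `FileRecord`, the `add_output` helper, and the example LFNs are hypothetical and are not part of the Bookkeeping or DIRAC API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)
class Job:
    """A bookkeeping "job": one workflow step, or a DAQ run for RAW data."""
    name: str
    inputs: List["FileRecord"] = field(default_factory=list)  # empty for runs and Gauss
    outputs: List["FileRecord"] = field(default_factory=list, repr=False)
    metadata: Dict[str, object] = field(default_factory=dict)

    def add_output(self, lfn: str, has_replica: bool = True, **metadata) -> "FileRecord":
        """Register an output file; it is recorded even if no replica is kept."""
        record = FileRecord(lfn, self, has_replica, dict(metadata))
        self.outputs.append(record)
        return record


@dataclass(eq=False)
class FileRecord:
    """A file entry, identified by its LFN and always tied to the job that made it."""
    lfn: str
    produced_by: Job = field(repr=False)
    has_replica: bool = True
    metadata: Dict[str, object] = field(default_factory=dict)


# Hypothetical two-step chain: a Gauss step feeding a Boole step.
gauss = Job("Gauss", metadata={"application": "Gauss"})
sim = gauss.add_output("/example/0001.sim", has_replica=False, events=500)
boole = Job("Boole", inputs=[sim])
digi = boole.add_output("/example/0001.digi", events=500)
```

The intermediate file `sim` has no replica but stays registered, so the history of `digi` can still be followed back to the Gauss step.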

Page 3: Bookkeeping Tutorial

Bookkeeping purpose

• Provenance database
    • Contains the full history of productions
        • Traceability of datasets
• User dataset search
    • Select a list of files from selection criteria
        • Only files with a replica!
        • Generate Gaudi configuration file
    • Also gives access to the job/file tree
        • E.g. investigate the history of a file
• Production dataset search
    • Select the dataset to be processed by production jobs
        • Ensures consistency of input files for a production
    • Uses the BK API directly to get the list of files
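The two uses of the provenance information (tracing a file's history, and restricting user searches to files that have a replica) can be sketched as follows. The tables `PARENTS` and `HAS_REPLICA`, the function names, and the LFNs are hypothetical stand-ins for what the real database stores; this is not the BK API.

```python
from typing import Dict, List

# Hypothetical provenance table: each LFN maps to the input LFNs of the
# job that produced it (an empty list means the job had no input files).
PARENTS: Dict[str, List[str]] = {
    "/example/evt.sim": [],
    "/example/evt.digi": ["/example/evt.sim"],
    "/example/evt.dst": ["/example/evt.digi"],
}

# Hypothetical replica flags: intermediate files were registered but not kept.
HAS_REPLICA: Dict[str, bool] = {
    "/example/evt.sim": False,
    "/example/evt.digi": False,
    "/example/evt.dst": True,
}


def history(lfn: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return every ancestor of `lfn`, nearest first (breadth-first walk)."""
    seen = set()
    queue = list(parents.get(lfn, []))
    ancestors = []
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        ancestors.append(current)
        queue.extend(parents.get(current, []))
    return ancestors


def user_search(lfns: List[str], has_replica: Dict[str, bool]) -> List[str]:
    """Keep only files with a replica, as a user dataset search does."""
    return [lfn for lfn in lfns if has_replica.get(lfn, False)]
```

With these tables, `history("/example/evt.dst", PARENTS)` walks back through the DIGI and SIM files even though neither has a replica, while `user_search` returns only the DST.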

Page 4: Bookkeeping Tutorial

Bookkeeping partitioning

• Configuration name / version
    • Real data
        • <DAQ partition> / <activity>
    • Simulated data
        • “MC” / <activity>
            • <activity>: “DC06” / “MC09” …
• Conditions
    • Parameters of initial data
        • All subsequent processed data inherit the “conditions”
    • Real data
        • DAQ conditions
            • Beam conditions, energy, magnetic field, detector conditions…
    • Simulated data
        • Simulation conditions
            • Beam energy, magnetic field, luminosity, generator settings…

Page 5: Bookkeeping Tutorial

Processing pass

• Associated to a level of processing
    • Within a given partition (config name / version + conditions)
    • Corresponds to the whole processing workflow
        • Single workflow for a given processing pass
        • Compatible versions of applications
    • Specifies the processing pass of input data when applicable
        • Sequence of processing
    • Re-processing creates branches

[Diagram: processing chain Gauss → SIM → Boole → DIGI → Brunel → DST → DaVinci → ETC, with a second Brunel → DST branch; labels: SimReco, Stripping]

Page 6: Bookkeeping Tutorial

Other query parameters

• Event type
    • File property
    • Real data
        • 90000000: real data full stream
        • 90000001: real data express stream
        • Types to be defined for stripping streams
    • Simulated data
        • LHCb convention for decay tree
• File type
    • Data content / format
        • Format not yet used
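Taken together, the partitioning, processing pass, event type and file type form the criteria of a dataset query. A minimal sketch of such a query object is given below. The class `BkQuery`, its field names, and the values "ExamplePartition", "ExampleActivity", "ExampleConditions" and "ExamplePass" are hypothetical; only the two event type codes come from the slide.

```python
from dataclasses import asdict, dataclass
from typing import Dict

FULL_STREAM = 90000000     # real data, full stream (from the slide)
EXPRESS_STREAM = 90000001  # real data, express stream (from the slide)


@dataclass(frozen=True)
class BkQuery:
    """Hypothetical container for the criteria of a bookkeeping query."""
    config_name: str       # <DAQ partition> for real data, "MC" for simulation
    config_version: str    # <activity>
    conditions: str        # DAQ or simulation conditions
    processing_pass: str
    event_type: int
    file_type: str

    def matches(self, record: Dict[str, object]) -> bool:
        """True if the file record agrees with every criterion of the query."""
        return all(record.get(key) == value for key, value in asdict(self).items())


query = BkQuery("ExamplePartition", "ExampleActivity", "ExampleConditions",
                "ExamplePass", FULL_STREAM, "DST")

full_record = dict(asdict(query), lfn="/example/full.dst")
express_record = dict(full_record, event_type=EXPRESS_STREAM, lfn="/example/express.dst")
```

Here `query.matches(full_record)` holds, while `express_record` is rejected because its event type differs.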

Page 7: Bookkeeping Tutorial

Running the bookkeeping GUI

• Needs a valid Grid certificate
    • https://twiki.cern.ch/twiki/bin/view/LHCb/FAQ/Certificate
• Needs an X server
• On lxplus: lhcb-bkk
    • SetupProject Dirac
        • Sets up the environment
    • If needed: lhcb-proxy-init
        • Creates a valid Grid proxy
    • dirac-bookkeeping-gui
• Individual commands can be issued from the prompt!
• You can also install Dirac locally on your Linux machine:
    • https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionProcedures - Installing_DIRAC_on_non_CERN_mac

Page 8: Bookkeeping Tutorial

The query tree

Page 9: Bookkeeping Tutorial

More info

• Right click on
    • Conditions
    • Processing pass

Page 10: Bookkeeping Tutorial

Event type and file type

Page 11: Bookkeeping Tutorial

Dataset selection

Logical File name

Page 12: Bookkeeping Tutorial

Limit number of files per page

Page 13: Bookkeeping Tutorial

Saving configuration (a.k.a. options) file

• Python configuration (default)
    • Still possible to create .opts (discouraged!)
    • .txt file for just a list of LFNs
• All files or selected files (if any)
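The plain-text option can be pictured as a file holding one LFN per line. The sketch below makes that assumption explicitly (the slides do not specify the layout the GUI actually writes), and the function names and example LFNs are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence


def files_to_save(all_files: Sequence[str],
                  selected: Optional[Sequence[str]] = None) -> List[str]:
    """Save the selected files if there are any, otherwise all files."""
    return list(selected) if selected else list(all_files)


def save_lfn_list(lfns: Sequence[str], path: Path) -> Path:
    """Write one LFN per line (assumed layout, not confirmed by the slides)."""
    path.write_text("".join(f"{lfn}\n" for lfn in lfns))
    return path


def load_lfn_list(path: Path) -> List[str]:
    """Read the list back, skipping blank lines."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


if __name__ == "__main__":
    everything = ["/example/a.dst", "/example/b.dst", "/example/c.dst"]
    chosen = files_to_save(everything, selected=["/example/b.dst"])
    with tempfile.TemporaryDirectory() as tmp:
        target = save_lfn_list(chosen, Path(tmp) / "dataset.txt")
        print(load_lfn_list(target))
```

Such a list carries only the LFNs; the Python configuration remains the default choice when the files are to be read by a Gaudi application.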

Page 14: Bookkeeping Tutorial

Advanced saving

• Select files for a site (for local usage, not Grid job)
    • LFN + XML catalog
        • Next slide
    • PFN

Page 15: Bookkeeping Tutorial

Advanced saving (LFN)

• LFNs + XML catalog

Page 16: Bookkeeping Tutorial

Other queries

• Select another “tree”
    • Different order for the query
• Production lookup
    • If you are interested in a particular production number
• Run lookup
    • For real data (currently FEST)

Page 17: Bookkeeping Tutorial

Dealing with PFNs or XML catalogs

• Using ganga + DIRAC
    • Bookkeeping integrated in ganga:
        • dataset = browseBK()
    • LFN handling is then automatic…
• genXMLCatalog
    • Same functionality as “Advanced save” of GUI
    • Ensures files are available on the specified site
    • Gets the PFN from the Storage Element
        • Not constructed “by hand”

Page 18: Bookkeeping Tutorial

DIRAC Monitoring web portal

Page 19: Bookkeeping Tutorial

General information

• Entry point to the DIRAC web portal
    • http://dirac.cern.ch
• Web implementation of (almost) a full desktop application
    • Monitoring of productions / jobs
    • Accounting (jobs, data management)
    • Allows actions to be taken on jobs
• Authentication / authorisation is mandatory
    • Anonymous access gives minimal information
    • Get a certificate and load it in your browser
        • https://twiki.cern.ch/twiki/bin/view/LHCb/FAQ/Certificate
    • DIRAC authorisation through “DIRAC groups”
        • Default: lhcb_user
        • Other groups: lhcb_prod, dirac_admin…
        • Future: specific groups per physics group, PPG (for production authorisation)…
        • Capabilities depend on the group

Page 20: Bookkeeping Tutorial

The DIRAC portal home page

Identity

DIRAC group

DIRAC instance

Menus

Page 21: Bookkeeping Tutorial

Job Monitoring

Selection

Monitoring info

Actions

Page 22: Bookkeeping Tutorial

Job Monitoring (cont’d)

• Selection
    • For group lhcb_user, only see your own jobs
    • Can select with
        • Status
        • Site
        • Date
        • …
• Columns
    • Can tailor the columns to be displayed
    • Clicking toggles the sorting in the column
• Rows
    • Jobs displayed in pages (default 25 rows, don’t exceed 100)
    • Can scroll pages
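The selection and paging behaviour can be mimicked on a plain list of job records. This is a sketch of the idea only, not of the portal's implementation: the record layout, the site name "SiteA" and the function names are hypothetical, and the advice not to exceed 100 rows is treated here as a hard limit, which is an assumption.

```python
from typing import Dict, List, Optional

DEFAULT_PAGE_SIZE = 25  # default number of rows quoted on the slide
MAX_PAGE_SIZE = 100     # the slide advises not to exceed this


def select_jobs(jobs: List[Dict[str, object]],
                status: Optional[str] = None,
                site: Optional[str] = None) -> List[Dict[str, object]]:
    """Keep jobs matching every criterion that was given."""
    return [job for job in jobs
            if (status is None or job["status"] == status)
            and (site is None or job["site"] == site)]


def page_of(jobs: List[Dict[str, object]], page: int,
            page_size: int = DEFAULT_PAGE_SIZE) -> List[Dict[str, object]]:
    """Return the rows of one page; pages are numbered from 1."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return jobs[start:start + page_size]


# Hypothetical job list: even IDs finished, odd IDs failed.
jobs = [{"id": i, "status": "Done" if i % 2 == 0 else "Failed", "site": "SiteA"}
        for i in range(60)]
done = select_jobs(jobs, status="Done")
first_page = page_of(done, page=1)
second_page = page_of(done, page=2)
```

Of the 60 hypothetical jobs, 30 are "Done", so the first page holds the default 25 rows and the second page the remaining 5.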

Page 23: Bookkeeping Tutorial

Logging info

Page 24: Bookkeeping Tutorial

Output peeking

Page 25: Bookkeeping Tutorial

Attributes

Page 26: Bookkeeping Tutorial

Parameters

Page 27: Bookkeeping Tutorial

Job statistics

Page 28: Bookkeeping Tutorial

Accounting

• Gives you access to your jobs
• Select parameters:
    • Plot
    • Time range
    • Item to plot against (site, status…)
    • Selection criteria
        • Site
        • (Final) Status

Page 29: Bookkeeping Tutorial

Accounting screenshots

Page 30: Bookkeeping Tutorial

Accounting (cont’d)

Page 31: Bookkeeping Tutorial

Job CPU efficiency