CMS Monte Carlo Production in LCG
J. Caballero, J.M. Hernández, P. García-Abia (CIEMAT)
CMS Collaboration
Computing in High Energy and Nuclear Physics,
T.I.F.R. Mumbai, India, 13-17 February 2006
CHEP06, Mumbai. CMS Monte Carlo Production in LCG. P. Garcia-Abia / CIEMAT
Outline
Introduction
Monte Carlo production framework: Data tiers, metadata attachment, publication of data
Production workflow
First experiences
Improvements to production: Output ZIP archives, treatment of pile-up, local software installation
Production operations: efficiency, problems
Migration to LFC
The new MC production system
Conclusions
Introduction
Monte Carlo (MC) production is crucial for detector studies and physics analysis
Event simulation and reconstruction typically done in computer farms of CMS institutions
Porting production to LCG allows using a large amount of computing, storage and network resources
MC simulation was previously run in a dedicated LCG0/LCG1 testbed: small-scale production
Low efficiency: RLS, site configuration
We have introduced novel concepts that make it possible to run the full production chain on LCG, from the generation of events to the publication of data for analysis
We have coupled production to the CMS data transfer system (PhEDEx) and made the tools more robust
Important implications for the design of the new production framework
Introduction II
The CMS event data model (EDM) and the MC production framework are somewhat monolithic, not suitable for a Grid environment: Lack of modularity
Grid provides basic services: Reliability, stability and flexibility are important issues
We have identified the main weak points of LCG and made the production framework more robust
Efficient running of production in LCG is manpower intensive
Availability of the resources and responsiveness of the local administrators are crucial
Code development, testing and running of production in LCG done by ~1.5 FTE at CIEMAT
Production framework
The basic unit in MC production is the dataset: a given physics process with a well defined set of parameters
The production chain is: generation of events, simulation (hits), digitization (digis) and reconstruction (DST); these are called data tiers or steps
owner: data tier with defined geometry, SW version and pile-up (PU) sample
Detector and physics groups request events of a specific dataset/owner pair
For practical reasons, requests are split in small assignments composed of a number of runs (~1000 events)
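The splitting of a request into short runs can be sketched as follows (a minimal illustration; `split_request` and the run dictionaries are invented for the example and are not the RefDB or McRunjob API):

```python
def split_request(total_events, first_run=1, events_per_run=1000):
    """Split a dataset request into runs of ~1000 events each."""
    runs = []
    remaining = total_events
    run_number = first_run
    while remaining > 0:
        n = min(events_per_run, remaining)
        runs.append({"run": run_number, "events": n})
        remaining -= n
        run_number += 1
    return runs
```

For instance, a 2500-event request would become three runs of 1000, 1000 and 500 events.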
The data relevant to production (requests, owner/dataset, assignments, runs, data attributes) are kept in a global database (RefDB)
The MC production framework is McRunjob, a Python application developed at FNAL, long used for local farm production
Data tiers
Generation: no input, small output (10 to 50 MB ntuples); pure CPU: a few minutes, up to a few hours if hard filtering is present
Simulation (hits): GEANT4, small input; CPU and memory intensive: 24 to 48 hours; large output: ~500 MB in three files (EVD files), the smallest only ~100 KB!
Digitization: lower CPU/memory requirements: 5 to 10 hours; I/O intensive: persistent reading of PU through the LAN; large output: similar to simulation
Reconstruction: even less CPU: ~5 hours; smaller output: ~200 MB in two files
Event metadata attachment
In order to run the digitization step, event metadata have to be generated for the whole collection of simulated events
When running reconstruction the metadata of both the simulated and the digitized events are required
The generation of metadata (metadata attachment) needs direct access to the event files, not suitable for distributed systems: the output of the jobs is potentially distributed among several Storage Elements with no POSIX I/O-like access
Metadata attachment was the main show-stopper for porting the MC production system to LCG: lack of modularity (atomicity) in the old EDM
We introduced the concept of atomic attachment: Metadata attachment done on the Worker Node for the run to be processed
Negligible overhead: EVD files already in the working area
Publication of data
We have coupled production in LCG and the CMS data transfer system: PhEDEx is used to collect event files in the T1s/T2s that host data for analysis
However, data handling for intermediate steps not done by PhEDEx...
(this is one of the main problems in production)
For each owner/dataset, a global metadata attachment is performed: metadata and local XML POOL file catalogs are produced and made public in the data location system (global RefDB and local data location DB, PubDB)
Analysis tools inspect RefDB/PubDB for data discovery: Analysis jobs are submitted to the appropriate T1/T2
Production workflow
Job preparation: McRunjob downloads assignment information from RefDB:
• List of runs, job templates, application data-cards, input file specification, input/output virgin metadata
Jobs are created for each run using the templates:
• Application scripts
• JDL file with grid requirements (CPU, memory, SW tags, site...)
• Wrapper script: the site-specific setup needed to run the job on an LCG WN
Jobs are submitted to a LCG CE using the JDL
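The JDL carried by each job might look like the sketch below. The attribute values (wrapper name, software tag, memory cut) are invented for the example; the Glue attribute names follow common LCG JDL usage, but this is an illustration, not the actual production JDL:

```python
def make_jdl(wrapper, sw_tag, min_memory_mb=512):
    """Build a minimal, illustrative JDL string for one production job."""
    return (
        f'Executable = "{wrapper}";\n'
        f'StdOutput = "job.out";\n'
        f'StdError = "job.err";\n'
        f'OutputSandbox = {{"job.out", "job.err", "summary.xml"}};\n'
        # Require the CMS software tag and a minimum amount of RAM at the site
        f'Requirements = Member("{sw_tag}", '
        f'other.GlueHostApplicationSoftwareRunTimeEnvironment) '
        f'&& other.GlueHostMainMemoryRAMSize >= {min_memory_mb};'
    )
```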
At runtime on the WN: Input files are downloaded from SE
Metadata are generated for the input event files
After the application runs the output EVDs are copied to the SE
The summary file is returned in the output sandbox and sent to RefDB from the UI for validation of the job
Originally, also the application output/error was returned
First experiences
The first experiences were disappointing: Extremely high submission time (very low rate)
Very low job efficiency
Job retrieval time too high: huge output
Failure causes: local configuration problems: unavailability of CMS software (installation problems), NFS
Instability of the RLS global catalog
Problems staging in/out files: weak staging procedure, copy from/to the SE unreliable
Poor error reporting from the application: hard to automate job resubmission, typically done after visual inspection of the logs
Real time monitoring unavailable
Improvements to production
We introduced new ideas in the production system in order to make it more robust
Output/error files of the application removed from the output sandbox: size greatly reduced, significantly improving the job retrieval rate
Virgin metadata and XML POOL catalog of the job removed from the input sandbox (size reduced to 10 KB): stored in several SEs at job submission time (atomic operation) to improve their availability
Significant improvement of the job submission rate
More robust stage in/out procedure: failing input/output operations to/from the SE are retried several times (with a delay) to avoid temporary access problems to the SE/RLS
The copy of the job output is tried on several SEs if one fails
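The retry-with-fallback logic can be sketched like this (assumed names; `try_copy` stands in for whatever replica-manager command performs the actual transfer):

```python
import time

def copy_with_retries(try_copy, storage_elements, retries=3, delay=60):
    """Try the copy against each SE in turn, retrying each one a few
    times with a delay to ride out transient SE/catalog failures."""
    for se in storage_elements:
        for _ in range(retries):
            if try_copy(se):      # try_copy returns True on success
                return se
            time.sleep(delay)
    raise RuntimeError("copy failed on all storage elements")
```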
Output ZIP archives
At job completion time, the output EVD files are packed in a ZIP archive (without compression) together with other important files: checksum of the EVD files, XML POOL catalog fragment of the output files, summary file, output and error files of the application
Just one big file is copied to the SE, instead of several EVD files
One of the EVDs is only 100 KB in size (very bad for MSS performance)
CMS applications can read files inside uncompressed ZIP archives (without unpacking them)
Zipping had implications for the job preparation of subsequent steps: we instrumented the job wrapper to deal properly with ZIPs
... and in the publication of data: the publication tool, CMSGLIDE (M.A. Afaq, FNAL), was modified to create XML POOL catalogs and attached metadata for production ZIPs
Global metadata attachment done using ZIPs (without unzipping)
Zipping of EVDs widely adopted in CMS: EVDs produced in local farms are merged into 2 GB files
Great benefits for PhEDEx and MSS: far fewer, much larger files
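The packing of a run's output into an uncompressed archive can be illustrated with Python's standard `zipfile` module (`pack_run_output` is an invented name; ZIP_STORED is what keeps members directly readable without unpacking):

```python
import os
import zipfile

def pack_run_output(archive_path, files):
    """Pack the EVD and bookkeeping files of one run into a single
    uncompressed archive; ZIP_STORED members can be read in place."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
        for path in files:
            zf.write(path, arcname=os.path.basename(path))
    return archive_path
```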
Treatment of pile-up
Proper simulation of events requires the superposition of events from inelastic pp interactions on the events of simulated physics processes
Large pile-up (PU) samples prepared at CERN: about 100 GB
Local farm: PU installed locally and made accessible to the jobs at runtime via POSIX I/O-like protocols (rfio, dcap)
LCG: PU sample, EVD (zipped) and metadata transferred with PhEDEx to T1/T2 sites that will run digitization/reconstruction: the XML POOL catalog of the PU, with site-dependent PFNs/protocol, is placed in a standard location
An LCG software tag for the PU is published in the grid information system and used as a requirement in the JDL of the jobs
At runtime, the job wrapper merges the PU catalog with that of the job
This simple (novel) implementation has been crucial for running digitization and reconstruction jobs in LCG
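The catalog merge performed by the wrapper can be sketched as an XML operation (a simplified illustration; real POOL catalogs carry physical/logical sub-elements omitted here, and `merge_pu_catalog` is an invented name):

```python
import xml.etree.ElementTree as ET

def merge_pu_catalog(job_catalog, pu_catalog, merged_catalog):
    """Append the <File> entries of the site's pile-up POOL catalog
    to the job's own XML POOL file catalog."""
    job = ET.parse(job_catalog)
    root = job.getroot()
    for entry in ET.parse(pu_catalog).getroot().findall("File"):
        root.append(entry)
    job.write(merged_catalog)
```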
Other interesting ideas
Local CMS software installation at runtime: we instrumented the job wrapper to install the CMS software in the working area of the job
Little overhead: software downloaded from the SE
Avoids NFS problems, software installation problems and black-hole nodes
Suitable for running at sites with little or no CMS support
Local pile-up installation at runtime (a la ATLAS): Store and replicate the PU sample in several SEs
Download a (random) fraction of the PU sample
Generate metadata for the PU runs downloaded
An experimental version exists, not used for physics: it is important to determine the number of events required to have minimal or no impact on physics
Need to study the tradeoff between local access and downloading of files (LAN)
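The random-fraction download could be sketched as follows (hypothetical names; the right fraction is exactly the open question noted above):

```python
import random

def pick_pu_subset(pu_files, fraction=0.2, seed=None):
    """Choose a random fraction of the PU runs to download locally.
    The fraction needed for unbiased physics is still to be studied."""
    rng = random.Random(seed)
    k = max(1, int(round(len(pu_files) * fraction)))
    return rng.sample(pu_files, k)
```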
Production operations
Production in LCG started slowly one year ago with reduced manpower: development/testing of the McRunjob-LCG software and production operations done for a long time by ~1.5 FTE
Other production operators joined the effort a few months later
The number of events (in millions) produced in LCG per data tier: 13.1 generated, 11.7 simulated, 11.4 digitized, 5.1 reconstructed
[Plots: cumulative event production. Simulation: 12 M events, 11/04 to 02/06. Digitization: 14 M events, 06/05 to 02/06.]
Production operations II
We decided to use white lists due to grid/site unreliability:
Sites selected for their size and robustness
Big sites still running production in local farm mode (FZK, IN2P3)
Local administrators providing fast response: fixing problems
Availability of PU
This represents a fraction (~30%) of the production in LCG
No proper bookkeeping in the initial phase of production
[Plots: 4275 simulation jobs; 9800 digitization and reconstruction jobs]
Efficiency and failures
Rather low efficiency
Stage in/out and catalog (RLS) related problems
LCG and site problems: RB, CE
Input file stage in: 25%
Output file stage out: 16%
Local data access: 19%
LCG catalog lookup (LFC ~0%): 6%
Other LCG and site problems: 25%
Application failure: 8%
Unclassified: 1%
Examples of problems
Lack of automatic monitoring/resubmission
Lack of coupling to the CMS data management system (pre-staging)
Temporary grid and site problems: CE, SE, RLS
Lack of manpower
Organization: lack of dedicated resources (PU)
Lack of priorities: competition with CMS analysis and other experiments’ jobs
Migration to LFC
Recently, CMS has migrated from RLS to LFC as the global file catalog for LCG (thanks to S. Lemaitre, A. Sciabà, J. Casey)
We adapted McRunjob to use LFC instead of RLS
So far, a small fraction of production in LCG done using LFC
Very satisfactory results as compared to RLS
Performance of production (LFC)
Significant improvement in performance when using LFC as a global catalog
A bunch of jobs died due to an unscheduled power cut
New MC production system
Expert system (prodagent)
Automatic data merging step
Job chaining
Coupled to the Data Management System
New EDM (no metadata attachment)
Improved monitoring
Better error handling
Conclusions
End-to-end production system in LCG
Invaluable experience for the next generation Monte Carlo production system
Robustness is a very important issue given the current unreliability and instability of the grid/sites