Seminar "Using modern information technologies to solve modern problems in particle physics" at the Yandex Moscow office, 3 July 2012. Marco Cattaneo, CERN
Event Data Processing in LHCb
Marco Cattaneo CERN – LHCb
On behalf of the LHCb Computing Group
LHC interactions
❍ LHC: two proton beams of ~1380 bunches, circulating at 11 kHz
❍ 15 MHz crossing rate (30 MHz in 2015)
❍ Average 1.5 interactions per crossing in LHCb
❍ Each crossing contains a potentially interesting "event"
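As a quick cross-check of these numbers, a short Python sketch; the LHC revolution frequency of ~11.245 kHz is an assumption consistent with the "11 kHz" quoted above:

# Back-of-the-envelope check of the quoted rates
bunches = 1380                       # colliding bunches per beam
revolution_hz = 11.245e3             # assumed LHC revolution frequency (~11 kHz)
crossing_rate = bunches * revolution_hz
interaction_rate = 1.5 * crossing_rate           # average 1.5 interactions per crossing
print(f"crossing rate ~ {crossing_rate/1e6:.1f} MHz")      # ~15.5 MHz, i.e. the 15 MHz above
print(f"interaction rate ~ {interaction_rate/1e6:.1f} MHz")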
Typical event in LHCb
Raw event size ~60 kB
Data reduction in real time (trigger)
pp collisions: 15 MHz
❍ Level-0
❏ custom hardware
❏ partial detector information
❏ output: 1 MHz
❍ HLT
❏ CPU farm -> software trigger (~20k jobs in parallel)
❏ full detector information
❏ reconstruction in real time
❏ ~200 independent "lines"
❏ output: ~5 kHz
❍ Offline Storage
❏ 300 MB/s
❏ 1.3 PB in 2012
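The storage figures follow from the trigger output rate and the raw event size above (a sketch; the implied live time is derived from the quoted numbers, not stated on the slide):

# Throughput to offline storage and the live time implied by the 2012 volume
hlt_output_hz = 5e3                          # ~5 kHz of accepted events
raw_event_kb = 60                            # ~60 kB raw event size
throughput_mb_s = hlt_output_hz * raw_event_kb / 1e3
live_days = 1.3e9 / throughput_mb_s / 86400  # 1.3 PB = 1.3e9 MB
print(f"~{throughput_mb_s:.0f} MB/s, ~{live_days:.0f} days of data taking for 1.3 PB")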
Event Reconstruction
❍ On full event sample (~2 billion events expected in 2012):
❏ Pattern recognition to measure particle trajectories
✰ Identify vertices, measure momentum
❏ Particle ID to measure particle types
❍ ~2 sec/event, ~5k concurrent jobs (run on grid)
❍ Reconstructed data stored in "FULL DST"
❏ DST event size ~100 kB -> 2 PB in 2012 (includes RAW)
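As a rough check of the reconstruction load (a sketch; assumes the ~5k grid slots are fully and continuously occupied):

# Wall-clock time for one full reconstruction pass over the 2012 sample
events = 2e9                 # ~2 billion events
cpu_s_per_event = 2          # ~2 s/event
concurrent_jobs = 5e3        # ~5k concurrent grid jobs
wall_days = events * cpu_s_per_event / concurrent_jobs / 86400
print(f"~{wall_days:.0f} days for one pass over the full sample")   # roughly 9 days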
Data reduction offline (stripping)
❍ Any given physics analysis selects 0.1-1.0% of events
❏ Inefficient to allow individual physicists to run a selection job over FULL DST
❍ Group all selections in a single selection pass, executed by central production team
❏ Runs over FULL DST
❏ Executes ~800 independent stripping "lines"
✰ ~0.5 sec/event in total
✰ Writes out only events selected by one or more of these lines
❏ Output events are grouped into ~15 streams
❏ Each stream selects 1-10% of events
❍ Overall data volume reduction: x50 – x500 depending on stream
❏ Few TB (<50) per stream, replicated at several places
❏ Accessible to physicists for data analysis
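The quoted reduction factors are consistent with the per-stream volumes (quick check):

# Per-stream volume implied by the x50-x500 reduction over the 2 PB of FULL DST
full_dst_tb = 2000                          # 2 PB of FULL DST expected in 2012
for reduction in (50, 500):
    print(f"x{reduction}: ~{full_dst_tb / reduction:.0f} TB per stream")
# -> ~40 TB and ~4 TB, i.e. the "few TB (<50) per stream" quoted above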
Simulation
LHCb Monte Carlo simulation software:
– Simulation of the physics event
– Detailed detector and material description (GEANT)
– Pattern recognition, trigger simulation and offline event selection
– Implements detector inefficiencies, noise hits, effects of multiple collisions
Resources for simulation
❍ Simulation jobs require 1-2 minutes per event
❏ Several billion events per year required
❏ Runs on ~20k CPUs worldwide
Yandex contribution ~25%
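Order-of-magnitude check of the simulation load (a sketch; 2 billion events stands in for "several billion", and the worldwide farm is assumed fully occupied):

# CPU needed for the yearly simulation campaign
events = 2e9                          # assumed value for "several billion" events per year
minutes_per_event = 1.5               # 1-2 minutes per event
cpus = 20e3                           # ~20k CPUs worldwide
wall_days = events * minutes_per_event * 60 / cpus / 86400
print(f"~{wall_days:.0f} days on {int(cpus)} CPUs")   # a large fraction of the year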
Physics Applications Software Organization
[Layer diagram: applications (Reconstruction, Simulation, Analysis, High level triggers) built on Frameworks and Toolkits, which in turn sit on Foundation Libraries]
❍ One framework for basic services + various specialized frameworks: detector description, visualization, persistency, interactivity, simulation, etc.
❍ A series of widely used basic libraries: Boost, GSL, Root etc.
❍ Applications built on top of frameworks and implementing the required algorithms.
Gaudi Framework (Object Diagram)
[Object diagram: the Application Manager steers Algorithms; Algorithms read and write the Transient Event Store, Transient Detector Store and Transient Histogram Store; the Event Data Service, Detector Data Service and Histogram Service fill these stores via Persistency Services and Converters that access Data Files; an Event Selector chooses the input events; common components include the Message Service, JobOptions Service, Particle Property Service and other services]
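To make the pattern in the diagram concrete, here is a minimal illustrative sketch in Python of algorithms communicating through a transient event store under an application manager. This is not the Gaudi API (Gaudi algorithms are C++ components configured through job options); all class and method names below are invented for illustration only.

class TransientEventStore(dict):
    """Per-event key/value store, cleared between events."""

class Algorithm:
    def initialize(self):                       # called once before the event loop
        pass
    def execute(self, evt):                     # called once per event
        raise NotImplementedError
    def finalize(self):                         # called once after the event loop
        pass

class CountTracks(Algorithm):
    def execute(self, evt):
        evt["nTracks"] = len(evt.get("tracks", []))
        print("nTracks =", evt["nTracks"])

class ApplicationManager:
    def __init__(self, algorithms):
        self.algorithms = algorithms
    def run(self, events):
        for alg in self.algorithms:
            alg.initialize()
        for raw in events:
            evt = TransientEventStore(raw)      # the "event data service" fills the store
            for alg in self.algorithms:
                alg.execute(evt)                # algorithms communicate only via the store
        for alg in self.algorithms:
            alg.finalize()

ApplicationManager([CountTracks()]).run([{"tracks": [1, 2, 3]}])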
The LHCb Computing Model
LHCb Workload Management System: DIRAC
❍ DIRAC forms an overlay network
❏ A way for grid interoperability for a given Community
❏ Needs a specific Agent Director per resource type
❍ From the user perspective all the resources are seen as a single large "batch system"
[Diagram: User Community -> DIRAC WMS -> Grid A (WLCG), Grid B (NDG)]
❍ Jobs are submitted to the DIRAC Central Task Queue
❍ VRC policies are applied here by prioritizing jobs in the Queue
❍ Pilot Jobs are submitted by specific Directors to various Grids or computer clusters (see the sketch below)
❏ Allows aggregating various types of computing resources transparently for the users
❍ The Pilot Job gets the most appropriate user job
❍ Jobs run in a verified environment with high efficiency
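A minimal sketch of the pilot-job idea described above, in Python; the queue, priorities and matching logic here are invented for illustration and are not DIRAC code.

import queue

# Central task queue: user jobs wait here, ordered by priority (policy applied centrally)
task_queue = queue.PriorityQueue()
task_queue.put((1, "alice: stripping job, needs slc5"))
task_queue.put((0, "bob: reconstruction job, needs slc5"))   # smaller number = higher priority

def pilot(worker_environment_ok):
    """A pilot job: starts on a worker node (grid site, cluster, ...), verifies the
    local environment, then pulls the most appropriate waiting user job."""
    if not worker_environment_ok:
        return None                       # environment check failed: pull nothing
    try:
        _, job = task_queue.get_nowait()  # late binding: the job is chosen only now
    except queue.Empty:
        return None
    return job

print(pilot(worker_environment_ok=True))  # bob's job is matched first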
DIRAC
❍ Live DIRAC Display
Data replication
❍ Active data placement
❍ Split disk (for analysis) and archive
❏ No disk replica for RAW and FULL DST (read few times)
[Dataflow diagram: RAW -> Brunel (reconstruction) -> FULL DST -> DaVinci (stripping and streaming) -> DST streams -> Merge -> merged DSTs; replication per step: CERN + 1 Tier1 for RAW and FULL DST, 1 Tier1 (scratch) for unmerged DSTs, CERN + 3 Tier1s for merged DSTs]
Datasets
❍ Granularity at the file level
❏ Data Management operations (replicate, remove replica, delete file)
❏ Workload Management: input/output files of jobs
❍ LHCbDirac perspective
❏ DMS and WMS use Logical File Names (LFNs) to reference files
❏ LFN namespace refers to the origin of the file (see the sketch below)
✰ Constructed by the jobs (uses production and job number)
✰ Hierarchical namespace for convenience
✰ Used to define file class (tape-sets) for RAW, FULL.DST, DST
✰ GUID used for internal navigation between files (Gaudi)
❍ User perspective
❏ File is part of a dataset (consistent for physics analysis)
❏ Dataset: specific conditions of data, processing version and processing level
✰ Files in a dataset should be exclusive and consistent in quality and content
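A small sketch of what "constructed by the jobs from production and job number" can look like; the field meanings are inferred from the example LFN shown later on the event-indexing slide, and the function below is an illustration, not an official definition.

def make_lfn(data_type, year_config, file_type, production, job, step=1):
    # Hierarchical namespace built from production and job number (inferred layout)
    prod = f"{production:08d}"
    subdir = f"{job // 10000:04d}"      # assumed grouping of job numbers into sub-directories
    return (f"/lhcb/{data_type}/{year_config}/{file_type}/"
            f"{prod}/{subdir}/{prod}_{job:08d}_{step}.{file_type.lower()}")

print(make_lfn("LHCb", "Collision11", "DIMUON.DST", production=13016, job=37))
# -> /lhcb/LHCb/Collision11/DIMUON.DST/00013016/0000/00013016_00000037_1.dimuon.dst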
Replica Catalog (1)
❍ Logical namespace
❏ Reflects somewhat the origin of the file (run number for RAW, production number for output files of jobs)
❏ File type also explicit in the directory tree
❍ Storage Elements
❏ Essential component in the DIRAC DMS
❏ Logical SEs: several DIRAC SEs can physically use the same hardware SE (same instance, same SRM space)
❏ Described in the DIRAC configuration
✰ Protocol, endpoint, port, SAPath, Web Service URL
✰ Allows autonomous construction of the SURL (see the sketch below)
✰ SURL = srm:<endPoint>:<port><WSUrl><SAPath><LFN>
❏ SRM spaces at Tier1s
✰ Used to have as many SRM spaces as DIRAC SEs, now only 3
✰ LHCb-Tape (T1D0): custodial storage
✰ LHCb-Disk (T0D1): fast disk access
✰ LHCb-User (T0D1): fast disk access for user data
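A small sketch of the SURL construction rule quoted above; the configuration values are made-up placeholders, not real LHCb endpoints.

# Hypothetical DIRAC SE description with the fields listed above (placeholder values)
se_config = {
    "Protocol": "srm",
    "Endpoint": "srm.tier1.example.org",      # not a real endpoint
    "Port": 8443,
    "WSUrl": "/srm/managerv2?SFN=",
    "SAPath": "/lhcb",
}

def surl(lfn, cfg=se_config):
    # SURL = srm:<endPoint>:<port><WSUrl><SAPath><LFN>, the rule quoted on this slide
    # (the "//" after the protocol is written out here as in actual SRM URLs)
    return (f"{cfg['Protocol']}://{cfg['Endpoint']}:{cfg['Port']}"
            f"{cfg['WSUrl']}{cfg['SAPath']}{lfn}")

print(surl("/lhcb/LHCb/Collision11/DIMUON.DST/00013016/0000/00013016_00000037_1.dimuon.dst"))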
Replica Catalog (2)
❍ Currently using the LFC
❏ Master write service at CERN
❏ Replication using Oracle Streams to Tier1s
❏ Read-only instances at CERN and Tier1s
✰ Mostly for redundancy, no need for scaling
❍ LFC information:
❏ Metadata of the file
❏ Replicas
✰ Use "host name" field for the DIRAC SE name
✰ Store SURL of creation for convenience (not used)
❄ Allows lcg-util commands to work
❏ Quality flag
✰ One-character comment used to temporarily mark a replica as unavailable
❍ Testing scalability of the DIRAC File Catalog
❏ Built-in storage usage capabilities (per directory)
Bookkeeping Catalog (1)
❍ User selection criteria
❏ Origin of the data (real or MC, year of reference)
✰ LHCb/Collision12
❏ Conditions for data taking or simulation (energy, magnetic field, detector configuration…)
✰ Beam4000GeV-VeloClosed-MagDown
❏ Processing Pass is the level of processing (reconstruction, stripping…) including compatibility version
✰ Reco13/Stripping19
❏ Event Type is mostly useful for simulation, single value for real data
✰ 8-digit numeric code (12345678, 90000000)
❏ File Type defines which type of output files the user wants to get for a given processing pass (e.g. which stream)
✰ RAW, SDST, BHADRON.DST (for a streamed file)
❍ Bookkeeping search
❏ Using a path (assembled as in the sketch below)
✰ /<origin>/<conditions>/<processing pass>/<event type>/<file type>
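Putting the example values together into the search path (a straightforward assembly of the fields above):

# Assemble a bookkeeping query path from the selection criteria on this slide
origin          = "LHCb/Collision12"
conditions      = "Beam4000GeV-VeloClosed-MagDown"
processing_pass = "Reco13/Stripping19"
event_type      = "90000000"                  # single value for real data
file_type       = "BHADRON.DST"
print(f"/{origin}/{conditions}/{processing_pass}/{event_type}/{file_type}")
# -> /LHCb/Collision12/Beam4000GeV-VeloClosed-MagDown/Reco13/Stripping19/90000000/BHADRON.DST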
Bookkeeping Catalog (2)
❍ Much more than a dataset catalog!
❍ Full provenance of files and jobs
❏ Files are input of processing steps ("jobs") that produce files
❏ All files ever created are recorded, each processing step as well
✰ Full information on the "job" (location, CPU, wall clock time…)
❍ BK relational database (toy sketch below)
❏ Two main tables: "files" and "jobs"
❏ Jobs belong to a "production"
❏ "Productions" belong to a "processing pass", with a given "origin" and "condition"
❏ Highly optimized search for files, as well as summaries
❍ Quality flags
❏ Files are immutable, but can have a mutable quality flag
❏ Files have a flag indicating whether they have a replica or not
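A toy sketch of the provenance model described above, using SQLite from Python; the column names and layout are invented for illustration and are not the actual BK schema.

import sqlite3

# Toy version of the "files"/"jobs" provenance model; not the real BK schema
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE jobs  (job_id INTEGER PRIMARY KEY, production INTEGER, site TEXT);
CREATE TABLE files (file_id INTEGER PRIMARY KEY, lfn TEXT,
                    produced_by INTEGER REFERENCES jobs(job_id),
                    quality TEXT DEFAULT 'OK', has_replica INTEGER DEFAULT 1);
-- which files each job read: files are inputs of jobs, which produce new files
CREATE TABLE job_inputs (job_id INTEGER REFERENCES jobs(job_id),
                         file_id INTEGER REFERENCES files(file_id));
""")
db.execute("INSERT INTO jobs VALUES (1, 13016, 'CERN')")            # a reconstruction job
db.execute("INSERT INTO files VALUES (1, '/lhcb/raw-file', NULL, 'OK', 1)")
db.execute("INSERT INTO files VALUES (2, '/lhcb/full-dst', 1, 'OK', 1)")
db.execute("INSERT INTO job_inputs VALUES (1, 1)")

# Provenance query: which files were read by the job that produced file 2?
print(db.execute("""
    SELECT f_in.lfn FROM files f_out
    JOIN job_inputs ji ON ji.job_id = f_out.produced_by
    JOIN files f_in    ON f_in.file_id = ji.file_id
    WHERE f_out.file_id = 2""").fetchall())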
Bookkeeping browsing
❍ Allows saving datasets
❏ Filter, selection
❏ Plain list of files
❏ Gaudi configuration file
❍ Can return only files with a replica at a given location
Event indexing
❍ Book-keeping has no information about individual events
❍ But it can be beneficial to select events based on global criteria:
❏ Number of tracks, clusters etc.
❏ Trigger or Stripping lines fired
❏ …
❍ Prototype implemented by Andrey Ustyuzhanin
[Screenshot: event index "Advanced search" result for event 104248:539058326]
❏ Event time: Oct. 27, 2011, 9:11 p.m.
❏ Files:
/lhcb/LHCb/Collision11/DIMUON.DST/00013016/0000/00013016_00000037_1.dimuon.dst
/lhcb/LHCb/Collision11/EW.DST/00013017/0000/00013017_00000033_1.ew.dst
❏ Application: Brunel v41r1
❏ Database tags: DDDB head-20110914, DQFLAGS tt-20110126, LHCBCOND head-20111111, ONLINE HEAD
❏ Stripping lines fired, including DY2MuMuLine2, Hlt1FullDSTDiMuonDiMuonHighMassLine, WMuLine, Z02MuMuLine, Z02MuMuNoPIDsLine, Z02TauTauLine, plus the Dimuon and EW stream flags
❏ Global Event Activity counters: nBackTracks, nDownstreamTracks, nITClusters, nLongTracks, nMuonCoordsS0-S4, nMuonTracks, nOTClusters, nPV, nRich1Hits, nRich2Hits, nSPDhits, nTTClusters, nTTracks, nTracks, nUpstreamTracks, nVeloClusters, nVeloTracks
Search is supported by
Staging: using files from tape
❍ If jobs use files that are not online (on disk)
❏ Before submitting the job, stage the file from tape and pin it in the cache
❍ Stager agent
❏ Also performs cache management
❏ Throttles staging requests depending on the cache size and the amount of pinned data
❏ Requires fine tuning (pinning and cache size)
✰ Caching architecture is highly site dependent
✰ No publication of cache sizes (except Castor and StoRM)
❍ Jobs using staged files (see the sketch below)
❏ First check that the file is still staged
✰ If not, reschedule the job
❏ Copy the file locally to the WN whenever possible
✰ Space is released faster
✰ More reliable access for very long jobs (reconstruction) or jobs using many files (merging)
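A schematic sketch of the per-job logic described above; all helper names are hypothetical stand-ins, not real DIRAC interfaces.

# Toy outline of the "jobs using staged files" behaviour
def is_staged(lfn):       return True                        # would query the SE/SRM in reality
def reschedule(job):      print("rescheduling", job)         # back to the central queue
def copy_to_worker(lfn):  return "/tmp/" + lfn.rsplit("/", 1)[-1]

def prepare_input(job, lfn):
    if not is_staged(lfn):        # file may have been evicted from the disk cache meanwhile
        reschedule(job)           # reschedule rather than wait for a tape recall on the WN
        return None
    try:
        return copy_to_worker(lfn)   # local copy: cache space released faster, and long jobs
    except OSError:                  # no longer depend on the remote storage element
        return lfn                   # fall back to remote access through the SE

print(prepare_input("job-42", "/lhcb/LHCb/Collision12/BHADRON.DST/some-file.dst"))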
Data Management Outlook
❍ Improvements on staging
❏ Improve the tuning of cache settings
✰ Depends on how caches are used by sites
❏ Pinning/unpinning
✰ Difficult if files are used by more than one job
❍ Popularity (see the sketch below)
❏ Record dataset usage
✰ Reported by jobs: number of files used in a given dataset
✰ Account number of files used per dataset per day/week
❏ Assess dataset popularity
✰ Relate usage to dataset size
❏ Take decisions on the number of online replicas
✰ Taking into account available space
✰ Taking into account expected need in the coming week
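An illustrative sketch of the popularity idea; the metric, dataset name and replica thresholds below are invented for illustration and are not the LHCb policy.

from collections import defaultdict

# Usage reports from jobs: number of files of a dataset used on a given day
usage = defaultdict(int)
usage[("Collision12/Stripping19/BHADRON.DST", "2012-07-01")] = 12000
usage[("Collision12/Stripping19/BHADRON.DST", "2012-07-02")] = 9000

def popularity(dataset, n_files_in_dataset, days):
    used = sum(usage[(dataset, d)] for d in days)
    return used / (n_files_in_dataset * len(days))      # accesses per file per day

pop = popularity("Collision12/Stripping19/BHADRON.DST", 5000, ["2012-07-01", "2012-07-02"])
replicas = 4 if pop > 1.0 else 2 if pop > 0.1 else 1     # hypothetical thresholds
print(f"popularity = {pop:.2f} -> keep {replicas} disk replica(s)")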
Longer term challenges
❍ Computing model evolution
❏ Networking performance allows evolution from the static "Tier" model of data access to a much more dynamic model
❏ Current processing model of 1 sequential job per file per CPU breaks down in the multi-core era due to memory limitations
✰ parallelisation of event processing
❏ Whole-node scheduling, virtualisation, clouds
❍ LHCb upgrade in 2018
❏ From the computing point of view:
✰ x40 readout rate into HLT farm
✰ x4 event rate to storage (20 kHz)
✰ x1.5 event size (100 kB/RAW event)
❏ x10 increase in data rates implies (see the arithmetic sketch below)
✰ scaling of DIRAC WMS and DMS
✰ scaling of data selection catalogs
✰ new ideas for data mining?
❍ Data preservation and open access
❏ Preserve the ability to analyse old data many years in the future
❏ Make data available for analysis outside LHCb + general public
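Putting the upgrade factors together for the storage path (a sketch; the combined factor from rate and event size alone comes out somewhat below the x10 the slide quotes, which is an order-of-magnitude figure):

# Data rate to storage: 2012 running vs. the upgrade figures above
rate_2012_khz, size_2012_kb = 5, 60        # ~5 kHz, ~60 kB RAW event
rate_upg_khz, size_upg_kb = 20, 100        # 20 kHz, 100 kB RAW event
ratio = (rate_upg_khz * size_upg_kb) / (rate_2012_khz * size_2012_kb)
print(f"~{rate_upg_khz * size_upg_kb / 1000:.0f} GB/s to storage, "
      f"~{ratio:.0f}x the 2012 rate")      # ~2 GB/s, roughly an order of magnitude more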