DQM Architecture From Online Perspective
http://cern.ch/[email protected]
EvF wkg 11/10/2006
E. Meschi – CERN PH/CMD
28.02.2007 E.M. - DQM Online View 2
DQM Requirements
1. Primary goal: provide “fast” feedback to shift crew and subsystem experts about the quality of event data being taken
2. Provide global and subsystem-specific “quality flags” for each unit of event data (aka Luminosity Section)
3. Provide a uniform environment and a modular structure for DQM code (DQM code reusability)
4. Provide a common working environment for expert and generic monitoring alike
5. Integrate well into online operations (e.g. core activities started automatically by RunControl)
6. Provide a hierarchical online view of the status of the experiment
7. Provide a uniform look and feel for DQM GUIs
8. Enable seamless integration of offline DQM activities (see 3.)
9. Enable remote DQM shifts
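Requirement 2 can be pictured as a small record per luminosity section. This is a minimal sketch, assuming a hypothetical `LumiSectionQuality` structure and an assumed combination policy: any BAD subsystem makes the LS BAD, all GOOD makes it GOOD, and a missing report leaves it UNKNOWN. The real policy for combining flags is still to be defined.

```python
from dataclasses import dataclass, field
from enum import Enum


class Flag(Enum):
    GOOD = "good"
    BAD = "bad"
    UNKNOWN = "unknown"  # e.g. the subsystem did not report for this LS


@dataclass
class LumiSectionQuality:
    """Quality flags for one luminosity section (hypothetical structure)."""
    run: int
    lumi_section: int
    subsystem_flags: dict[str, Flag] = field(default_factory=dict)

    def global_flag(self, required: set[str]) -> Flag:
        # Assumed policy: any BAD -> BAD; all GOOD -> GOOD; otherwise UNKNOWN.
        flags = [self.subsystem_flags.get(name, Flag.UNKNOWN) for name in required]
        if any(f is Flag.BAD for f in flags):
            return Flag.BAD
        if all(f is Flag.GOOD for f in flags):
            return Flag.GOOD
        return Flag.UNKNOWN
```

For example, an LS where ECAL and HCAL report GOOD is globally GOOD if only those two are required, but UNKNOWN if a third required subsystem has not reported.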
DQM Infrastructure
• DQMServices
– Fully integrated with CMSSW
– Modularity of user code imposed by framework
– Uniform interface for creation/management of DQM objects
– Bookkeeping, transport and collation of DQM data
– Quality test and status tracking
– Web interface toolkit, xdaq integration
– Visual client integrated with Iguana
– See C.L. presentation
• 80% of the requirements on the previous slide are covered
– How to cover the remaining 20% is one of the subjects of this workshop.
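As an illustration of the “quality test and status tracking” item, here is a minimal sketch of a range-type quality test. The function name, thresholds and status levels are hypothetical and are not the DQMServices API; the test simply turns the fraction of histogram entries inside an expected range into a status.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    WARNING = 1
    ERROR = 2


def contents_in_range_test(bin_edges, counts, xmin, xmax,
                           warn_below=0.95, error_below=0.90):
    """Hypothetical quality test: fraction of entries in bins lying fully inside [xmin, xmax].

    bin_edges has len(counts) + 1 entries (each bin's lower edge plus the last upper edge).
    Returns (status, fraction).
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("bin_edges must have len(counts) + 1 entries")
    total = sum(counts)
    if total == 0:
        return Status.WARNING, 0.0  # assumed: an empty histogram is suspicious, not fatal
    inside = sum(c for lo, hi, c in zip(bin_edges, bin_edges[1:], counts)
                 if lo >= xmin and hi <= xmax)
    fraction = inside / total
    if fraction < error_below:
        return Status.ERROR, fraction
    if fraction < warn_below:
        return Status.WARNING, fraction
    return Status.OK, fraction
```

Returning the fraction alongside the status lets a status-tracking layer keep the underlying number, not just the verdict.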
DQMServices use cases
[Figure: DQMServices use cases in online and quasi-online operation. Sources (crate controller PC, filter farm, event consumers fed by the event server / storage manager) exchange data, subscriptions and directory information with collectors, which serve DQM consumers over TCP/TMessage. Labels distinguish FWK, FWK + XDAQ, STANDALONE and XDAQ/WRAPPED applications.]
Frequent Questions
• Which network will I be running on?
• Can I / should I use CMSSW?
• How is my process going to be started / controlled?
• Do I get access to OMDS? To ORCON?
• Do I have access to DCS data?
• Do I have access to DAQ monitoring data?
DQM Modes of Operation
• Online at crate controller level
– Input rate: limited by VME access (*)
– Event building: no
– CPU: crate controller PCs
– Bandwidth: consistent with experiment network
– Delay: virtually 0
– Network: experiment network
– Constraints: can use CMSSW, can use RC (sub-det), free access to DB, DCS: via PSX, DAQmon: via DB
• Online in Filter Farm
– Input rate: up to 100 kHz
– Event building: yes
– CPU: 10-0% of HLT CPU
– Bandwidth: 5-0% of total bandwidth (1 GB/s)
– Delay: 0
– Network: experiment network
– Constraints: must use CMSSW, must use RC, limited access to DB, DCS: no, DAQmon: no
• Online in Event Consumer
– Input rate: 1-10 Hz aggregate
– Event building: yes
– CPU: subsystem CPUs
– Bandwidth: consistent with experiment network
– Delay: seconds
– Network: experiment or campus network
– Constraints: must use CMSSW, can use RC, free access to DB (exp), DCS: via PSX or DB, DAQmon: via DB
DQM Modes of Operation
• Quasi-online processing of a local file from the SM
– Input rate: O(10) Hz aggregate
– Event building: yes
– CPU: subsystem CPUs
– Bandwidth: consistent with experiment network
– Delay: minutes
– Network: experiment or campus network
– Constraints: must use CMSSW, can use RC, free access to DB (exp), DCS: via DB, DAQmon: via DB
• Offline processing
– Input rate: virtually all data stored (O(100) Hz)
– Event building: yes
– CPU: batch farm
– Bandwidth: consistent with campus network
– Delay: ~1 hour
– Network: GRID
– Constraints: must use CMSSW, cannot use RC, access to offline DB only, DCS: indirectly via CondDB, DAQmon: no
DQM in the FF
• The one and only way to get 100% of the events accepted by L1
• Embedding DQM in the HLT, however, has the following disadvantages:
1. It must be accounted for in the HLT CPU budget
2. It affects the robustness of the HLT: DQM code run this way will be subject to much stricter requirements and will not be allowed to change frequently
3. DQM data is scattered over many sources: the bandwidth to the collector is limited, and a standard collation operation must be carried out in the collector to reduce the data volume
• It should be reserved for cases where
– the entire L1 accept rate is needed, or
– large statistics must be accumulated over a short period (e.g. at the beginning of a run)
Filter Farm Data Operation
[Figure: Filter Farm data operation. Components: Storage Managers (data logger, event data buffers, special streams buffers, DQM snapshot buffers, event/DQM server), event/DQM proxy/caching server, event consumers, DQM consumers. Flows: event data and DQM data.]
FF DQM Data Handling
• First level of DQM collection in the Storage Manager
– Does collation of the many FU copies
• Proxy/caching server collects collated updates from all SMs
– Does final collation
– Saves a snapshot per LS
– Serves individual consumers
– It is the only point of access from outside the experiment network
• Consumers of FF DQM
– Can subscribe to individual DQM “folders”
– Only have access to collated information
– Are responsible for processing DQM information (QTests, status variables, presentation, etc.)
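The two collation levels and the folder subscriptions can be sketched as follows. This is a hypothetical illustration in which a DQM object is just a list of bin counts keyed by a folder path, and collation means bin-by-bin summing of copies with identical binning; the names `collate`, `subscribe` and `ProxySnapshotStore` are invented for this example.

```python
def collate(updates):
    """Sum DQM objects that share the same path, bin by bin.

    updates: iterable of dicts mapping a path such as "Ecal/occupancy" to a list
    of bin counts. All copies of one path are assumed to have the same binning.
    """
    merged = {}
    for update in updates:
        for path, counts in update.items():
            if path not in merged:
                merged[path] = list(counts)  # copy, so inputs are never modified
            elif len(merged[path]) != len(counts):
                raise ValueError(f"binning mismatch for {path}")
            else:
                merged[path] = [a + b for a, b in zip(merged[path], counts)]
    return merged


def subscribe(collated, folders):
    """Return only the objects located under the requested folders."""
    prefixes = tuple(f.rstrip("/") + "/" for f in folders)
    return {p: c for p, c in collated.items() if p.startswith(prefixes)}


class ProxySnapshotStore:
    """Hypothetical proxy: final collation and one stored snapshot per LS."""

    def __init__(self):
        self.snapshots = {}

    def end_of_lumi_section(self, ls, sm_updates):
        # sm_updates are already collated per Storage Manager.
        self.snapshots[ls] = collate(sm_updates)
        return self.snapshots[ls]
```

In this sketch each Storage Manager calls `collate` on the copies from its FUs, and the proxy collates the per-SM results once more per LS. Bin-by-bin summing is only valid for additive objects such as histograms with identical binning.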
Other Online Sources of DQM Data
• Event and non-event DQM from crate controllers
– Should be part of the sub-detector online configuration (and thus be controlled by the sub-detector FM)
– Including collection and collation
• Event consumers (using either the Event Server or disk streams)
– Should be controlled by RunControl
– Should be grouped into a few processes by functionality and input
– E.g. all DQM modules that use a zero-bias special stream are run by the same process
• One or multiple collectors
• Collation in case of multiple identical sources is delegated to the client
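Grouping event-consumer DQM modules by input can be as simple as keying them on the stream they read. A minimal sketch with hypothetical module and stream names; a real configuration would also take functionality into account, for instance by using a (functionality, input) pair as the key.

```python
from collections import defaultdict


def group_by_input(modules):
    """Assign DQM modules to processes, one process per input stream.

    modules: list of (module_name, input_name) pairs (names are hypothetical).
    Returns {input_name: [module_name, ...]}, keeping the original module order.
    """
    processes = defaultdict(list)
    for name, source in modules:
        processes[source].append(name)
    return dict(processes)
```

With this grouping, all modules reading the zero-bias stream end up in a single process, so the stream is read and unpacked once.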
DQM Clients
• Two types of consumers of DQM information
– Intelligent clients (Superclients)
• Do data manipulation
• Are themselves producers of DQM data
• Can act as servers
• Can write into CondDB
• Can (but need not) provide graphical feedback
• Can (but need not) provide interactive control (e.g. switch to expert mode…)
• Should be xdaq applications so that they can best be controlled by RunControl
• Can be FW applications to gain access to FW services (e.g. ORCON)
• See S.B. talk
• Can run unattended and provide feedback to the operator via warning/error messages
– Dumb clients (e.g. GUI)
• Do not add information or manipulate data
• Cannot act as servers
• Cannot write into CondDB
• Provide interactive feedback
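The division of roles between the two client types can be sketched as two classes sharing a common consumer interface. This is a hypothetical illustration only: DQM objects are modelled as lists of bin counts keyed by path, the derived quantity (total entries) and its naming are invented, and CondDB writing and RunControl binding are left out.

```python
from abc import ABC, abstractmethod


class DQMClient(ABC):
    """Common base: every client consumes collated DQM objects."""

    @abstractmethod
    def consume(self, mes: dict) -> None: ...


class DumbClient(DQMClient):
    """Only presents data: adds nothing, serves nothing, writes nothing."""

    def __init__(self):
        self.displayed = {}

    def consume(self, mes):
        self.displayed = dict(mes)  # stand-in for drawing the plots


class SuperClient(DQMClient):
    """Manipulates data and republishes derived objects, so it can act as a server."""

    def __init__(self):
        self.produced = {}
        self.messages = []  # warnings for the operator when running unattended

    def consume(self, mes):
        for path, counts in mes.items():
            total = sum(counts)
            self.produced[path + "_entries"] = [total]  # derived object, hypothetical naming
            if total == 0:
                self.messages.append(f"WARNING: {path} is empty")

    def serve(self, folder):
        prefix = folder.rstrip("/") + "/"
        return {p: c for p, c in self.produced.items() if p.startswith(prefix)}
```

Only `SuperClient` has a `serve` method and an output of its own, which mirrors the rule that dumb clients cannot act as servers.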
Client Operation
• DQM is controlled as a separate sub-system of DAQ (excluding DQM in FF)
– Sources (event consumers)
– Collectors
– Intelligent clients
• If there is full state machine binding for xdaq applications (e.g. derived from DQMBaseClient)
– They receive configure and run start/stop commands
• Otherwise, with no xdaq binding, control is limited to starting/stopping processes
• As a minimum, a report line shows whether a process is alive
• Control is on a “best-effort” basis, i.e. DAQ will not stop if a DQM component crashes
• Each Superclient must provide a non-graphic synoptic view of the status of the sub-system it monitors
– Key plots (used in the status calculation) are stored in a snapshot (at every LS)
– Plus a navigable hierarchy of status information based on the folder organization (e.g. one folder per chamber, with its status calculated from the status of the contained histograms, etc.)
[Figure: DQM control and status hierarchy. Labels: TOP; DAQ; DQM; FF; Subsystem; crate controller; DQM sources; event consumers; collectors; superclients; HLT as DQM source; SM as DQM collector; client concentrator; global status display.]
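The folder-based status hierarchy can be sketched by propagating histogram statuses up the folder tree. This assumes a “worst status wins” policy and hypothetical path names; the slides leave the actual combination policy open.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that max() picks the worst status.
    OK = 0
    WARNING = 1
    ERROR = 2


def folder_status(me_status):
    """Compute a status for every folder from per-histogram statuses.

    me_status maps a path like "Muon/DT/W0/C1/occupancy" to a Status.
    Assumed policy: a folder's status is the worst status found anywhere below it.
    """
    result = {}
    for path, status in me_status.items():
        parts = path.split("/")[:-1]  # drop the histogram name, keep its folders
        for depth in range(1, len(parts) + 1):
            folder = "/".join(parts[:depth])
            result[folder] = max(result.get(folder, Status.OK), status)
    return result
```

Because every folder gets its own entry, an operator can start from the top-level status and navigate down to the chamber whose histograms caused a warning.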
Organization of Online DQM
• Hardware
– Online DQM PCs must be connected to the experiment network
– They are in general a responsibility of the sub-detector
– System management is carried out centrally by the DAQ team
– Disk space for monitor streams and DQM snapshots is managed centrally (as part of the Storage Manager complex)
• Software
– XDAQ and CMSSW central installations are provided
– Sub-systems can derive project trees for fast development
– NO flexibility for code running on the filter farm
– SOME flexibility for code to run in “quasi-online” mode (compatible with centralized configuration/control)
– Freedom for applications under sub-system responsibility (e.g. DQM in crate controller under sub-detector FM control)
• DB
– Database access by individual DQM processes MUST happen via one of the approved mechanisms (Tstore for OMDS and POOL-ORA for ORCON)
– Database access bandwidth for DQM MUST be negotiated with the DB group
– General rule of thumb: NO DEADTIME due to the DB being stuck on DQM access
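One way to honour the “no deadtime due to DB access” rule is to decouple DQM from the database through a bounded buffer that drops records instead of blocking. A minimal sketch with hypothetical names; `write_fn` stands in for whichever approved mechanism (Tstore or POOL-ORA) is actually used, and dropping records is an assumed policy, not a stated one.

```python
import queue
import threading


class NonBlockingDBWriter:
    """Hypothetical buffer between a DQM process and a database writer thread.

    submit() never blocks: if the buffer is full (e.g. the DB is stuck), the record
    is dropped and counted, so DQM processing is never held up by the database.
    """

    def __init__(self, write_fn, maxsize=100):
        self._queue = queue.Queue(maxsize=maxsize)
        self._write_fn = write_fn  # stand-in for the approved DB access layer
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, record):
        """Queue a record without waiting; return False if it had to be dropped."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def flush(self):
        """Wait until every accepted record has been handed to write_fn."""
        self._queue.join()

    def _run(self):
        while True:
            record = self._queue.get()
            try:
                self._write_fn(record)
            except Exception:
                pass  # a real system would log this; errors never reach the DQM side
            finally:
                self._queue.task_done()
```

The `dropped` counter gives the DB group and the shift crew a direct measure of how often the negotiated bandwidth was exceeded.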
Summary
• Existing infrastructure covers 80% of DQM requirements
• Standardization of DQM data generation is achieved (using DQMServices/FW components)
• Standardization of “SuperClients” must be achieved
– Enforce hierarchy of views
– Enforce use of quality test and status tools
– Enforce use of standard entry points for data/status manipulation
– Define policies for combining status information
• Standardization of control
– Use RunControl to drive DQM processes
– DQM becomes a “subsystem”
– Line of reporting for critical errors
• Standardization of look and feel
– GUI: development needed for production-level use
– Color codes, etc.