
DQM Architecture From Online Perspective

http://cern.ch/[email protected]

EvF wkg 11/10/2006

E. Meschi – CERN PH/CMD

28.02.2007 E.M. - DQM Online View 2

DQM Requirements

1. Primary goal: provide “fast” feedback to shift crew and subsystem experts about the quality of event data being taken

2. Provide global and subsystem-specific “quality flags” for each unit of event data (aka Luminosity Section; see the sketch after this list)

3. Provide a uniform environment and a modular structure for DQM code (DQM code reusability)

4. Provide a common working environment for expert and generic monitoring alike

5. Integrate well into online operations (e.g. core activities started automatically by RunControl)

6. Provide a hierarchical online view of the status of the experiment

7. Provide a uniform look and feel for DQM GUIs

8. Enable seamless integration of offline DQM activities (see 3.)

9. Enable remote DQM shifts
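As an illustration of requirement 2, a per-luminosity-section quality record could look roughly like the C++ sketch below; the type and field names are assumptions made for illustration, not an actual CMS DQM data format.

```cpp
// Minimal sketch of a per-luminosity-section quality record (requirement 2).
// All names are illustrative assumptions, not an actual CMS DQM schema.
#include <cstdint>
#include <map>
#include <string>

enum class QualityFlag : std::uint8_t { Good, Warning, Bad, Unknown };

struct LumiSectionQuality {
  std::uint32_t run         = 0;
  std::uint32_t lumiSection = 0;                      // the unit of event data
  QualityFlag   globalFlag  = QualityFlag::Unknown;   // overall verdict
  std::map<std::string, QualityFlag> subsystemFlags;  // e.g. "ECAL" -> Good
};
```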

28.02.2007 E.M. - DQM Online View 3

DQM Infrastructure

• DQMServices
  – Fully integrated with CMSSW
  – Modularity of user code imposed by the framework
  – Uniform interface for creation/management of DQM objects (see the sketch below)
  – Bookkeeping, transport and collation of DQM data
  – Quality tests and status tracking
  – Web interface toolkit, XDAQ integration
  – Visual client integrated with IGUANA
  – See C.L. presentation

• 80% of the requirements on the previous slide are covered
  – How to get the remaining 20% is one of the subjects of this workshop.
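As a concrete picture of the uniform booking/filling interface mentioned above, a user module might look roughly like the sketch below. Class, method and header names (DQMStore, book1D, MonitorElement::Fill) follow the CMSSW DQMServices API as commonly used; the 2006-era interface differed in detail, so treat the exact names as assumptions.

```cpp
// Illustrative sketch of booking and filling a DQM object through DQMServices.
// Names and header paths are assumptions based on the usual DQMServices API.
#include "DQMServices/Core/interface/DQMStore.h"
#include "DQMServices/Core/interface/MonitorElement.h"
#include "FWCore/ServiceRegistry/interface/Service.h"

class ExampleDQMSource {                               // hypothetical user module
public:
  void book() {
    edm::Service<DQMStore> store;                      // framework-provided service
    store->setCurrentFolder("SubSystem/Examples");     // folders organize the DQM tree
    hist_ = store->book1D("occupancy", "Channel occupancy", 100, 0., 100.);
  }
  void fill(double channel) { hist_->Fill(channel); }  // called once per event
private:
  MonitorElement* hist_ = nullptr;                     // owned by the DQMStore
};
```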

28.02.2007 E.M. - DQM Online View 4

DQMServices use cases

[Diagram: DQMServices use cases, online and quasi-online. DQM sources on crate controller PCs and in the Filter Farm publish data to collectors, which serve consumers via directory/subscription exchanges (TCP/TMessage transport). The Event Server / Storage Manager delivers events to event consumers, whose DQM output in turn flows through a collector to DQM consumers. Deployment flavours shown: standalone FWK, FWK + XDAQ, XDAQ/wrapped.]

28.02.2007 E.M. - DQM Online View 5

Frequent Questions

• Which network will I be running on?

• Can I / should I use CMSSW?

• How is my process going to be started / controlled?

• Do I get to access OMDS? ORCON?

• Do I have access to DCS data?

• Do I have access to DAQ monitoring data?

28.02.2007 E.M. - DQM Online View 6

DQM Modes of Operation

• Online at crate controller level
  – Input rate: limited by VME access (*)
  – Event Building: No
  – CPU: crate controller PCs
  – Bw: consistent with experiment network
  – Delay: virtually 0
  – Constraints: experiment network; can use CMSSW; can use RC (sub-det); free access to DB; DCS via PSX; DAQmon via DB

• Online in the Filter Farm
  – Input rate: up to 100 kHz
  – Event Building: Yes
  – CPU: 10-0% of HLT CPU
  – Bw: 5-0% of total bandwidth (1 GB/s)
  – Delay: 0
  – Constraints: experiment network; must use CMSSW; must use RC; limited access to DB; DCS: no; DAQmon: no

• Online in an Event Consumer
  – Input rate: 1-10 Hz aggregate
  – Event Building: Yes
  – CPU: subsystem CPUs
  – Bw: consistent with experiment network
  – Delay: seconds
  – Constraints: experiment or campus network; must use CMSSW; can use RC; free access to DB (experiment); DCS via PSX or DB; DAQmon via DB

28.02.2007 E.M. - DQM Online View 7

DQM Modes of Operation

• Quasi-online: processing a local file from the SM (see the sketch below)
  – Input rate: O(10) Hz aggregate
  – Event Building: Yes
  – CPU: subsystem CPUs
  – Bw: consistent with experiment network
  – Delay: minutes
  – Constraints: experiment or campus network; must use CMSSW; can use RC; free access to DB (experiment); DCS via DB; DAQmon via DB

• Offline processing
  – Input rate: virtually all stored data (O(100 Hz))
  – Event Building: Yes
  – CPU: batch farm
  – Bw: consistent with campus network
  – Delay: ~1 hour
  – Constraints: Grid; must use CMSSW; cannot use RC; access to offline DB only; DCS indirectly via condDB; DAQmon: no
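A minimal sketch of the quasi-online mode referenced above: poll a Storage Manager spool directory for new monitor-stream files and hand each one to the analysis job. The directory path and handOffToCmsRun() are illustrative assumptions.

```cpp
// Quasi-online sketch: watch a spool directory and process new files (O(minutes) delay).
#include <chrono>
#include <filesystem>
#include <iostream>
#include <set>
#include <thread>

namespace fs = std::filesystem;

void handOffToCmsRun(const fs::path& file) {   // placeholder for launching the CMSSW job
  std::cout << "processing " << file << '\n';
}

int main() {
  const fs::path spool{"/store/sm/monitor"};   // assumed monitor-stream location
  std::set<fs::path> seen;
  for (;;) {
    for (const auto& entry : fs::directory_iterator(spool))
      if (entry.is_regular_file() && seen.insert(entry.path()).second)
        handOffToCmsRun(entry.path());
    std::this_thread::sleep_for(std::chrono::seconds(30));
  }
}
```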

28.02.2007 E.M. - DQM Online View 8

DQM in the FF

• The one and only way to get 100% of the events from L1
• Embedding DQM in the HLT has, however, the following disadvantages:
  1. It must be accounted for in the HLT CPU budget
  2. It affects the robustness of the HLT: DQM code run in this way is subject to much stricter requirements and will not be allowed to change frequently
  3. DQM data is scattered over many sources: the bandwidth to the collector is limited, and a standard collation operation must be carried out in the collector to reduce the data volume (see the estimate below)
• It should therefore be reserved for cases where
  – the entire L1 accept rate is needed, or
  – large statistics must be accumulated over a short period (e.g. at the beginning of a run)
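To make point 3 concrete, here is a rough order-of-magnitude estimate of the bandwidth reaching the collector when every filter unit holds its own copy of the DQM payload; all symbols are assumptions introduced here, not numbers from the talk.

```latex
% N_FU : number of filter-unit processes, each with a full copy of the DQM payload
% S    : size of one process's (un-collated) DQM payload
% T    : interval at which the copies are shipped to the collector
B_{\mathrm{collector}} \approx \frac{N_{\mathrm{FU}} \, S}{T}
% Collating the N_FU copies reduces the downstream volume to roughly S/T,
% independent of the farm size, which is why collation is mandatory here.
```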

28.02.2007 E.M. - DQM Online View 9

Filter Farm Data Operation

[Diagram: Filter Farm data operation. The Storage Managers comprise an event/DQM server and a data logger; event data goes to event-data buffers and special-stream buffers, DQM data to DQM snapshot buffers. An event/DQM proxy/caching server sits between the Storage Managers and the event and DQM consumers.]

28.02.2007 E.M. - DQM Online View 10

FF DQM Data Handling

• First level of DQM collection in the Storage Manager
  – Does collation of the many FU copies (see the sketch below)

• The Proxy/Caching Server collects collated updates from all SMs
  – Does the final collation
  – Saves a snapshot per LS
  – Serves individual consumers
  – It is the only point of access from outside the experiment network

• Consumers of FF DQM
  – Can subscribe to individual DQM “folders”
  – Only have access to collated information
  – Are responsible for processing the DQM information (quality tests, status variables, presentation, etc.)
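The collation step amounts to summing identically named histograms arriving from many sources. A minimal sketch with plain ROOT TH1 objects is given below; the container layout and ownership model are illustrative assumptions, not the Storage Manager implementation.

```cpp
// Merge per-source histogram sets, keyed by their full DQM path, into one collated set.
#include <map>
#include <memory>
#include <string>
#include <vector>
#include "TH1.h"

std::map<std::string, std::unique_ptr<TH1>>
collate(const std::vector<std::map<std::string, TH1*>>& sources) {
  std::map<std::string, std::unique_ptr<TH1>> collated;
  for (const auto& source : sources) {
    for (const auto& [path, hist] : source) {
      auto it = collated.find(path);
      if (it == collated.end())
        collated[path].reset(static_cast<TH1*>(hist->Clone()));  // first copy: clone it
      else
        it->second->Add(hist);                                   // further copies: sum bin contents
    }
  }
  return collated;
}
```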

28.02.2007 E.M. - DQM Online View 11

Other Online Sources of DQM Data

• Event and non-event DQM from crate controllers
  – Should be part of the sub-detector online configuration (and thus be controlled by the sub-det FM)
  – Including collection and collation

• Event Consumers (using either the Event Server or disk streams)
  – Should be controlled by RunControl
  – Should be grouped into a few processes by functionality and input
  – E.g. all DQM modules that use a zero-bias special stream are run by the same process

• One or multiple collectors
• Collation in the case of multiple identical sources is delegated to the client

28.02.2007 E.M. - DQM Online View 12

DQM Clients

• Two types of consumers of DQM information
  – Intelligent clients (Superclients)
    • Do data manipulation
    • Are themselves producers of DQM data
    • Can act as servers
    • Can write into CondDB
    • Can (but do not necessarily) provide graphical feedback
    • Can (but do not necessarily) provide interactive control (e.g. switch to expert mode…)
    • Should be XDAQ applications so they can best be controlled by RunControl
    • Can be FW applications to gain access to FW services (e.g. ORCON); see S.B. talk
    • Can run unattended and provide feedback to the operator via warning/error messages
  – Dumb clients (e.g. GUIs)
    • Do not add information or manipulate data
    • Cannot act as servers
    • Cannot write into CondDB
    • Provide interactive feedback

28.02.2007 E.M. - DQM Online View 13

Client Operation

• DQM is controlled as a separate sub-system of DAQ (excluding DQM in the FF)
  – Sources (event consumers)
  – Collectors
  – Intelligent clients

• With full state-machine binding for XDAQ applications (e.g. derived from DQMBaseClient)
  – They get configure and run start/stop commands
• Otherwise, without XDAQ binding, control is limited to starting/stopping the processes
• As a minimum, a report line tells whether a process is alive
• Control is on a “best-effort” basis, i.e. DAQ will not stop if a DQM component crashes

• Each Superclient must provide a non-graphical synoptic view of the status of the sub-system it monitors
• Key plots (used in the status calculation) are stored in a snapshot (at every LS)
• Plus a navigable hierarchy of status information based on the folder organization (e.g. one folder per chamber: status calculated from the status of the contained histograms, etc.; see the sketch below)
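A minimal sketch of the folder-based status roll-up described in the last bullet; the status values and the worst-case combination rule are assumptions chosen for illustration.

```cpp
// Roll the status up the DQM folder hierarchy, taking the worst status seen.
#include <algorithm>
#include <string>
#include <vector>

enum class Status { Ok = 0, Warning = 1, Error = 2 };

struct Folder {
  std::string name;                        // e.g. one folder per chamber
  std::vector<Status> histogramStatuses;   // results of the quality tests
  std::vector<Folder> subFolders;
};

Status rollUp(const Folder& folder) {
  Status worst = Status::Ok;
  for (Status s : folder.histogramStatuses) worst = std::max(worst, s);
  for (const Folder& sub : folder.subFolders) worst = std::max(worst, rollUp(sub));
  return worst;
}
```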

[Diagram: hierarchical control/status view with nodes TOP, DAQ, DQM, FF and Subsystem. DQM sources sit in the crate controllers, the HLT acts as a DQM source and the SM as a DQM collector; event consumers, collectors and Superclients feed a client concentrator and a global status display.]

28.02.2007 E.M. - DQM Online View 14

Organization of Online DQM

• Hardware
  – Online DQM PCs must be connected to the experiment network
  – They are in general a responsibility of the sub-detector
  – System management is carried out centrally by the DAQ team
  – Disk space for monitor streams and DQM snapshots is managed centrally (as part of the Storage Manager complex)

• Software
  – Central XDAQ and CMSSW installations are provided
  – Sub-systems can derive project trees for fast development
  – NO flexibility for code running on the filter farm
  – SOME flexibility for code running in “quasi-online” mode (compatible with centralized configuration/control)
  – Freedom for applications under sub-system responsibility (e.g. DQM in a crate controller under sub-detector FM control)

• DB
  – Database access by individual DQM processes MUST happen via one of the approved mechanisms (TStore for OMDS and POOL-ORA for ORCON)
  – Database access bandwidth for DQM MUST be negotiated with the DB group
  – The general rule of thumb is NO DEADTIME due to the DB getting stuck on DQM access

28.02.2007 E.M. - DQM Online View 15

Summary

• The existing infrastructure covers 80% of the DQM requirements
• Standardization of DQM data generation is achieved (using DQMServices/FW components)
• Standardization of “Superclients” must be achieved
  – Enforce the hierarchy of views
  – Enforce use of the quality-test and status tools
  – Enforce use of standard entry points for data/status manipulation
  – Define policies for combining status information
• Standardization of control
  – Use RunControl to drive DQM processes
  – DQM becomes a “subsystem”
  – Line of reporting for critical errors
• Standardization of look and feel
  – GUI: development needed for production-level use
  – Color codes, etc.