Central DQM Shift Tutorial Online/Offline


Central DQM Shift Tutorial

Online/Offline


Overview of the CMS DAQ and useful terminology

• Detector signals are collected through individual data acquisition systems (cables and boards) that end at the FEDs, the first element of the global Data Acquisition system (DAQ)

• FEDs (Front-End Drivers, the detector front-end boards): multiple FEDs per detector collect event fragments that are sent to the online event processing farm

• Builder Units: computing farm that collects event fragments from all FEDs and merges them to produce the full event information

• Filter Units: Computing farm where the High Level Trigger (HLT) is run to filter interesting events

• Storage Manager: application that saves to local disks events selected by the HLT


Online DQM

Online DQM: suite of CMSSW applications that run either on all events in the Filter Farm or on a selection of events served by the Storage Manager

Since December 2009, Online DQM also consumes DCS information in addition to event data

The Online DQM infrastructure is completed by the DQM data transfer system, which uploads histograms and other DQM data to the DQM servers, where they are visible to the shifters and the CMS community

Scope of Online DQM Shifts:
• Identify problems with detector performance or data integrity during the run

Offline Data Processing and Offline DQM

T0, CAF: Prompt Reconstruction at T0 and CAF (CERN) is performed between one hour and 48 hours after the data are transferred from the online environment at P5

T1: Subsequent re-reconstruction iterations at the T1s periodically follow the Prompt Reco, with improved alignment and calibration constants and bug fixes

Offline DQM is part of the offline data processing that, in addition to detector data analyses, includes higher-level reconstruction objects, aka Physics Objects (POGs)

Scope of Offline DQM Shifts: produce the data certification for the various reconstruction iterations, USED FOR CMS OFFICIAL GOOD RUN LISTS!

DQM Shifts - Overview

• Online Shifts at P5 (3/day for 24-hour coverage)
– 23:00-7:00 | 7:00-15:00 | 15:00-23:00
– On the first day of global running, shifts start at 9:00 (for this shift you can use the regular 8:15 shuttle)
• For the latest run plan updates (shift cancellations), subscribe to: [email protected]
• Offline Shifts run at Control Rooms away from P5 (4/day)
– 1:00 – 7:00 at Fermilab
– 7:00 – 13:00 at CERN-CMS Centre (Meyrin site)
– 13:00 – 19:00 at DESY
– 19:00 – 1:00 at Fermilab

In order to certify reprocessed data, Offline DQM shifts can also be scheduled outside of global data-taking periods

Access to Control Rooms on CERN Site

• Safety requirements for P5:
– Online CMS level 4C safety class must be passed.
– Each shifter should have the appropriate access rights, requested through EDH:
  • request access to « CMS CR » for P5
  • request access to « CMS CEN » for the CMS Centre at Meyrin

• A regular shuttle service runs 7 days per week: https://twiki.cern.ch/twiki/bin/view/CMS/P5Shuttle

In preparation for the shift

• Read the DQM shift instructions, even if you have been on shift before:
– https://twiki.cern.ch/twiki/bin/view/CMS/DQMShiftInstructions
– https://twiki.cern.ch/twiki/bin/view/CMS/OnlineDQMShifts
– https://twiki.cern.ch/twiki/bin/view/CMS/OfflineDQMShifts
– https://twiki.cern.ch/twiki/bin/view/CMS/DQMOnlineShortTermInstr
– https://twiki.cern.ch/twiki/bin/view/CMS/DQMOfflineShortTermInstr

• Check the instructions at the beginning of each shift to find out if something has changed since your last shift


• For newcomers:
– Schedule your first shift in the daytime so assistance will be readily available if needed
– Attend the shift tutorial on Monday (possibly the latest one before your shift or the one before that)
– Attend a trainee shift between the tutorial and your first shift (arranged by the DQM shift managers; you will receive an email in time)

DQM Shift Tools

• DQM GUI: Graphical User Interface, for histogram viewing

• DQM Run Registry: web interface to the Database that holds run information. Used by shifters to register interesting runs (Online shifters) and to collect quality information

• Elog: for end-of-shift summaries and problem reports

• TWiki pages: for shift instructions

Task 0: Applications

• (1) Make sure the DQM applications and GUI are running during data taking:
– Online: check that histograms update during runs
– Offline: check that histograms arrive from Tier-0/CAF processing
• (2) Check that the Run Registry is updating correctly
• (3) In case of problems with the DQM tools persisting longer than 15 minutes, call the DQM on-call expert: 165579
• (4) During Online shifts, stay in close contact with the Shift Leader (P5)
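The 15-minute escalation rule in Task 0 can be written down as a small check. This is only an illustrative sketch: the function names are hypothetical, and the 5-minute staleness window for histogram updates is an assumed value that does not come from the shift instructions.

```python
from datetime import datetime, timedelta

# From the slide: call the on-call expert if a DQM tool problem
# persists for longer than 15 minutes.
ESCALATION_THRESHOLD = timedelta(minutes=15)


def should_call_expert(problem_first_seen: datetime, now: datetime) -> bool:
    """Return True once a DQM tool problem has lasted longer than 15 minutes."""
    return now - problem_first_seen > ESCALATION_THRESHOLD


def histograms_are_stale(last_update: datetime, now: datetime,
                         max_age: timedelta = timedelta(minutes=5)) -> bool:
    """Hypothetical staleness check; max_age is an assumed value, not an official one."""
    return now - last_update > max_age
```

In practice the shifter makes this judgment by eye in the GUI; the sketch only makes the timing rule explicit.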

Task 1: Histogram Inspection

• Follow the online/offline shift instructions
• Run-by-run procedure:
– (1) Enter significant runs in the Run Registry
  • Online Shift: decide which runs (cosmics/commissioning) are significant enough to register in the Run Registry (confirm with the Shift Leader!)
  • ALL collision runs should be registered (even if they last only 1 LS)
  • Offline Shift: analyze runs previously registered during the online shift
– (2) Shift histogram inspection
  • Look at the Summary, Reports, and Shift workspaces in the DQM GUI
  • Make an effort to look at all the plots one by one.
  • If you spot a problem or have a question regarding a specific plot:
    – Check the sub-system instructions and take action accordingly
    – If it is not explained in the instructions, discuss with the P5 shift leader and, if needed, ask him/her to call the Detector On Call (DOC)
    – In offline, inform the sub-system DQM experts by mail or phone, as noted on the instruction page
    – Make an Elog entry (Type: “Problem Report”)

GUI – Summary Workspace


https://cmsweb.cern.ch/dqm/online


GUI – Reports Workspace

https://cmsweb.cern.ch/dqm/online


GUI – Shift Workspace

Inspect the error plots (Online GUI -> Workspace -> Shift -> Errors). They show plots that indicate errors.

Please read the sub-system short-term instructions on how to evaluate them.

Keep this page and the summary page open at all times throughout your shift at P5.

DQM Shift Instructions

• Check that the HV information in the DQM GUI summary ("Info" histogram) and in the Run Registry ("LumiSec" table) is consistent. Sometimes "LS" in the Runs table shows 0. If this happens:
– make an ELOG entry of type "Problem Report" indicating the run number
– ensure that the DQM expert on-call is aware of the problem
– put the information manually into the general comments section of the Run Registry. Example format: LS 0 = CASTOR, Strips, RPC, Pixel, and DT with HV OFF. All others with HV ON.

• When stable beam is declared, all the subdetectors should be switched on, including Pixel and Strip. If the HV is still off, ask the shift leader or the DCS shifter.

• Correlation between data certification and HV conditions:
– If HV remains OFF throughout the entire run, the corresponding sub-system appears as Yellow (no need to check histograms unless specifically mentioned in the documentation)
– If HV is ON in at least one LS, follow the usual procedure to certify
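The relation between HV conditions and certification can be illustrated with a short sketch. The input layout (a mapping from subsystem name to a list of per-lumisection HV ON flags) and the function names are assumptions made for this example; they are not the Run Registry's data model.

```python
from typing import Dict, List


def needs_usual_certification(hv_on_per_ls: List[bool]) -> bool:
    """HV ON in at least one lumisection: follow the usual certification procedure."""
    return any(hv_on_per_ls)


def subsystems_hv_off_all_run(hv_by_subsystem: Dict[str, List[bool]]) -> List[str]:
    """Subsystems whose HV stayed OFF for the entire run (shown as Yellow).

    A subsystem with no lumisection information is treated as OFF here,
    which is an assumption of this sketch.
    """
    return sorted(name for name, flags in hv_by_subsystem.items()
                  if not needs_usual_certification(flags))
```

For example, with `{"Pixel": [False, False], "DT": [False, True]}` only Pixel is reported as OFF for the whole run, while DT goes through the usual histogram checks.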


Task 2: Online Run Registry

Hover over the quality flag to see the comment

The Online Run Registry can be found here:

https://cmswbm2.web.cern.ch/cmswbm2/runregistry/

You need your NICE password to log in!

Run Registry collects automated run summary and information filled by the DQM shifters: Data quality and Comments.

There are two tables in the Online Run Registry:
– Run Summary
– Selected Runs

Online Run Registry: Run Summary Table

• The Run Summary table contains information that is entered automatically

• All runs are entered in this table, from short start/stop runs to long stable runs

• This table is used by the ONLINE DQM SHIFTER to select significant runs:
– Register all collision runs!
– For commissioning or cosmic runs, register only runs that have more than 10,000 events and/or have been running for more than 10 minutes (if you are in doubt, ask the shift leader).
– Follow the run-by-run workflow, inspecting the shift histograms.

• Ask the shift leader to confirm the information reported, then move the entry to « SIGNOFF »
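The selection rule for significant runs can be sketched as a simple predicate. The function name and arguments are hypothetical, and the "and/or" in the instructions is read here as a logical OR, which is an assumption; in doubtful cases the shift leader decides.

```python
def is_significant_run(run_class: str, n_events: int, duration_minutes: float) -> bool:
    """Decide whether the online shifter should register a run.

    run_class is one of "Collisions", "Cosmics", "Commissioning".
    """
    if run_class == "Collisions":
        # All collision runs are registered, even very short ones.
        return True
    # Cosmic and commissioning runs: more than 10,000 events
    # or more than 10 minutes of running (assumed OR reading of "and/or").
    return n_events > 10_000 or duration_minutes > 10
```

A two-minute collision run is registered, while a two-minute commissioning run with 500 events is not.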

Register Runs in Run Registry

• During the run: click the run to select it, then Manage -> Edit

• Based on the shift histogram instructions, set the online subsystem flags (GOOD/BAD) and enter comments

• If a subsystem is BAD, inform the shift leader and the subsystem expert, and enter comment

• Try to provide complete information by adding info like Beam status, etc. in the “comments” field.


Online Run Registry: Editing

[Screenshot of the Run Registry edit form, with labelled elements: Reference Panel, Help Panel, Run class, Stop reason, General comment, Components]

Component information:
– Status (restricted list)
– Cause (restricted list)
– Comment (free text)

Register Runs in Run Registry

• After the run:
– Enter the ‘stop reason’ (in the stop reason field, NOT under comments)
– The certification results must be confirmed by the shift leader before the status of a given run is moved to « SIGNOFF »
– Click the run to select it, then Move to SIGNOFF

• Once the run is in the SIGNOFF state, it cannot be modified by the Online shifter.
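The sign-off rules above amount to a small state machine, sketched below. The `RunEntry` class, its field names, and the initial state name "OPEN" are hypothetical choices for this illustration and do not reflect how the Run Registry is implemented.

```python
from dataclasses import dataclass


@dataclass
class RunEntry:
    run_number: int
    state: str = "OPEN"                     # assumed name for the editable state
    stop_reason: str = ""
    confirmed_by_shift_leader: bool = False

    def set_stop_reason(self, reason: str) -> None:
        """Edit the stop reason; not allowed once the run is in SIGNOFF."""
        if self.state == "SIGNOFF":
            raise PermissionError("run is in SIGNOFF and can no longer be modified")
        self.stop_reason = reason

    def move_to_signoff(self) -> None:
        """Move to SIGNOFF only with a stop reason and shift leader confirmation."""
        if not self.stop_reason:
            raise ValueError("enter the stop reason first")
        if not self.confirmed_by_shift_leader:
            raise ValueError("shift leader confirmation is required")
        self.state = "SIGNOFF"
```

The ordering matters: the stop reason and the shift leader's confirmation come first, because nothing can be corrected by the online shifter afterwards.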

Online DQM Shifter: Run Classification

Note that runs are classified through the Group name in the Run Registry.

Assigning the correct run class is of vital importance, as it affects the offline determination of which runs are used for different analyses.

The current rules follow; make sure you check the updated shift instructions at the start of your Online DQM shift.

1. "Collisions" if the run is taken for physics analysis purposes and contains at least one lumi section with two stable beams (colliding or non-colliding).

2. "Cosmics" if the run is taken for analysis purposes with a cosmics trigger, with at least one muon system + Tracker in DAQ, and there is no beam activity throughout the run, i.e. stable "no beam" conditions.

3. "Commissioning" for all other runs, i.e. those taken for tests or specific detector studies only, not meant for general offline physics analysis.

Runs are classified automatically by the RR. In case of doubt, ask the shift leader.
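The three classification rules can be summarized in a short decision function. This is a sketch of the rules as stated on the slide, not the Run Registry's automatic classifier; the function and parameter names are hypothetical.

```python
def classify_run(for_analysis: bool,
                 has_stable_beam_ls: bool = False,
                 cosmics_trigger: bool = False,
                 muon_systems_in_daq: int = 0,
                 tracker_in_daq: bool = False,
                 beam_activity: bool = False) -> str:
    """Return the run class following the three rules listed above."""
    # Rule 1: physics run with at least one lumi section with two stable beams.
    if for_analysis and has_stable_beam_ls:
        return "Collisions"
    # Rule 2: cosmics trigger, at least one muon system plus Tracker in DAQ,
    # and no beam activity throughout the run.
    if (for_analysis and cosmics_trigger and muon_systems_in_daq >= 1
            and tracker_in_daq and not beam_activity):
        return "Cosmics"
    # Rule 3: everything else (tests, specific detector studies).
    return "Commissioning"
```

Checking the "Collisions" rule first mirrors the order of the list; a cosmics-triggered run with beam activity falls through to "Commissioning".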

Prompt Reconstruction GUI

• For collision runs, sub-systems use different primary datasets (PDs)
– For cosmic runs, the Cosmic Prompt dataset is used for all sub-systems

• To ease the pp certification workflow, a dedicated application (a layer on top of the GUI) has been prepared
– The default GUI is used for cosmic certification
– Subsystem switches select the proper PDs for each sub-system

Task 2: Offline Run Registry

The Offline Run Registry can be found here:

https://cmswbmoffshift.web.cern.ch/cmswbmoffshift/runregistry_offline/

You need your NICE password to log in!

Run Registry collects information filled by the DQM shifters: Data quality and Comments.

There are two tables in the Offline Run Registry:
– Waiting List
– (Offline) Datasets

Offline Run Registry: Shifter’s use

• Refresh the Offline Datasets table (or enable automatic refresh) and wait for the dataset to appear in the OPEN state

• Edit the dataset and evaluate each subsystem based on the relevant histograms in the GUI, then assign quality flags

• Move the dataset entry to SIGNOFF when all subsystems have been analyzed

ELOG

http://cmsonline.cern.ch/portal/page/portal/CMS%20online%20system/Elog

• Log in with your AFS account
– Click on "Elog" and choose Subsystems -> "Event Display and DQM"

• Problem Report (1 entry per problem)
– For each problem arising during your shift, make a "Problem Report" entry
– Please use “Elog” to report problems!

• Shift Summary (1 entry per shift)
– Summarize the run numbers you checked during your shift
– During collision runs, check the histograms every hour and make an entry in this elog

• N.B. (!!!):
– Use the Types "Problem Report" or "Shift Summary" only; do NOT create new ones
– No need to create an elog for each run. Make sure the run-by-run information is entered correctly in the Run Registry

Run 191830

ECAL barrel Trigger Primitive Digi and RecHit Occupancy

DQM shift crew took 2 hours to report

Run 191692

We lost 70 pb⁻¹ of data due to a problem in fill 2536

Run 192112

Software issue; fortunately no data loss

DQM shift crew did not notice it for a few days

TOTEM test fill 2783

There was a problem in the HLT menu, and the DQM applications did not get events from the Storage Manager

The DQM shifter did not notice that all histograms were missing, and realised it ONLY when sub-system people enquired about it

Commissioning Runs

DQM shift crew did not notice that all SiStrip/Tracking histograms were missing for about 18 hours, from 13/9 to 14/9.

We noticed it ourselves and informed the shift crew!

These were only commissioning runs, so no data were lost, but we were blind to the state of the SiStrip detector

It was a software issue and was fixed quickly

Shift Hand-over

• Make sure to arrive 5-10 minutes early for the shift hand-over.
• Upon your arrival in the control room, the previous shifter will be there
– Get from her/him the information about the current status of the data taking and what happened during the previous shift.
– The previous shifter will show you where the tools you will be using are running (DQM GUI, CMS Online page, Run Registry).
– Make sure you are logged in to all the tools (ELOG, RR) as yourself!
– If anything about your tasks is not clear, ask at that moment!
– At the end of your shift, wait for the next shifter to arrive and provide the same support.

Links

• Shift instructions:
– https://twiki.cern.ch/twiki/bin/view/CMS/DQMShiftInstructions
– https://twiki.cern.ch/twiki/bin/view/CMS/OnlineDQMShifts
– https://twiki.cern.ch/twiki/bin/view/CMS/OfflineDQMShifts
– https://twiki.cern.ch/twiki/bin/view/CMS/DQMShiftHistograms

• DQM GUI:
– http://cmsweb.cern.ch/dqm/online/
– http://cmsweb.cern.ch/dqm/offline/
– http://cmspromptcertification.web.cern.ch/CMSPromptCErtification/
– Follow the certificate instructions at https://twiki.cern.ch/twiki/bin/view/CMS/DQMGUIGridCertificate

• Run Registry pages:
– Online Shift Usage: https://cmswbm2.web.cern.ch/cmswbm2/runregistry/
– Offline Shift Usage: https://cmswbmoffshift.web.cern.ch/cmswbmoffshift/runregistry_offline/
– CMS Users: https://cmswbmoff.web.cern.ch/cmswbmoff/runregistry_user/

• Elog:
– http://cmsonline.cern.ch/portal/page/portal/CMS%20online%20system

Final Suggestions

From now until your shift:
- Get familiar with the DQM TWiki pages (general structure & dedicated Online/Offline pages): https://twiki.cern.ch/twiki/bin/view/CMS/DQMShiftInstructions

Shortly before each shift you MUST read:
- Short-term instructions
- Commissioning and EvFDQM HyperNews announcements:
  https://hypernews.cern.ch/HyperNews/CMS/get/commissioning.html
  https://hypernews.cern.ch/HyperNews/CMS/get/EvFDqmAnnounce.html
- Online/Offline Shift Histograms by Subsystem <---- especially Online shifters
- The Elog of the shift before yours, to be aware of recent activity (please read both the Shift Leader Elog and the Event Display and DQM Elog)