Online Monitoring for the CDF Run II Experiment and the remote operation facilities

F. Scuri a) for the CDF online monitoring group:

T. Arisawa b), D. Fabiani a), D. Hirschbuehl c), K. Ikado b), T. Kubo d), Y. Kusakabe b), K. Maeshima e), J. Naganoma b), K. Nakamura d), C. Plager e), E. Schmidt f), H. Stadie c), R. Tsychiya b), G. Veramendi g), W. Wagner c), H. Wenzel c), M. Worcester f), K. Yorita h)

a) I.N.F.N. Pisa, b) Waseda University, Tokyo, c) Universitaet Karlsruhe, d) K.E.K., Tsukuba, e) Univ. of California, LA, f) Fermilab, g) L.B.L., Berkeley, h) Chicago University

ACAT07, NIKHEF, Amsterdam, April 23-27, 2007


Page 2: CDF is one of the multipurpose experiments running at the Tevatron (Fermilab)

[Figure: the Fermilab accelerator complex. Labels: Booster, Main Injector and Recycler, p̄ source, Tevatron, CDF, DØ]

- Proton-antiproton collisions at 1.96 TeV center-of-mass energy
- Peak luminosity now above 2 x 10^32 cm^-2 s^-1

Page 3: The CDF (Collider Detector at Fermilab) detector

[Figure: cutaway view of the CDF detector. Labels: silicon tracker, drift chamber, T.o.F., solenoid, central calorimeters, end-plug calorimeters, central muon, forward muon]

A typical architecture for high energy experiments: about ten sub-detectors, including:

- inner tracking systems and T.o.F.
- central superconducting solenoid (1.4 T)
- outer calorimeters and muon detectors
- 3-level trigger system
- multi-step data flow chain: L3 trigger (250 duals), robotic tape storage, production farm (150 duals), distributed Central Analysis Facilities (dCAFs) (9)
- a 20-PC cluster for the online systems: DAQ, detector calibration and operation monitor, event display, web servers

Detector and data flow complexity similar to the LHC experiments:

- 0.75 million channels to be operated and monitored
- 75 Hz L3 output at 20 MB/s (system limit), ~0.5 PB/year of data throughput
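As a rough cross-check of the data-flow figures quoted on this slide, the short Python sketch below (not part of the CDF software) derives the average event size implied by 20 MB/s at 75 Hz and the yearly volume at that bandwidth. The 100% live time over a calendar year and the decimal units (1 PB = 10^9 MB) are assumptions made here, so the yearly figure is an upper bound rather than a measured number.

```python
# Back-of-the-envelope check of the Level-3 output figures quoted on the slide.
L3_RATE_HZ = 75.0          # L3 output event rate (from the slide)
L3_BANDWIDTH_MB_S = 20.0   # L3 output bandwidth, system limit (from the slide)

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: calendar year, 100% live time
MB_PER_PB = 1e9                     # assumption: decimal units


def average_event_size_kb(bandwidth_mb_s: float, rate_hz: float) -> float:
    """Average event size in kB implied by a bandwidth and an event rate."""
    return bandwidth_mb_s / rate_hz * 1000.0


def yearly_volume_pb(bandwidth_mb_s: float, live_fraction: float = 1.0) -> float:
    """Data volume per year in PB for a given bandwidth and live-time fraction."""
    return bandwidth_mb_s * SECONDS_PER_YEAR * live_fraction / MB_PER_PB


event_size_kb = average_event_size_kb(L3_BANDWIDTH_MB_S, L3_RATE_HZ)
upper_bound_pb = yearly_volume_pb(L3_BANDWIDTH_MB_S)
```

This gives about 267 kB per event and an upper bound of about 0.63 PB/year, so the quoted ~0.5 PB/year corresponds to roughly 80% of the system limit sustained over the year.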

Page 4: Design characteristics

• Monitor the DAQ/trigger/detector performance without interfering with data taking.

• Monitored results must be quick and clear for the shift crew to interpret, in order to maintain high-quality data taking.

• Different consumer processes can run on different machines (expandability).

• Each consumer receives only the data it needs (choice of triggers).

• The monitoring and display processes are separated. The number of displays is limited only by network traffic and bandwidth.

• Different consumers can be combined into one executable.

• The CDF SW environment is the interface common to all detector partitions: easy maintainability.

• Grant safe remote access to the monitoring for experts and remote shifters.
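The "each consumer receives only the data it needs" requirement can be pictured as a trigger-based filter in front of every consumer. The Python sketch below is only an illustration of that idea, not the CDF Consumer Server: the class names, the trigger names and the "at least one wanted trigger fired" rule are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy event: a run number plus the names of the triggers it fired."""
    run: int
    fired_triggers: frozenset


@dataclass
class Consumer:
    """Toy consumer that only wants events from a given set of triggers."""
    name: str
    wanted_triggers: frozenset
    received: list = field(default_factory=list)

    def wants(self, event: Event) -> bool:
        # An event is relevant if it fired at least one wanted trigger.
        return bool(self.wanted_triggers & event.fired_triggers)


def dispatch(event: Event, consumers: list) -> int:
    """Deliver the event to every interested consumer; return the delivery count."""
    delivered = 0
    for consumer in consumers:
        if consumer.wants(event):
            consumer.received.append(event)
            delivered += 1
    return delivered


# Hypothetical consumers and trigger names, for illustration only.
occupancy = Consumer("occupancy", frozenset({"MINBIAS"}))
muon_check = Consumer("muon_check", frozenset({"MUON_HIGH_PT", "DIMUON"}))
consumers = [occupancy, muon_check]

events = [
    Event(run=1, fired_triggers=frozenset({"MINBIAS"})),
    Event(run=1, fired_triggers=frozenset({"DIMUON", "JET_100"})),
    Event(run=1, fired_triggers=frozenset({"JET_100"})),
]
deliveries = [dispatch(e, consumers) for e in events]
```

Filtering at the distribution point, rather than inside each consumer, keeps the network load per consumer proportional to what it actually monitors, which is what allows consumers to be spread over several machines.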

Page 5: Something more about the CDF online key words and key points

• Consumers/Monitors:

1) are AnalysisControl++ (AC++, C++) modules.
- AC++ is the CDF offline analysis framework; it handles the dataflow between modules
- AC++ serves input, output, event generation, detector simulation, event reconstruction, database access and physics analyses; ROOT is used for event I/O and analysis tools
- several updated versions are released each year; the online modules must follow the SW evolution
2) typical monitored objects: detector occupancies, trigger rates, luminosity, vertex position, physics objects, L3 reconstruction, calibration results, ...
3) provide template programs with utility methods to access data and to produce histograms without interfering with the data acquisition; standard procedures are defined to compare actual and reference plots and to provide instructions to the operators.
4) each specific consumer/monitor is written according to the sub-detector specs, within the CDF SW framework

With 1) + 2) + 3) + 4), just one person on shift (the Consumer Operator) can handle the full detector and data flow monitoring; sub-detector experts are paged only in case of major trouble.

An example of a fast-changing monitor: TrigMon (trigger monitor); trigger algorithms and tables change as the Tevatron luminosity increases at fixed bandwidth.

• Display Server: a program that allows a Display Browser to access the consumer output as a client.

• Display Browser: a program passing data to the user end node; the D.B. can run on the same machine as the D.S. or on a different machine, connected either via a socket or via WEB through a plug-in for the APACHE web server (ROOT compatibility).
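The slides do not spell out the "standard procedures" used to compare actual and reference plots. As a minimal sketch of what such a comparison could look like, the Python function below normalizes both histograms to unit area and flags bins that deviate by more than a chosen number of standard deviations. The function names, the 5-sigma threshold and the Poisson error model (taken from the current histogram only, ignoring the reference uncertainty) are assumptions made here, not the CDF procedure.

```python
import math


def normalize(counts):
    """Scale bin contents to unit area so runs of different length compare."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram is empty")
    return [c / total for c in counts]


def flag_deviating_bins(actual, reference, n_sigma=5.0):
    """Return indices of bins whose shape differs from the reference.

    Both histograms are normalized to unit area; the statistical error on a
    bin is approximated by the Poisson error sqrt(N) of the actual histogram
    (at least 1 count, so that empty bins still carry an uncertainty).
    """
    if len(actual) != len(reference):
        raise ValueError("histograms must have the same binning")
    total = sum(actual)
    act, ref = normalize(actual), normalize(reference)
    flagged = []
    for i, (a, r) in enumerate(zip(act, ref)):
        sigma = math.sqrt(max(actual[i], 1.0)) / total
        if abs(a - r) > n_sigma * sigma:
            flagged.append(i)
    return flagged


# Hypothetical occupancy histograms: bin 2 has gone dead in the current run.
reference = [1000] * 10
actual = [500] * 10
actual[2] = 0
dead_or_hot = flag_deviating_bins(actual, reference)
```

In this toy case only the dead bin is flagged. A real monitor built on ROOT would more likely rely on the comparison tests that ROOT histograms already provide.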

Page 6: Online monitoring raw architecture

[Diagram. Labels: L3 Trigger; Push Protocol; Consumer Server; Consumer 1, Consumer 2, Consumer 3, ..., Consumer N, each with its own Server and display; www (shown twice); Config. via "Talk to"; Error Receiver; Serious errors; Run Control; State Monitor]

Page 7: Element communication layout: 2 ways to display the consumer results

[Diagram. Labels: EVENTS; CONSUMERS (Consumer 1, Consumer 2); SERVER; TRIGGER TABLES, TEMPLATES, DATABASE (Oracle management system); SOCKET and DISPLAY (fast socket connection); FILES, HTTP SERVER (APACHE) and DISPLAY (WWW)]

Input through web-based Oracle forms, Oracle reports and automated cron jobs

Page 8: Examples of Monitor Display

[Figures: Occupancy Monitor; Event Display]

• All monitor displays are updated continuously, on a time scale of seconds
• Very useful to promptly detect new dead regions
• Histograms available on the web after update requests

NEW! Snapshot pictures (30 s update) available via WEB

Page 9: 3-person crew for the CDF data taking shifts

[Diagram. Shift crew: SciCo (Scientific Coordinator), ACE (DAQ Run Control and DAQ monitor), CO (Consumer Operator). Also shown: Accelerator Operators, Detector on-call Experts]

The main job of the COs is to check the detector operation quality online:
a) by looking at the:
- Consumer Monitor outputs
- Calibration results
- Event Display
b) by submitting the result of a detailed check-list

Page 10: Motivations and project framework for remote monitoring shifts (CDF CO (Consumer Operator) shifts)

• Avoid overseas missions just for monitoring shifts (8 shift days + RadWork training + travel time): save money to allow more people to be onsite for other activities on Physics, Detector upgrades, Operations duties.

• Reduce the number of people on shift during the night at Fnal

• Acquire know-how for future applications of remote shifts (LHC, ILC, ...)

• Tsukuba and Pisa were chosen as pilot institutions for the CDF Remote CO shift project

• Project developed in cooperation with CMS-Fnal, setting up a multipurpose remote control room in Wilson Hall at Fnal

Page 11: Remote shift development project integration

[Diagram. CDF: Tsukuba - Fnal and Pisa - Fnal; CMS: CERN - Fnal]

Page 12: The CO duties (full list and original access modes)

Duty | Operated via | Remote allowed
Start/stop Consumers and their Monitors | Commands from any online cluster node | NO a)
Check detector plots and fill a run check list every 2 hours | WEB | YES
Fill e-log with comments and post significant plots | WEB | YES
Execute the procedure to check several x-sections (XMon) | Command sequence from any online node | NO a)
Start the Calibration Monitor patch (DBANA) | Commands from any online cluster node | NO a)
Check calibrations | WEB | YES
Keep an eye on the monitor displays | Running on devoted online nodes | NO a)

a) Due to the DOE/Fnal policy, which allows only certified online group members to connect remotely to the online machines (not the case for regular CO shifters)

Page 13: SW developments to allow fully operational remote and local CO shifts in almost identical modes

(For each duty: project and present status)

1) Start/stop consumers and their Monitors; switch between physics and cosmic modes
Project: use a custom GUI, ConCon (Consumer Control), built on ROOT classes, with action buttons and process status windows (details below). ConCon runs on a node of the online cluster; in case of a remote shift, the GUI window is exported to the remote node.
Present status: tested and fully operating; the switch from local to remote operation mode is handled in the CDF control room by the ACE or the SciCo.

2) Execution of the special procedure (XMon) to monitor relevant counting rates
Project: moved to the WEB from a local node of the online cluster; the command sequence and parameter passing are handled by an HTML form invoking a perl file.
Present status: tested and fully operating.

3) Check of the detector calibration output
Project: actual and reference plots accessed via web.
Present status: the SW package to produce the calibration output is handled by the ACE.

4) Visual control of the real-time updated monitor displays
Project: snapshots of the 12 monitor displays accessed via WEB (30 s update).
Present status: tested and fully operating.

Page 14: The new Consumer Controller GUI (ConCon)

[Screenshot of the ConCon window. Callouts: Consumer status action buttons (with pop-up menus); Running mode switch buttons; Consumer monitors action buttons (with pop-up menus); Monitors and event display status action buttons (with pop-up menus); Access to the log files]

• Based on ROOT classes

• Merges in a user-friendly way all the actions otherwise performed with many shell commands launching script files

• Allows remote operation when the window is exported from the local node where the GUI is running to the remote node (remote logins on the online cluster are forbidden)

Page 15: Control room pictures

[Photo: section of the CDF control room at Fnal devoted to the Consumer Monitors]

[Photo: CDF control room in Pisa (more monitors to be added)]

Page 16: Specs of the remote control room in Pisa: very easy and inexpensive!

- Echo suppression; webcam (Polycom ViaVideo)
- Desktop running Polycom PVX under Win XP: IP point-to-point connection
- 4 LCD 17"/19" monitor display:
  Screen 1: GUI windows to handle Consumers and job monitors
  Screen 2: Event Display and Calibration plots
  Screen 3: Active run and reference plots for the checklist
  Screen 4: E-logs, checklist, instructions (WEB pages)
- Desktop running Mozilla, ROOT, Ghostview, ... under SL4; monitor drive card

The number of screens will be increased for displaying monitor snapshots.

Page 17: Commissioning, pilot remote CO shifts and operation summary of the first regular remote CO shift period in Pisa

Commissioning (September-November 2006):

- Definition of the start-up procedure for shifts in Pisa and for emergency handling, e.g. local (Fnal) shifter replacement in case of any failure at start-up, or of network or H.W. failures during the shift

- Test and validation of new HW (Pisa and Fnal) for video connection and monitoring (Pisa)

- Check of new SW (new GUI and new WEB-based procedures)

Pilot remote CO shifts (November 2006):

- Two 8-day regular owl shifts, Pisa and Fnal in overlap. Long-term network connection reliability was the main issue (short network breakdowns (~<15') are not an issue)

Regular remote CO shift operation summary (from December 2006):

- 4 shifts (one per month) until March 2007, 2 shifts scheduled in April 2007

- 6 x 8 days statistics very encouraging: no H.W. or SW problems detected, no network connectivity breakdowns, just a 15 min. interruption due to scheduled maintenance
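To put the single 15-minute interruption in proportion, the arithmetic below estimates the fraction of remote shift time that was unaffected. It is a Python sketch added for illustration: the 8-hour length of an owl shift is an assumption made here (the slides only give the number of shift days), and the six 8-day periods are counted as on the slide, including those scheduled for April 2007.

```python
SHIFT_PERIODS = 6            # 4 shifts until March 2007 + 2 scheduled in April 2007
DAYS_PER_PERIOD = 8          # each shift period lasts 8 days (from the slide)
HOURS_PER_SHIFT = 8.0        # assumption: one 8-hour owl shift per day
INTERRUPTION_MIN = 15.0      # scheduled-maintenance interruption (from the slide)

# Total remote shift time in minutes, and the fraction not lost to the interruption.
total_minutes = SHIFT_PERIODS * DAYS_PER_PERIOD * HOURS_PER_SHIFT * 60.0
availability = 1.0 - INTERRUPTION_MIN / total_minutes
```

Under these assumptions the interruption amounts to less than 0.1% of the total remote shift time.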

Page 18: Conclusions

• The CDF online system has been operating reliably for several years; the system architecture and the SW have evolved smoothly

• Real-time monitoring of the detector status and of data quality is now fully operational remotely as well, both for expert feedback and for the quality-checking and monitoring shifters (Consumer Operators)

• Local and remote CO shifts now proceed in exactly the same way

• The changes required for the remote shifts resulted in a global improvement towards a more user-friendly environment

• The first period of experience with remote shifts from Pisa is very positive

• Based on the Pisa experience, CDF plans to operate other remote sites and to assign 100% of the owl (at Fnal) CO shifts to remote sites.

• We encourage other experiment communities worldwide to follow this example.

Page 19: BACK-UP

Page 20: Framework Components

I. The Consumers

• Analyze and monitor the event data

• Use the CDF Run II offline framework (AC++) to look at data *

• Consist of different AC++ modules:

  • APPConsumerInputModule

  • ConsumerErrorModule (adds a special destination to the ZOOM error logger to send errors to the ConsumerErrorReceiver)

  • A module inheriting from ConsumerFrameworkModule, which consists of different monitors written by the experts. All these monitors inherit from BaseMonitor2, which starts the server process at the beginning of a job.

* In order to look at CDF event data, a decision was made very early on to use the CDF offline framework for the consumers. However, the consumer framework is basically free from CDF offline specifics, and the entire package can easily be used in different settings.

Page 21: Framework Components (cont.)

BaseMonitor2

• Base class for all monitors

• Provides "framework" functions and functions to be overwritten by the monitor writers.

TConsumerInfo

• is sent from the consumer to the server
• is sent from the server to the display
• contains information about the consumer:
  • name
  • run number
  • number of events processed
  • list of all ROOT objects that are available
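The division of labour between the base class and the information record can be sketched as follows. This is a Python stand-in written for illustration only: the real BaseMonitor2 and TConsumerInfo are C++/ROOT classes whose interfaces are not shown in the slides, so every method name below (book, process, fill) and the example monitor are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConsumerInfo:
    """Python stand-in for the information the slide lists for TConsumerInfo."""
    name: str
    run_number: int
    events_processed: int = 0
    object_names: list = field(default_factory=list)  # available ROOT objects


class BaseMonitor:
    """Toy base class: framework functions plus hooks for monitor writers."""

    def __init__(self, consumer_name: str, run_number: int):
        self.info = ConsumerInfo(consumer_name, run_number)
        self.objects = {}

    def book(self, name: str, obj) -> None:
        """Framework function: register an object and advertise it in the info."""
        self.objects[name] = obj
        self.info.object_names.append(name)

    def process(self, event) -> None:
        """Framework function: count the event, then call the monitor's hook."""
        self.info.events_processed += 1
        self.fill(event)

    def fill(self, event) -> None:
        """Hook to be overridden by the monitor writer."""
        raise NotImplementedError


class HitCountMonitor(BaseMonitor):
    """Hypothetical monitor: histogram of the number of hits per event."""

    def __init__(self, run_number: int):
        super().__init__("HitCountMonitor", run_number)
        self.book("hits_per_event", {})

    def fill(self, event) -> None:
        histogram = self.objects["hits_per_event"]
        n_hits = len(event["hits"])
        histogram[n_hits] = histogram.get(n_hits, 0) + 1


monitor = HitCountMonitor(run_number=12345)
for event in ({"hits": [1, 2]}, {"hits": []}, {"hits": [7, 8]}):
    monitor.process(event)
```

The monitor writer only supplies the booking and filling code; the bookkeeping that ends up in the information record is done once, in the base class.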

Page 22: Framework Components (cont.)

II. The Server

• receives ROOT objects from the consumer via socket

• deals with requests from the displays

• reports the status of the consumer to the state manager

III. The Display

• ROOT-based GUI

• can connect to the server via socket or ROOT files; can browse ROOT objects from the consumers

• at first requests TConsumerInfo and creates a list tree

• only updates objects on the canvas; does not redraw the whole canvas

• useful features such as auto update, slide show, pop-up warning window, etc.
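The request pattern between display and server (ask for the object list first, then refresh only what changed) can be illustrated with an in-memory Python sketch. No real socket or ROOT object is involved, and the class names, the per-object version counter and the method names are assumptions introduced here to show the idea, not the actual protocol.

```python
class ToyServer:
    """Holds the latest objects pushed by a consumer and answers displays."""

    def __init__(self):
        self.objects = {}    # object name -> (version, payload)
        self.version = 0

    def receive(self, name, payload):
        """Called on the consumer side of the connection for each update."""
        self.version += 1
        self.objects[name] = (self.version, payload)

    def info(self):
        """Stand-in for the TConsumerInfo reply: list the available objects."""
        return sorted(self.objects)

    def request(self, name):
        """Return (version, payload) for one object, as asked by a display."""
        return self.objects[name]


class ToyDisplay:
    """Asks for the object list first, then refreshes only what changed."""

    def __init__(self, server):
        self.server = server
        self.tree = server.info()   # first request: build the list tree
        self.shown = {}             # object name -> version currently drawn

    def update(self):
        """Redraw only the objects whose version changed; return their names."""
        redrawn = []
        for name in self.tree:
            version, _payload = self.server.request(name)
            if self.shown.get(name) != version:
                self.shown[name] = version
                redrawn.append(name)
        return redrawn


server = ToyServer()
server.receive("occupancy", [3, 1, 4])
server.receive("trigger_rate", [75.0])

display = ToyDisplay(server)
first = display.update()            # everything is new on the first pass
server.receive("occupancy", [3, 2, 4])
second = display.update()           # only the changed object is refreshed
```

Because the display pulls objects from the server rather than from the consumer itself, any number of displays can be attached without slowing down the monitoring job.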