Offline shifter training tutorial L. Betev February 19, 2009

Outline

• Offline shifter basic responsibilities
• The shifter check list
• Systems and tools (separate talks)
  • The Dashboard
  • The Shuttle
  • Offline Shifter Information System
  • Event display

Basic responsibilities – RAW data

• The RAW data path
  • DAQ online buffer @P2
  • Step A: fast optical link to CERN CC, maximum rates 500 MB/sec (p+p), 1.25 GB/sec (Pb+Pb)
  • CASTOR2 disk buffer
  • Step B: reduced rate, 100 MB/sec (p+p)
  • CASTOR2 tape buffer

Step A – Online buffer -> CASTOR buffer

• Automatic and well-exercised (it almost never goes wrong)

• At this step, the files are also registered in the AliEn catalogue through a gateway

• DAQ is nominally responsible for the transfers

• Offline provides the registration gateway

• If not working, DAQ/SL notifies the shifter and/or the [email protected] expert list

Step A – Shifter responsibilities

• Monitor the fill level of the CASTOR buffer (dashboard)
  • Notify the run coordinator/shift leader if it is more than 80% full
  • Clear disk space following instructions received from the SL
• Follow the registration of RAW (dashboard)
  • All runs in the PHYSICS partition are typically written to CASTOR
  • Follow the run screen and grow suspicious if none of the runs are being registered
  • Contact the SL and ask what is going on
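
The 80% rule above can be written as a simple check. This is only a sketch: the function names and the example sizes are hypothetical, and the real fill level is read from the dashboard, not computed by the shifter.

```python
# Threshold taken from the slide: notify when the buffer is more than 80% full.
NOTIFY_THRESHOLD = 0.80


def buffer_fill_fraction(used_tb: float, capacity_tb: float) -> float:
    """Return the filled fraction of the buffer (0.0 = empty, 1.0 = full)."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    return used_tb / capacity_tb


def must_notify_shift_leader(used_tb: float, capacity_tb: float) -> bool:
    """True when the buffer is more than 80% full."""
    return buffer_fill_fraction(used_tb, capacity_tb) > NOTIFY_THRESHOLD


# Hypothetical numbers, for illustration only.
print(must_notify_shift_leader(used_tb=85.0, capacity_tb=100.0))
```

The comparison is strict because the slide says "more than 80%"; a buffer at exactly 80% does not yet trigger the notification in this sketch.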

Step B – CASTOR buffer -> Tape storage

• Selective copying of runs to tape
  • Part of the p+p data stream (depends on the acquisition rate, max 100 MB/sec)
  • Full data stream in Pb+Pb (1.25 GB/sec)
• The selection of runs to be copied/removed is provided by the SL
  • Offline shifter is responsible for the copy procedure (dashboard)
  • And for the deletion of data from the CASTOR buffer
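
To see why the p+p copy has to be selective, the quoted maximum rates can be compared directly. The sketch below uses only the rates from the slides; the buffer capacity of 100 TB is a hypothetical value chosen for illustration (the slides do not give one), and decimal units (1 GB = 1000 MB) are assumed.

```python
# Maximum rates quoted in the slides.
PP_INPUT_MB_S = 500.0   # p+p, into the CASTOR buffer (Step A)
PP_TAPE_MB_S = 100.0    # p+p, from the buffer to tape (Step B)
PBPB_MB_S = 1250.0      # Pb+Pb, 1.25 GB/sec, assuming 1 GB = 1000 MB


def tape_fraction(input_mb_s: float, tape_mb_s: float) -> float:
    """Largest fraction of the incoming stream that can go to tape."""
    return min(1.0, tape_mb_s / input_mb_s)


def hours_to_fill(fill_fraction: float, capacity_tb: float,
                  input_mb_s: float) -> float:
    """Hours to reach fill_fraction from an empty buffer at a constant
    input rate, with nothing deleted in the meantime."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return fill_fraction * capacity_mb / input_mb_s / 3600.0


print(tape_fraction(PP_INPUT_MB_S, PP_TAPE_MB_S))   # p+p at maximum rates
print(tape_fraction(PBPB_MB_S, PBPB_MB_S))          # Pb+Pb, full stream
print(hours_to_fill(0.80, 100.0, PP_INPUT_MB_S))    # hypothetical 100 TB buffer
```

At the maximum p+p rates, at most one fifth of the incoming stream can be written to tape, which is why the SL selects runs and the shifter deletes the rest from the buffer.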

Basic responsibilities – Shuttle

• Covered in Shuttle presentation
• Here just to put it in the context of the basic responsibilities

Basic responsibilities – event display

• Covered in Event Display presentation
• Here just to put it in the context of the basic responsibilities

Basic responsibilities – data replication

• After RAW is recorded to tape in CASTOR
  • A copy is made to a remote T1 centre (out of 6 possible) for custodial storage and processing
• The replication is an automatic process, triggered at EoR
  • Progress is displayed on the dashboard; the shifter follows the transfers and reports problems
• Presently (muon/calibration runs) – automatic replication is disabled

Basic responsibilities – offline processing pass 1 (at CERN T0)

• After RAW is recorded to tape in CASTOR + Shuttle is done
  • Processing is launched automatically
  • Progress is displayed on the dashboard
• Automatic processing – only for PHYSICS runs
  • Detector calibration runs are processed on request
  • The Offline shifter (if asked by detector groups/run coordination) collects the run numbers and writes them in the shifter report
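
The conditions for automatic pass 1 processing can be summarised as a predicate. This is a sketch of the rule as stated on the slide, not of the actual processing system: the `RunStatus` record, its field names and the run numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunStatus:
    """Hypothetical per-run record; field names are illustrative."""
    run_number: int
    partition: str       # e.g. "PHYSICS"
    raw_on_tape: bool    # RAW recorded to tape in CASTOR
    shuttle_done: bool   # Shuttle processing finished


def launches_automatically(run: RunStatus) -> bool:
    """Pass 1 starts by itself only for PHYSICS runs, once RAW is on
    tape and the Shuttle is done."""
    return run.partition == "PHYSICS" and run.raw_on_tape and run.shuttle_done


def runs_needing_request(runs: list[RunStatus]) -> list[int]:
    """Run numbers outside PHYSICS, which are processed only on request
    and would be noted in the shifter report if asked for."""
    return [r.run_number for r in runs if r.partition != "PHYSICS"]


runs = [
    RunStatus(70001, "PHYSICS", raw_on_tape=True, shuttle_done=True),
    RunStatus(70002, "PHYSICS", raw_on_tape=True, shuttle_done=False),
    RunStatus(70003, "CALIBRATION", raw_on_tape=True, shuttle_done=True),
]
print([r.run_number for r in runs if launches_automatically(r)])
print(runs_needing_request(runs))
```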

Offline shifter check list

• Registration of RAW (dashboard)

• Periodic check of status

• Follow PHYSICS runs (start/stop in DAQ logbook) and registration to CASTOR

• Ask shift leader in case of doubt

• Report registration errors to on-call expert (list of experts in aloshi)

• Run copy and removal procedure (dashboard)

• Shuttle (dashboard)

• Follow the processing of all runs + global Shuttle messages

• In case of preprocessor failures, escalate to (concerned) detector shifters, note in shifter report (aloshi)

• In case of Shuttle failures first follow the restart/debug procedures, then report to on-call expert

Offline shifter check list (2)

• Data replication (dashboard)

• Periodic check of replication status

• Note ‘stuck’ runs – not replicated 12 hours after registration – in the shifter report pages and send the list to [email protected]

• Data processing pass 1 (dashboard)

• Periodic check of processing status

• Note ‘stuck’ runs – not processed 12 hours after registration – in the shifter report pages and send the list to [email protected]

• Shift report (aloshi)

• At end of shift – summary of the operation and noteworthy events
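
The 12-hour "stuck" rule in the check list applies in the same way to replication and to pass 1 processing. A minimal sketch, assuming each run is known by its registration time and a done/not-done flag; the tuple layout, the run numbers and the timestamps are hypothetical, and in practice the information comes from the dashboard.

```python
from datetime import datetime, timedelta

# From the check list: a run is 'stuck' if it is still not replicated
# (or not processed) 12 hours after registration.
STUCK_AFTER = timedelta(hours=12)


def stuck_runs(runs, now):
    """Return the sorted run numbers that are stuck at time `now`.

    `runs` is an iterable of (run_number, registered_at, done) tuples,
    where `done` means replicated or processed, depending on the check.
    """
    return sorted(
        run_number
        for run_number, registered_at, done in runs
        if not done and now - registered_at > STUCK_AFTER
    )


# Hypothetical example data.
now = datetime(2009, 2, 19, 20, 0)
runs = [
    (1001, datetime(2009, 2, 19, 6, 0), False),   # 14 hours old, not done
    (1002, datetime(2009, 2, 19, 10, 0), False),  # 10 hours old, not done
    (1003, datetime(2009, 2, 18, 6, 0), True),    # old, but already done
]
print(stuck_runs(runs, now))
```

The resulting list is what goes into the shifter report pages and to the expert list.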

System Run Coordination meeting

Evening shifter only

• Attend the daily @17:30 System Run Coordination (SRC) meeting
• Prepare and present a 24-hour Offline status report
• Template for the report is given in aloshi

General shifter rules

• Before pressing the
  • Read the procedures and rules defined for each error type
  • aloshi has a search feature; use it to look for similar problems and solutions
  • Try out the remedies
  • If all fails, inform the on-call expert

Information sources for the shifter

• The shifter manual – instructions

• Shifter interface (http://aloshi.cern.ch)

• Monitoring – MonALISA (http://alimonitor.cern.ch/)

• Dashboard

• Shuttle

• Processing and data management