Hall-D Online Systems Status: Counting House Operations (Controls, DAQ, Monitoring, L1 trigger)
David Lawrence, JLab
Feb. 19, 2015

Page 1: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Hall-D Online Systems Status

Counting House Operations: Controls, DAQ, Monitoring, L1 trigger

Feb. 19, 2015
David Lawrence, JLab

Page 2: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Online Status -- David Lawrence 2

Data Rates

[Diagram: multiple ROCs feed the Event Builder, followed by the Event Recorder and the Tape Library. Other labels: 72TB x2 RAID disk, (L3 farm).]

Rates shown on the diagram:
• Spec: 100 MB/sec, Tested: ~30 MB/sec
• Spec: 3000 MB/sec, Tested: 600 MB/sec
• Spec: 300 MB/sec, Tested: 600 MB/sec
• Spec: 300 MB/sec, Tested: 450 MB/sec

“Tested” means with actual data while it was being acquired. In some cases, offline testing has achieved significantly higher rates.

125.9TB in 147,355 files written to tape in 2014 commissioning run

Page 3: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Mode 7 (fADC integrals): 69 kB/event

Mode 8 (fADC full samples): 232 kB/event

Page 4: Feb. 19, 2015 David Lawrence JLab Counting House Operations

[Figure: event-size breakdown by detector (FCAL, BCAL, FDC, CDC) and module type (fADC250, fADC125, fADC250/F1TDC), shown for Mode 7 (fADC integrals) and for full-sample data.]

Page 5: Feb. 19, 2015 David Lawrence JLab Counting House Operations

The event-size profile of the 2014 commissioning data, adjusted for recent or planned firmware upgrades, is used to estimate the event size of future production data.

(Additional compression is expected when disentangled data is rebuilt after L3 into an as yet undetermined format.)

(18kB/event from simulation is used to estimate resources for computer center)

Page 6: Feb. 19, 2015 David Lawrence JLab Counting House Operations

EVIO Formatted Raw Data Files

• File format specified in detail by CODA group (https://coda.jlab.org/drupal/system/files/coda/onlineFormat/eventbuilding.pdf)
• Some corrupted events encountered
  – Problem due to race condition in ER and only occurs for high rates. Has since been fixed in CODA.
  – Wrote new EVIO parser code
    • Error recovery (detects and skips bad blocks/events)
    • Mechanism to efficiently grow buffer size
    • Some “features” still need ironing out (e.g. memory leak)
• Event parsing implements disentangling in parallel
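
The error-recovery idea in the list above (detect a bad block, then skip ahead to the next usable one) can be sketched roughly as follows. This is an illustration only, not the parser described on the slide. It assumes a simplified block layout: an 8-word header whose first word is the block length in 32-bit words (header included) and whose eighth word is the magic number 0xc0da0100. The names scan_blocks and make_block, and the word-by-word resynchronization strategy, are hypothetical.

```python
MAGIC = 0xC0DA0100   # assumed block-header magic word
HEADER_WORDS = 8     # assumed header length in 32-bit words


def scan_blocks(words):
    """Return (blocks, skipped) for a list of 32-bit words.

    blocks  -- list of (start_index, length_in_words) for plausible blocks
    skipped -- number of words skipped while resynchronizing
    """
    blocks, skipped = [], 0
    i, n = 0, len(words)
    while i + HEADER_WORDS <= n:
        length = words[i]
        magic = words[i + HEADER_WORDS - 1]
        # Accept a block only if the magic word matches and the declared
        # length is plausible and fits in the remaining data.
        if magic == MAGIC and HEADER_WORDS <= length <= n - i:
            blocks.append((i, length))
            i += length
        else:
            # Bad data: slide forward one word at a time until a
            # plausible header shows up again.
            i += 1
            skipped += 1
    skipped += n - i  # trailing words too short to hold a header
    return blocks, skipped


def make_block(payload):
    """Build a block in the assumed layout (helper for this example only)."""
    header = [HEADER_WORDS + len(payload), 0, HEADER_WORDS, 1, 0, 4, 0, MAGIC]
    return header + list(payload)


good1 = make_block([1, 2, 3])
good2 = make_block([4, 5])
garbage = [0xDEADBEEF, 7, 7]  # made-up corruption between two good blocks
blocks, skipped = scan_blocks(good1 + garbage + good2)
print(blocks, skipped)
```

Scanning one word at a time is the simplest way to resynchronize. A payload word that happens to equal the magic number could still produce a false match, so a real parser would need further consistency checks.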

Page 7: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Online Monitoring

Page 8: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Online Monitoring

• System did not run consistently
  – Sometimes sluggish or non-responsive
  – Processes would crash on some nodes with difficult-to-access error logs
  – ROOT archive files often empty or corrupt
  – Slow event rate seemed to result in tiny processing rate due to “burst” effect
• These issues are currently being addressed

Page 9: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Preparations for next run

• L1 coincidence trigger
• ~10kHz DAQ rate (requires f125 multiblock)
  – Sync events (will require offline mapping)
• L3 infrastructure test w/ event tagging
• Secondary ET system for monitoring
• Run info database integration/enhancement
• Auxiliary run data packaging for tape storage
  – Auto-deletion and RAID disk swapping
• Controls
  – Scaler readout into EPICS being reworked to be more efficient
  – Goniometer
  – Voltage controls

Page 10: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Summary

• 126 TB written to RAID and copied to tape
• 600 MB/s written to RAID from DAQ while taking data
• 450 MB/s copy from RAID to tape
• Electronic Logbook used successfully
  – https://logbooks.jlab.org
• Event size larger than expected, but currently being addressed
• Several items still need to be addressed prior to 2015 commissioning
  – Many things were done “by hand”. For long-term operations these need to be automated or given a better procedure, to ensure the integrity and consistency of the data over a long period and efficient use of human resources

Page 11: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Backup Slides

Page 12: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Counting house computer systems

Computer(s) | Processor | Network marks (columns: General Purpose Network, DAQ Network, I.B. Network) | Comments
gluonfs1 | N/A | X | ~1.6TB with snapshot backup
gluonraid1-2 | Intel E5-2630 v2 @2.6GHz | X X X | RAID disk host, ER process
gluon01-05 | i5-3570 @3.4GHz | X | Shift taker consoles
gluon20-23 | AMD 2347 | X | Controls, 8core
gluon24-30 | E5-2420 @1.9GHz | X | Controls (gluon24 is web/DB/cMsg server), 12core + 12ht
gluon40-43 | AMD 6380 | X X X | 16core + 16”ht”
gluon46-49 | E5-2650 v2 @2.6GHz | X X(gluon47 & 49) X | 16core + 16ht
gluon100-111 | E5-2650 v2 @2.6GHz | X X | 16core + 16ht
rocdev1 | Pentium 4 @2.8GHz | X | RHEL5 system for compiling ROLs for DAQ
hdguest0-3 | i5-3470 @3.2GHz | X (outside network) | Guest consoles in cubicles (outside network)

Page 13: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Rough Specs. Review

• 10^8 γ/s on LH2 target -> ~400kHz hadronic rate
• L1 trigger goal is to cut away ~50% leaving 200kHz
• L3 trigger goal is to reduce by ~90% leaving 20kHz
• Early simulation suggested ~15kB/event
• Design specs*:
  – 15kB/event @ 200 kHz = 3000 MB/s (front end)
  – L3 reduction by factor of 10 = 300MB/s to RAID disk
  – 3 days storage on RAID = 300MB/s*3days = 78TB
  – Maintain 300MB/s transfer from RAID to tape

*L3 not officially part of 12GeV upgrade project
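
The design-spec arithmetic on this slide can be reproduced with a few lines. This is only a check of the numbers quoted above, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes). The variable names are chosen for this sketch.

```python
MB = 1e6   # decimal megabyte in bytes (assumed convention)
TB = 1e12  # decimal terabyte in bytes (assumed convention)

hadronic_rate_hz = 400e3              # ~400 kHz hadronic rate
l1_rate_hz = hadronic_rate_hz * 0.5   # L1 cuts away ~50%
l3_rate_hz = l1_rate_hz * 0.1         # L3 reduces by ~90%
event_size_bytes = 15e3               # ~15 kB/event from early simulation

front_end_mb_s = l1_rate_hz * event_size_bytes / MB
to_raid_mb_s = l3_rate_hz * event_size_bytes / MB
seconds_in_3_days = 3 * 24 * 3600
raid_3days_tb = to_raid_mb_s * MB * seconds_in_3_days / TB

print(f"front end:        {front_end_mb_s:.0f} MB/s")
print(f"to RAID after L3: {to_raid_mb_s:.0f} MB/s")
print(f"3 days on RAID:   {raid_3days_tb:.1f} TB")
```

The three-day figure comes out just under 78 TB, which matches the rounded value on the slide.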

Page 14: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Mode 7 (fADC Integrals)

Mode 8 (fADC full samples)

• Each 32-bit word in the EVIO file was tallied to identify what the file space is being used for

• A comparison between mode 7 and mode 8 data was made

Example: some of the fADC250 word types
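
A word-by-word tally of this kind might look like the sketch below. It is a hypothetical illustration, not the code used for the slide. It assumes a layout in which a word with bit 31 set starts a new data type whose 4-bit tag sits in bits 27-30, and a word with bit 31 clear continues the most recent type. The sample words are made up.

```python
from collections import Counter

UNKNOWN = -1  # tag used for continuation words seen before any type word


def tally_word_types(words):
    """Count 32-bit words per data-type tag (layout assumed, see lead-in)."""
    counts = Counter()
    current = UNKNOWN
    for w in words:
        if w & 0x80000000:            # bit 31 set: type-defining word
            current = (w >> 27) & 0xF
        counts[current] += 1          # continuation words inherit the type
    return counts


def type_word(tag, payload=0):
    """Build a type-defining word in the assumed layout (example helper)."""
    return 0x80000000 | (tag << 27) | (payload & 0x07FFFFFF)


sample = [
    type_word(0),
    type_word(2),
    type_word(4), 0x123, 0x456, 0x789,  # tag 4 plus three continuation words
    type_word(7), type_word(7),
    type_word(1),
]
counts = tally_word_types(sample)
total = sum(counts.values())
for tag, n in sorted(counts.items()):
    print(f"type {tag:2d}: {n} words ({100 * n / total:.0f}% of file space)")
```

Running the same tally over mode 7 and mode 8 files and comparing the per-type fractions gives the kind of comparison the slide describes.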

Page 16: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Event Size

• Simulation was consistent with initial estimate of event size
• Actual data was more than x4 larger
• Much of the data was taken in “raw” mode where fADC samples were saved

Event Size in EVIO format:
• commissioning data mode 8: 232 kB
• commissioning data mode 7: 69 kB
• est. w/ firmware upgrades: 24 kB
• Simulation: 18 kB
• Initial Estimate: 15 kB
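
As a quick check of the “more than x4 larger” statement, the event sizes from this slide can be expressed relative to the initial estimate. The dictionary keys are labels chosen for this sketch, and the numbers are the ones quoted on the slide.

```python
# Event sizes in kB, as quoted on the slide
sizes_kb = {
    "commissioning data, mode 8": 232,
    "commissioning data, mode 7": 69,
    "est. w/ firmware upgrades": 24,
    "simulation": 18,
    "initial estimate": 15,
}

initial = sizes_kb["initial estimate"]
ratios = {name: kb / initial for name, kb in sizes_kb.items()}

for name, ratio in ratios.items():
    print(f"{name:28s} {sizes_kb[name]:4d} kB  x{ratio:.1f} of initial estimate")
```

Mode 7 data comes out at about 4.6 times the initial estimate, and the estimate with firmware upgrades at about 1.6 times.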

Page 17: Feb. 19, 2015 David Lawrence JLab Counting House Operations

DAQ to Detector Translation Table

• The Translation Table is used to convert from DAQ system coordinates (rocid, slot, channel) into detector-specific coordinates (e.g. BCAL module, layer, sector, end)
• ~23k channels defined in SQLite DB file
• Stored in CCDB as XML string for offline analysis with complete history:
  – /Translation/DAQ2detector
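
The lookup described above can be illustrated with a small in-memory SQLite table. The schema, table name, column names, and channel values below are all hypothetical and only show the shape of the mapping from (rocid, slot, channel) to BCAL coordinates. The actual layout of the translation table DB file is not given on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE bcal_channels (
           rocid INTEGER, slot INTEGER, channel INTEGER,
           module INTEGER, layer INTEGER, sector INTEGER, det_end TEXT,
           PRIMARY KEY (rocid, slot, channel))"""
)
# Made-up entries for illustration only
conn.executemany(
    "INSERT INTO bcal_channels VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (31, 3, 0, 1, 1, 1, "U"),
        (31, 3, 1, 1, 1, 2, "U"),
        (32, 3, 0, 1, 1, 1, "D"),
    ],
)


def to_detector(conn, rocid, slot, channel):
    """Translate DAQ coordinates into (module, layer, sector, end).

    Returns None if the DAQ channel is not in the table.
    """
    return conn.execute(
        "SELECT module, layer, sector, det_end FROM bcal_channels "
        "WHERE rocid = ? AND slot = ? AND channel = ?",
        (rocid, slot, channel),
    ).fetchone()


print(to_detector(conn, 31, 3, 1))
print(to_detector(conn, 99, 0, 0))
```

Making (rocid, slot, channel) the primary key guarantees that each DAQ channel maps to at most one detector element.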

Page 18: Feb. 19, 2015 David Lawrence JLab Counting House Operations

hdmon

Monitoring Plugins
BCAL_online
CDC_online
DAQ_online
FCAL_online
FDC_online
PS_online
ST_online
TAGH_online
TAGM_online
TOF_online
rootspy

• Each detector system provides 1 or more plugins that create histograms for monitoring
• All plugins are attached to a common DANA process (hdmon)
• A “rootspy” plugin publishes all histograms to the network
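
The plugin arrangement on this slide (detector plugins booking histograms in one shared process, plus a publisher that exposes them all) can be sketched in miniature. This is a conceptual toy only. None of the class or method names below come from JANA, DANA, or rootspy, and the event format is made up.

```python
class Histogram:
    """Minimal stand-in for a histogram: it only counts fills."""
    def __init__(self, name):
        self.name = name
        self.entries = 0

    def fill(self):
        self.entries += 1


class MonitorProcess:
    """Stand-in for the common process that hosts all plugins."""
    def __init__(self):
        self.plugins = []
        self.histograms = {}

    def attach(self, plugin):
        self.plugins.append(plugin)
        plugin.init(self)

    def book(self, name):
        # Histograms live in one shared registry so any plugin can see them
        if name not in self.histograms:
            self.histograms[name] = Histogram(name)
        return self.histograms[name]

    def process(self, event):
        for plugin in self.plugins:
            plugin.event(event)


class DetectorPlugin:
    """Hypothetical detector plugin: one hit-count histogram per detector."""
    def __init__(self, detector):
        self.detector = detector

    def init(self, proc):
        self.hits = proc.book(f"{self.detector}_online/hits")

    def event(self, event):
        if self.detector in event["detectors"]:
            self.hits.fill()


class PublisherPlugin:
    """Hypothetical publisher: exposes every histogram in the registry."""
    def init(self, proc):
        self.proc = proc

    def event(self, event):
        pass  # nothing to do per event in this toy version

    def snapshot(self):
        return {n: h.entries for n, h in self.proc.histograms.items()}


proc = MonitorProcess()
publisher = PublisherPlugin()
for plugin in (DetectorPlugin("BCAL"), DetectorPlugin("FCAL"), publisher):
    proc.attach(plugin)
for ev in ({"detectors": ["BCAL"]}, {"detectors": ["BCAL", "FCAL"]}):
    proc.process(ev)
print(publisher.snapshot())
```

Because the publisher reads the shared registry rather than talking to individual plugins, adding a new detector plugin requires no change to the publishing side.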

Page 19: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Raw Data Formatted Files (from simulated data)

CCDB

hdgeant_smeared.hddm

rawevent plugin (JANA + mc2coda)

run0002.evio

(Data file in same format as will be produced by CODA DAQ system)

Calibration code development

evioSplitRoc

roc002.evio

roc003.evio

roc004.evio...

Page 20: Feb. 19, 2015 David Lawrence JLab Counting House Operations

L3 and monitoring architecture

[Diagram, data flows from left to right: ET, EB, ET, ER, with L3 processes and further ET systems feeding monitoring (Mon) processes. Node labels: gluon53, gluonraid1, gluon46. Other labels: farm manager (x3).]

L3 and monitoring processes are decoupled. They could run on the same nodes, though, if desired.

Page 21: Feb. 19, 2015 David Lawrence JLab Counting House Operations

hdmongui

multiple “levels” supported

processes run multi-threaded

Page 25: Feb. 19, 2015 David Lawrence JLab Counting House Operations

Current code

Page 26: Feb. 19, 2015 David Lawrence JLab Counting House Operations

All pool maximums increased x10

Only TrackHit pool max increased x10