Hall-D Online Systems Status
David Lawrence, JLab
Feb. 19, 2015
Topics: Controls, DAQ, Monitoring, L1 trigger, Counting House Operations
Online Status -- David Lawrence 2
Data Rates
Data flow: ROCs -> Event Builder -> (L3 farm) -> Event Recorder -> 72 TB x2 RAID disk -> Tape Library
• ROC -> Event Builder (per ROC): Spec 100 MB/sec; Tested ~30 MB/sec
• Front end (into Event Builder): Spec 3000 MB/sec; Tested 600 MB/sec
• Event Recorder -> RAID disk: Spec 300 MB/sec; Tested 600 MB/sec
• RAID disk -> Tape Library: Spec 300 MB/sec; Tested 450 MB/sec
“Tested” means with actual data while it was being acquired. In some cases, offline testing has achieved significantly higher rates.
125.9 TB in 147,355 files were written to tape in the 2014 commissioning run.
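As a rough consistency check on these figures, the snippet below derives the average file size and the time needed to stream the full 125.9 TB to tape at the spec and tested RAID-to-tape rates. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained rate with no per-file overhead, so the times are lower bounds, not measured values.

```python
# Back-of-envelope numbers for the 2014 commissioning run.
# Assumption: decimal units and a constant transfer rate with no per-file overhead.
SECONDS_PER_DAY = 86_400

TOTAL_BYTES = 125.9e12                      # 125.9 TB written to tape
N_FILES = 147_355                           # files written to tape
RATES = {"spec": 300e6, "tested": 450e6}    # RAID -> tape, bytes/s


def avg_file_size_mb(total_bytes: float, n_files: int) -> float:
    """Mean file size in MB."""
    return total_bytes / n_files / 1e6


def copy_time_days(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Minimum time, in days, to stream total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"average file size: {avg_file_size_mb(TOTAL_BYTES, N_FILES):.0f} MB")
    for label, rate in RATES.items():
        print(f"RAID -> tape at {label} rate: {copy_time_days(TOTAL_BYTES, rate):.2f} days")
```

This gives an average file of about 854 MB, and roughly 3.2 days of continuous copying at the tested rate versus about 4.9 days at the spec rate.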
Event size, 2014 commissioning data:
• Mode 7 (fADC integrals): 69 kB/event
• Mode 8 (fADC full samples): 232 kB/event
[Figure: breakdown of event size by detector (FCAL, BCAL, FDC, CDC) and readout module (fADC250, fADC125, F1TDC), for Mode 7 (fADC integrals) and full-sample data]
The event size for future production data is estimated by adjusting the profile of the 2014 commissioning data for recent or planned firmware upgrades.
(Additional compression is expected when the disentangled data is rebuilt after L3 into an as-yet-undetermined format.)
(The 18 kB/event from simulation is used to estimate resources for the computer center.)
EVIO Formatted Raw Data Files
• File format specified in detail by the CODA group
  (https://coda.jlab.org/drupal/system/files/coda/onlineFormat/eventbuilding.pdf)
• Some corrupted events encountered
  – Caused by a race condition in the ER that occurred only at high rates; since fixed in CODA
  – Wrote new EVIO parser code:
    • Error recovery (detects and skips bad blocks/events)
    • Mechanism to efficiently grow buffer size
    • Some “features” still need ironing out (e.g. memory leak)
• Event parsing performs disentangling in parallel
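The error-recovery idea (validate each block header, skip what fails, and resynchronize on the next good header) can be sketched in Python. This is not the GlueX parser, which is C++. The sketch assumes the EVIO version 4 block-header layout: 8 words, with the header length, event count, and magic number 0xc0da0100 at 0-based offsets 2, 3, and 7. It also treats each event as a bank whose first word is its length in words, excluding that word. The function names are hypothetical.

```python
from typing import Iterator, List, Optional

# Assumed EVIO version-4 block header (8 x 32-bit words, 0-based offsets):
#   0 block length in words (header included)   1 block number
#   2 header length (= 8)                        3 number of events in block
#   4 reserved                                   5 version + bit info
#   6 reserved                                   7 magic number 0xc0da0100
MAGIC = 0xC0DA0100
HEADER_WORDS = 8


def header_ok(words: List[int], pos: int) -> bool:
    """Sanity-check a candidate block header starting at words[pos]."""
    if pos + HEADER_WORDS > len(words):
        return False
    block_len = words[pos]
    return (words[pos + 7] == MAGIC
            and words[pos + 2] == HEADER_WORDS
            and block_len >= HEADER_WORDS
            and pos + block_len <= len(words))


def next_header(words: List[int], start: int) -> Optional[int]:
    """Resynchronize: return the next position that looks like a valid header."""
    for pos in range(start, len(words) - HEADER_WORDS + 1):
        if header_ok(words, pos):
            return pos
    return None


def iter_events(words: List[int]) -> Iterator[List[int]]:
    """Yield events (lists of words), skipping corrupt blocks and truncated events."""
    pos = 0
    while pos < len(words):
        if not header_ok(words, pos):
            nxt = next_header(words, pos + 1)
            if nxt is None:
                return                          # no further valid block
            pos = nxt
            continue
        block_end = pos + words[pos]
        epos = pos + HEADER_WORDS
        for _ in range(words[pos + 3]):
            if epos >= block_end:
                break
            ev_len = words[epos] + 1            # bank length word excludes itself
            if epos + ev_len > block_end:
                break                           # bad length: drop rest of this block
            yield words[epos:epos + ev_len]
            epos += ev_len
        pos = block_end


if __name__ == "__main__":
    def block(num: int, events: List[List[int]]) -> List[int]:
        body = [w for ev in events for w in ev]
        return [HEADER_WORDS + len(body), num, HEADER_WORDS, len(events), 0, 4, 0, MAGIC] + body

    data = block(1, [[2, 0x10, 0xAA], [1, 0x11]]) + [0xDEADBEEF, 0x12345678] + block(2, [[1, 0x20]])
    print(list(iter_events(data)))      # the two junk words between blocks are skipped
```

Scanning for a plausible header is what makes recovery possible. A corrupted length field cannot be trusted, but a later header can be recognized independently of it.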
Online Monitoring
• System did not run consistently
  – Sometimes sluggish or non-responsive
  – Processes would crash on some nodes, with difficult-to-access error logs
  – ROOT archive files often empty or corrupt
  – A slow event rate seemed to result in a very low processing rate due to a “burst” effect
• These issues are currently being addressed
Preparations for next run
• L1 coincidence trigger
• ~10 kHz DAQ rate (requires f125 multiblock)
  – Sync events (will require offline mapping)
• L3 infrastructure test with event tagging
• Secondary ET system for monitoring
• Run info database integration/enhancement
• Auxiliary run data packaging for tape storage
  – Auto-deletion and RAID disk swapping
• Controls
  – Scaler readout into EPICS being reworked to be more efficient
  – Goniometer
  – Voltage controls
Summary
• 126 TB written to RAID and copied to tape
• 600 MB/s written to RAID from DAQ while taking data
• 450 MB/s copied from RAID to tape
• Electronic Logbook used successfully
  – https://logbooks.jlab.org
• Event size larger than expected, but currently being addressed
• Several items still need to be addressed prior to 2015 commissioning
  – Many things were done “by hand”. These need to be either automated or covered by better procedures for long-term operations, to ensure data integrity/consistency over long periods and efficient use of human resources
Backup Slides
Counting house computer systems
Computer(s) | Processor | General Purpose Network | DAQ Network | I.B. | Comments
gluonfs1 | N/A | X | | | ~1.6 TB with snapshot backup
gluonraid1-2 | Intel E5-2630 v2 @2.6GHz | X | X | X | RAID disk host; ER process
gluon01-05 | i5-3570 @3.4GHz | X | | | Shift taker consoles
gluon20-23 | AMD 2347 | X | | | Controls; 8 core
gluon24-30 | E5-2420 @1.9GHz | X | | | Controls (gluon24 is web/DB/cMsg server); 12 core + 12 HT
gluon40-43 | AMD 6380 | X | X | X | 16 core + 16 “HT”
gluon46-49 | E5-2650 v2 @2.6GHz | X | X (gluon47 & 49) | X | 16 core + 16 HT
gluon100-111 | E5-2650 v2 @2.6GHz | X | X | | 16 core + 16 HT
rocdev1 | Pentium 4 @2.8GHz | X | | | RHEL5 system for compiling ROLs for DAQ
hdguest0-3 | i5-3470 @3.2GHz | X (outside network) | | | Guest consoles in cubicles (outside network)
Rough Specs Review
• 10^8 γ/s on LH2 target -> ~400 kHz hadronic rate
• L1 trigger goal is to cut away ~50%, leaving 200 kHz
• L3 trigger goal is to reduce by ~90%, leaving 20 kHz
• Early simulation suggested ~15 kB/event
• Design specs*:
  – 15 kB/event @ 200 kHz = 3000 MB/s (front end)
  – L3 reduction by a factor of 10 = 300 MB/s to RAID disk
  – 3 days of storage on RAID = 300 MB/s * 3 days ≈ 78 TB
  – Maintain 300 MB/s transfer from RAID to tape
*L3 is not officially part of the 12 GeV upgrade project
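The design-spec chain above is simple arithmetic. The sketch below reproduces it so the budget can be re-run with different inputs, for example a larger event size. The function and parameter names are illustrative, and units are decimal (1 kB = 10^3 bytes, 1 MB = 10^6 bytes, 1 TB = 10^12 bytes).

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Budget:
    l1_rate_hz: float
    l3_rate_hz: float
    front_end_mb_s: float
    to_raid_mb_s: float
    raid_tb: float


def design_budget(hadronic_rate_hz: float = 400e3,
                  l1_keep: float = 0.5,     # L1 cuts away ~50%
                  l3_keep: float = 0.1,     # L3 reduces by ~90%
                  event_kb: float = 15.0,
                  raid_days: float = 3.0) -> Budget:
    """Propagate trigger rates and event size into bandwidth and RAID capacity."""
    l1_rate = hadronic_rate_hz * l1_keep
    l3_rate = l1_rate * l3_keep
    front_end = l1_rate * event_kb * 1e3 / 1e6      # MB/s out of the front end
    to_raid = l3_rate * event_kb * 1e3 / 1e6        # MB/s written after L3
    raid_tb = to_raid * 1e6 * raid_days * SECONDS_PER_DAY / 1e12
    return Budget(l1_rate, l3_rate, front_end, to_raid, raid_tb)


if __name__ == "__main__":
    b = design_budget()
    print(f"L1 {b.l1_rate_hz / 1e3:.0f} kHz, L3 {b.l3_rate_hz / 1e3:.0f} kHz")
    print(f"front end {b.front_end_mb_s:.0f} MB/s, to RAID {b.to_raid_mb_s:.0f} MB/s")
    print(f"3 days on RAID: {b.raid_tb:.2f} TB")
```

The default inputs give 200 kHz, 20 kHz, 3000 MB/s, 300 MB/s and 77.76 TB, which matches the ~78 TB quoted above.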
[Figure panels: Mode 7 (fADC Integrals) and Mode 8 (fADC full samples) word tallies]
• Each 32-bit word in the EVIO file was tallied to identify what the file space is being used for
• Mode 7 and mode 8 data were compared
Example: some of the fADC250 word types
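A word tally of this kind can be sketched as follows. The sketch assumes the JLab fADC250 data format, in which a data-type-defining word has bit 31 set and carries a 4-bit type code in bits 30-27, while words with bit 31 clear continue the most recent type. The type-name table lists only a few codes and should be checked against the fADC250 data-format documentation. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed fADC250 type codes (subset); verify against the module's data-format document.
TYPE_NAMES: Dict[int, str] = {
    0: "block header", 1: "block trailer", 2: "event header", 3: "trigger time",
    4: "window raw data", 7: "pulse integral", 8: "pulse time", 15: "filler",
}


def tally_word_types(words: Iterable[int]) -> Counter:
    """Count 32-bit words per data type, charging continuation words to the
    type of the most recent type-defining word (None if none seen yet)."""
    counts: Counter = Counter()
    current: Optional[int] = None
    for w in words:
        if w & 0x8000_0000:              # bit 31 set: type-defining word
            current = (w >> 27) & 0xF     # bits 30-27: data type
        counts[current] += 1
    return counts


def describe(counts: Counter) -> Dict[str, int]:
    """Map type codes to readable names for reporting."""
    return {("unknown" if t is None else TYPE_NAMES.get(t, f"type {t}")): n
            for t, n in counts.items()}


if __name__ == "__main__":
    def type_word(t: int) -> int:
        return 0x8000_0000 | (t << 27)

    sample = [type_word(2), type_word(4), 0x0001_0002, 0x0003_0004, type_word(15)]
    print(describe(tally_word_types(sample)))
```

Comparing mode 7 with mode 8 then amounts to running the same tally on each file and comparing the per-type totals.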
Event Size
• Simulation was consistent with the initial estimate of event size
• Actual data was more than 4x larger
• Much of the data was taken in “raw” mode, where fADC samples were saved
Event size in EVIO format:
• Commissioning data, mode 8: 232 kB
• Commissioning data, mode 7: 69 kB
• Estimate with firmware upgrades: 24 kB
• Simulation: 18 kB
• Initial estimate: 15 kB
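To see what these sizes mean for the disk budget, the sketch below applies the ~20 kHz post-L3 design rate from the rough specs to each event size. Holding the rate fixed at 20 kHz is an assumption made only for comparison. It says nothing about the rates at which the 2014 data were actually taken.

```python
# Event sizes in EVIO format (kB/event), as listed above.
EVENT_SIZES_KB = {
    "Commissioning data, mode 8": 232,
    "Commissioning data, mode 7": 69,
    "Estimate with firmware upgrades": 24,
    "Simulation": 18,
    "Initial estimate": 15,
}
L3_OUTPUT_HZ = 20e3        # assumed post-L3 rate from the rough specs
RAID_SPEC_MB_S = 300.0     # design write rate to RAID


def rate_mb_s(size_kb: float, rate_hz: float = L3_OUTPUT_HZ) -> float:
    """Data rate in MB/s (decimal units) for a given event size and event rate."""
    return size_kb * 1e3 * rate_hz / 1e6


if __name__ == "__main__":
    initial = EVENT_SIZES_KB["Initial estimate"]
    for label, kb in EVENT_SIZES_KB.items():
        r = rate_mb_s(kb)
        print(f"{label:32s} {kb:4d} kB  x{kb / initial:4.1f}  {r:7.0f} MB/s"
              f"  ({r / RAID_SPEC_MB_S:.1f} x RAID spec)")
```

At 20 kHz, the 24 kB/event estimate corresponds to about 480 MB/s, or 1.6 times the 300 MB/s spec.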
DAQ to Detector Translation Table
• The Translation Table converts DAQ system coordinates (rocid, slot, channel) into detector-specific coordinates (e.g. BCAL module, layer, sector, end)
• ~23k channels defined in an SQLite DB file
• Stored in CCDB as an XML string, with complete history, for offline analysis:
  – /Translation/DAQ2detector
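A lookup against such a table might be sketched with Python's built-in sqlite3 module. The rows are loaded once into a dictionary keyed by (rocid, slot, channel), so translating each hit is a single hash lookup. The table name `channels`, its columns, and the example row are hypothetical. The slides do not show the actual SQLite schema or the CCDB XML layout.

```python
import sqlite3
from typing import Dict, NamedTuple, Tuple

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE channels (
    rocid INTEGER, slot INTEGER, channel INTEGER,
    system TEXT, module INTEGER, layer INTEGER, sector INTEGER, end_label TEXT,
    PRIMARY KEY (rocid, slot, channel)
)
"""

DaqKey = Tuple[int, int, int]            # (rocid, slot, channel)


class DetectorChannel(NamedTuple):
    system: str
    module: int
    layer: int
    sector: int
    end: str


def load_translation_table(conn: sqlite3.Connection) -> Dict[DaqKey, DetectorChannel]:
    """Read every row once and index it by DAQ coordinates."""
    rows = conn.execute(
        "SELECT rocid, slot, channel, system, module, layer, sector, end_label FROM channels")
    return {(r, s, c): DetectorChannel(*rest) for r, s, c, *rest in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO channels VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                 (31, 3, 5, "BCAL", 12, 2, 4, "U"))     # made-up example row
    table = load_translation_table(conn)
    print(table[(31, 3, 5)])
```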
hdmon
Monitoring plugins: BCAL_online, CDC_online, DAQ_online, FCAL_online, FDC_online, PS_online, ST_online, TAGH_online, TAGM_online, TOF_online
• Each detector system provides one or more plugins that create histograms for monitoring
• All plugins are attached to a common DANA process (hdmon)
• A “rootspy” plugin publishes all histograms to the network
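The plugin arrangement can be illustrated with a small single-threaded Python model. Each detector plugin owns its histograms, one common process hands every event to every attached plugin, and a publisher step exposes all histograms by name. The real hdmon plugins are C++ DANA plugins filling ROOT histograms, and rootspy's network protocol is not modeled here. The class names, the event layout (a dict of per-detector hit energies), and the histogram choices are hypothetical.

```python
from typing import Dict, Iterable, List

Event = Dict[str, List[float]]   # hypothetical: detector name -> hit energies


class Histogram:
    """Minimal fixed-bin 1-D histogram standing in for a ROOT TH1."""

    def __init__(self, name: str, nbins: int, lo: float, hi: float):
        self.name, self.nbins, self.lo, self.hi = name, nbins, lo, hi
        self.bins = [0] * nbins
        self.entries = 0                  # all fills, including out-of-range ones

    def fill(self, x: float) -> None:
        self.entries += 1
        if self.lo <= x < self.hi:
            i = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
            self.bins[min(i, self.nbins - 1)] += 1   # guard against float round-up


class MonitoringPlugin:
    """One detector's plugin: creates its own histograms and fills them per event."""

    def __init__(self, name: str, detector: str):
        self.name, self.detector = name, detector
        self.histograms: Dict[str, Histogram] = {
            "nhits": Histogram("nhits", 50, 0, 50),
            "energy": Histogram("energy", 100, 0.0, 1.0),
        }

    def process(self, event: Event) -> None:
        hits = event.get(self.detector, [])
        self.histograms["nhits"].fill(len(hits))
        for e in hits:
            self.histograms["energy"].fill(e)


class MonitoringProcess:
    """Common process (the role hdmon plays): every attached plugin sees every event."""

    def __init__(self) -> None:
        self.plugins: List[MonitoringPlugin] = []

    def attach(self, plugin: MonitoringPlugin) -> None:
        self.plugins.append(plugin)

    def run(self, events: Iterable[Event]) -> None:
        for event in events:
            for plugin in self.plugins:
                plugin.process(event)

    def publish(self) -> Dict[str, Histogram]:
        """Publisher step (the role rootspy plays): expose every histogram as 'plugin/name'."""
        return {f"{p.name}/{key}": h
                for p in self.plugins for key, h in p.histograms.items()}


if __name__ == "__main__":
    proc = MonitoringProcess()
    for name, det in [("BCAL_online", "BCAL"), ("CDC_online", "CDC")]:
        proc.attach(MonitoringPlugin(name, det))
    proc.run([{"BCAL": [0.1, 0.5], "CDC": [0.2]}, {"BCAL": [0.3]}, {}])
    for path, h in sorted(proc.publish().items()):
        print(path, h.entries)
```

Keeping the publisher separate from the plugins means a new detector plugin only has to create histograms. It does not need any network code to become visible.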
Raw Data Formatted Files (from simulated data)
• hdgeant_smeared.hddm (plus CCDB) -> rawevent plugin (JANA + mc2coda) -> run0002.evio
  – Data file in the same format as will be produced by the CODA DAQ system
  – Supports calibration code development
• evioSplitRoc: run0002.evio -> roc002.evio, roc003.evio, roc004.evio...
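If evioSplitRoc writes one file per ROC containing only that ROC's portion of every event, the grouping step can be sketched as below. That behavior is an assumption. Events are modeled as hypothetical {rocid: payload words} mappings rather than real EVIO banks, and nothing is written to disk. Only the output names follow the rocNNN.evio pattern shown above.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence

Payload = List[int]                      # 32-bit words from one ROC for one event


def split_by_roc(events: Sequence[Mapping[int, Payload]]) -> Dict[str, List[Payload]]:
    """Group each event's per-ROC payloads into one output stream per ROC.

    Output names follow the rocNNN.evio pattern (zero-padded rocid); events keep
    their input order within each stream, and a ROC absent from an event simply
    contributes nothing for that event.
    """
    streams: Dict[str, List[Payload]] = defaultdict(list)
    for event in events:
        for rocid, payload in sorted(event.items()):
            streams[f"roc{rocid:03d}.evio"].append(list(payload))
    return dict(streams)


if __name__ == "__main__":
    events = [{2: [0x1, 0x2], 3: [0x3]}, {2: [0x4], 4: [0x5, 0x6]}]
    for name, payloads in sorted(split_by_roc(events).items()):
        print(name, payloads)
```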
L3 and monitoring architecture
[Diagram, data flowing from left to right: EB -> ET -> L3 processes -> ET -> ER, with monitoring processes (Mon) reading from separate ET systems; farm manager processes are also shown. Host labels: gluon53, gluonraid1, gluon46.]
L3 and monitoring processes are decoupled, though they could run on the same nodes if desired.
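Decoupling matters because, if monitoring reads from its own bounded, non-blocking buffer, a slow or absent monitor loses monitoring events but never stalls recording. The sketch models the main L3-to-ER path and a secondary monitoring buffer with Python's standard queue and threading modules. It illustrates the design idea only and does not reproduce ET station semantics. The function name and buffer size are hypothetical.

```python
import queue
import threading
from typing import List, Tuple


def run_pipeline(n_events: int, mon_capacity: int = 4) -> Tuple[List[int], List[int], int]:
    """Push n_events through a recording path and a best-effort monitoring tap.

    Returns (recorded events, events left in the monitoring buffer, events dropped
    from monitoring). No monitoring consumer runs here, so the tap fills up and
    starts dropping while recording is unaffected.
    """
    main_q = queue.Queue()                       # stands in for the L3 -> ER path
    mon_q = queue.Queue(maxsize=mon_capacity)    # stands in for a secondary ET for monitoring
    recorded: List[int] = []
    dropped = 0

    def recorder() -> None:
        while True:
            event = main_q.get()
            if event is None:                    # sentinel: end of run
                break
            recorded.append(event)

    worker = threading.Thread(target=recorder)
    worker.start()
    for event in range(n_events):
        main_q.put(event)                        # main path: every event is kept
        try:
            mon_q.put_nowait(event)              # monitoring: never blocks the producer
        except queue.Full:
            dropped += 1
    main_q.put(None)
    worker.join()

    monitored: List[int] = []
    while not mon_q.empty():
        monitored.append(mon_q.get_nowait())
    return recorded, monitored, dropped


if __name__ == "__main__":
    rec, mon, lost = run_pipeline(10)
    print(f"recorded {len(rec)}, monitoring buffer {mon}, dropped from monitoring {lost}")
```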
hdmongui
• Multiple “levels” supported
• Processes run multi-threaded
Current code
All pool maximums increased x10
Only TrackHit pool max increased x10