
Page 1: Online Data Challenges

David Lawrence, JLab
Feb. 20, 2014

Page 2: Online Data Challenge 2013

Primary participants: Elliott Wolin, Sean Dobbs, David Lawrence

When: August 26 – 29, 2013

Where: Hall-D Counting House

Objective: Test data flow and monitoring between the final-stage Event Builder (EB) and the tape silo (i.e., neither the DAQ system nor the offline processing was included)


Page 3: Input Data

• Pythia-generated events simulated, smeared, and passed through an L1 event filter*
• Events digitized and written in EVIO format
  – mc2coda library used to write events in the new event-building scheme specified by the DAQ group
  – Translation table derived from Fernando's spreadsheet detailing the wiring scheme that will be used

*The event filter may have used uncalibrated BCAL energy units, but resulted in roughly 36% of events being kept.
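To make the translation-table idea concrete, here is a minimal sketch of the kind of lookup such a table provides, mapping an electronics address to a detector channel. This is an illustration only: the key layout (crate, slot, channel), the type names, and the class itself are assumptions, not the actual Hall-D implementation.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical sketch of a DAQ-to-detector translation table.
// Key: (crate, slot, channel) as cabled in the hall, per the wiring
// spreadsheet; value: the logical detector channel it corresponds to.
struct DetectorChannel {
    std::string system;  // e.g. "BCAL", "CDC" (illustrative)
    int module  = 0;
    int channel = 0;
};

using DAQAddress = std::tuple<int, int, int>;  // crate, slot, channel

class TranslationTable {
  public:
    void Add(int crate, int slot, int chan, DetectorChannel det) {
        table_[DAQAddress{crate, slot, chan}] = det;
    }
    // Returns nullptr for unmapped (e.g. unused) electronics channels.
    const DetectorChannel* Lookup(int crate, int slot, int chan) const {
        auto it = table_.find(DAQAddress{crate, slot, chan});
        return it == table_.end() ? nullptr : &it->second;
    }
  private:
    std::map<DAQAddress, DetectorChannel> table_;
};
```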


Page 4: Computer Systems (many of these on loan)

*n.b. all L3 machines connected via InfiniBand

Page 5: L3 Infrastructure Test

• 10 nodes used to pass events from the Event Builder (EB) to the Event Recorder (ER)
  – EB on gluon44, ER on halldraid1
• Two "pass-through" modes used (see the sketch below):
  – Simple buffer copy without parsing (40 kHz)
  – Buffer copy with parsing and application of the translation table (~13 kHz)
• DL3TriggerBDT algorithm from MIT
  – Unable to run for an extended time without crashing
  – Cause of crashes unknown and under investigation (the MIT and TMVA code itself has been eliminated as a cause)
  – Total rate of ~7.2 kHz
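A minimal sketch of the difference between the two pass-through modes, assuming a generic in-memory event buffer; the type and function names are hypothetical stand-ins, not the actual CODA/JANA interfaces. The parsing and translation work in the second mode is what accounts for the drop from 40 kHz to ~13 kHz.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical raw event buffer as received from the Event Builder.
struct EventBuffer {
    std::vector<uint32_t> words;  // raw EVIO words
};

// Mode 1: simple pass-through -- forward the buffer untouched (40 kHz).
EventBuffer PassThroughCopy(const EventBuffer& in) {
    return in;  // plain buffer copy, no parsing
}

// Stand-ins for the real work: walking the EVIO bank structure and
// remapping electronics addresses via the translation table.
void ParseBanks(const EventBuffer& in) { (void)in; }
void ApplyTranslation(EventBuffer& out) { (void)out; }

// Mode 2: parse and translate before forwarding (~13 kHz).
EventBuffer PassThroughParsed(const EventBuffer& in) {
    ParseBanks(in);
    EventBuffer out = in;
    ApplyTranslation(out);
    return out;
}
```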

Page 6: L3 Prototype Farm

Fast computers used for testing: gluon44 and gluon45. Computers ordered for the L3 infrastructure.

Max rate estimate for the L3-BDT prototype farm: (1.6 kHz) × (12366/5104) × (10 nodes) ≈ 39 kHz

10 computers for the L3 prototype farm have been ordered and will be shipped next week.
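The estimate above scales the 1.6 kHz single-node BDT rate by the ratio of per-node performance figures (12366 vs. 5104; presumably benchmark scores of the ordered nodes vs. the test machine, though that interpretation is an assumption) and by the 10-node count. A quick arithmetic check:

```cpp
#include <cstdio>

int main() {
    const double measured_khz   = 1.6;    // single-node L3-BDT rate on the test machine
    const double new_node_score = 12366;  // per-node performance figure, ordered nodes
    const double old_node_score = 5104;   // per-node performance figure, test machine
    const int    n_nodes        = 10;

    const double estimate = measured_khz * (new_node_score / old_node_score) * n_nodes;
    std::printf("Estimated farm rate: %.1f kHz\n", estimate);  // prints ~38.8 kHz
    return 0;
}
```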

Page 7: (figure slide; no text content recovered)

Page 8: Monitoring System (RootSpy) Test

• Histograms produced by several plugins were displayed via the RootSpy GUI
• Overlay with archived histograms
• RootSpy archiver (writing summed histograms to file)
• Integration with CODAObjects
• Still need to fully implement the final-histograms mechanism

(Screenshots: Pre-L3 Monitoring and Post-L3 Monitoring)
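For flavor, a minimal ROOT-only sketch of the kind of histogram a monitoring plugin might fill and hand to RootSpy; the histogram name and binning are invented for illustration, and the actual RootSpy publication call is omitted since its API is not shown in these slides. A plain ROOT file write stands in for the archiver's summed-histogram output.

```cpp
#include <TFile.h>
#include <TH1D.h>

int main() {
    // Illustrative occupancy histogram; name and binning are made up.
    TH1D hOccupancy("occupancy", "Channel occupancy;channel;hits",
                    768, -0.5, 767.5);

    // In a real monitoring plugin this would be filled per event from hits.
    for (int fake = 0; fake < 1000; ++fake)
        hOccupancy.Fill(fake % 768);

    // The RootSpy archiver sums such histograms and writes them to file;
    // a plain ROOT file stands in for that here.
    TFile fout("monitoring_snapshot.root", "RECREATE");
    hOccupancy.Write();
    fout.Close();
    return 0;
}
```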


Page 9: Other Monitoring

• Ganglia installed and working for monitoring the general health of all compute nodes
• JANA built-in monitoring available via cMsg (sketched below), allowing remote:
  – Probing for rates
  – Changing the number of threads
  – Pause, resume, quit (either individually or as a group)
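A hedged sketch of what a remote probe over cMsg might look like. The UDL, the subject/type strings, and the command text are all assumptions made for illustration; the real control vocabulary JANA listens for is not documented in these slides, so treat every string below as a placeholder.

```cpp
#include <cMsg.hxx>  // JLab cMsg C++ binding
#include <iostream>

int main() {
    try {
        // UDL, client name, and description are placeholders.
        cmsg::cMsg conn("cMsg://gluon44/cMsg", "odc_probe", "rate probe example");
        conn.connect();

        cmsg::cMsgMessage msg;
        msg.setSubject("janactl");   // assumed control subject
        msg.setType("hd_ana");       // assumed target process type
        msg.setText("get rates");    // assumed command string
        conn.send(msg);

        conn.disconnect();
    } catch (cmsg::cMsgException& e) {
        std::cerr << e.toString() << std::endl;
        return 1;
    }
    return 0;
}
```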

Page 10: RAID to Silo Test

• Transfer from RAID disk to the tape silo tested
  – At least 50 MB/s achieved, but possibly higher (see the rate-estimate sketch below)
  – Certificate and jput set up, but we were informed later that a different mechanism should be used for experimental data from the halls
  – Will arrange for IT Division experts to come run tests and educate us on the proper way to transfer to the silo
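One reason the achieved rate is quoted as a lower bound: a naive rate measurement (bytes moved divided by wall-clock time) can be skewed by staging, since the client may see the transfer complete once data lands on the tape server's staging disk rather than on tape. A trivial sketch of that naive estimate, with a hypothetical duration:

```cpp
#include <cstdio>

int main() {
    const double bytes   = 1.0e12;  // ~1 TB, roughly what the test transferred
    const double seconds = 20000;   // hypothetical wall-clock duration
    // Apparent rate; staging effects can make this differ from the
    // true disk-to-tape rate.
    std::printf("Apparent rate: %.1f MB/s\n", bytes / seconds / 1.0e6);  // 50.0
    return 0;
}
```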

Page 11: (figure slide; no text content recovered)

Page 12: Primary Goals

1. Test system integration from L1 trigger to tape using a low-rate cosmic trigger
2. Test system integration from ROC to tape using M.C. data at high rate
3. Test calibration event tagging

Secondary goal: Test multiple output streams

*A fully installed and calibrated trigger system is not required (only 1 crate needed).
*A fully installed controls system is not required (only on/off of some channels needed).

Page 13: Differences from ODC2013

• Data will come from crates in the hall
• CODA-component Farm Manager
• New RootSpy features
  – advanced archiver functions
  – reference histograms
• High-speed copy from RAID to tape library
• Faster farm CPUs (16-core, 2.6 GHz Xeon E5-2650)

Page 14: Schedule

Gantt chart of ODC2014 preparation tasks (timeline axis omitted):

• Check-out Trigger Electronics (DAQ)
• Check-out DAQ Hardware (DAQ)
• Check-out Online Computers (DAQ)
• Debug DAQ Software
• Archiving and Retrieval of Histogram Files
• Run Bookkeeping Scripts (database; data files and special events)
• Farm Process Manager
• RootSpy-enabled Monitoring Processes
• Event Display
• Transport from RAID to Tape Silo
• Archiving Run Conditions
• Archiving DAQ Configuration
• ODC2014

ODC2014: May 12-16, 2014

Page 15: Summary

• EB-to-ER data flow piece tested
  – L3 infrastructure tested and works in pass-through mode at 40 kHz (mysterious issues with the L3 plugin still being tracked down)
• Monitoring system tested
  – Identical pre-L3 and post-L3 monitoring systems
  – RootSpy GUI used with multiple producers
  – RootSpy archiver
• RAID to tape silo tested
  – Successfully transferred > 1 TB from the counting house to the silo at >= 50 MB/s
  – Rate seemed slower than anticipated by a factor of 2, but the measurement mechanism was not accurate due to staging
  – An alternate transfer method has been advised and will be pursued