Irina Makhlyueva ALICE DAQ group
28 February 2005
DATE multi-stream recorder
Contents
• Purpose of DATE recorders
• Throughput balancing
• Multiple stream recording
• DATE mStreamRecorder
  • Structure
  • Features
  • Configuration files
• Performance
• Plans
Purpose of DATE online recorders
• To provide an interface between the EBs and CASTOR:
  • The Event Builder by itself has only limited recording functions (local disk and “online recording” to FIFOs/buffers).
  • Anything beyond that is the task of independent processes running on the GDC hosts: the online recorders.
• Data conversion option (raw event format → ROOT)
• EB/CASTOR throughput balancing:
  • To ensure continuous data extraction from the EB buffers and make the best use of the available CASTOR throughput.
EB/CASTOR throughput balancing
• The EB’s main task is event building. The EB load balancing is performed by EDM. The total number of GDCs is tuned to optimize the DAQ performance.
• In theory, the CASTOR segment dedicated to ALICE should have a sufficient (or better) aggregate throughput to match the total data rate delivered by all GDCs.
• Recorders must not be a bottleneck! In particular, EBs should never get blocked due to recorder latencies. Therefore, the recorder layer needs a sufficient bandwidth reserve to provide the best possible balance between the GDC and CASTOR throughputs.
• Currently, CASTOR is lagging behind DATE. In this situation, the important task of the recording layer is to make the best use of the CASTOR capacity.
[Figure: GDC farm (*) → recorder layer → CASTOR (*). Labels: total GDC throughput; aggregate CASTOR throughput; recorder bandwidth reserve; loss. (*) Photos are for illustration only.]
From mono- to multi-stream recording
• Past DC experience showed that GDCs with single recording streams could not fully load CASTOR. However, increasing the number of GDCs is expensive.
• The solution: multiple-stream recording, with >1 recording channel (stream) per GDC. This also benefits the GDC efficiency: should a latency occur in one stream, the other stream(s) can take over.
• Recording layer capacity (BW) ≈ number of ACTIVE streams = total − blocked
• In practice, the optimal number of streams per GDC is dictated by the requirement of CASTOR saturation → its tuning is one of the DC goals.
[Figure: DAQ → CASTOR load for three cases]
• ~1 stream/GDC: bad, CASTOR is underloaded.
• ≈2 streams/GDC: not good, no BW reserve.
• ≥3 streams/GDC: OK, adequate recorder BW; losses are due to the limited CASTOR BW.
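The relation "active streams = total − blocked" can be illustrated with a minimal Python sketch. The function name and the scenario of exactly one blocked stream per GDC are assumptions made for illustration; they are not taken from the recorder code.

```python
def active_streams(total, blocked):
    """Streams still able to record: total minus blocked."""
    if not 0 <= blocked <= total:
        raise ValueError("blocked must be between 0 and total")
    return total - blocked


# Illustrative scenario (an assumption): one stream per GDC is blocked,
# evaluated for 1, 2 and 3 streams per GDC.
remaining = {n: active_streams(n, blocked=1) for n in (1, 2, 3)}

for n, active in remaining.items():
    print(f"{n} stream(s)/GDC, 1 blocked -> {active} active")
```

With a single stream per GDC, one blocked stream leaves nothing to take over; with three streams, two keep recording.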
The new DATE multi-stream recorder
• Was introduced in August 2004 and has passed extensive testing since then.
• Is in DATE since release 5.0.
• Is very useful for debugging/testing of new CASTOR facilities.
• Is used in the ongoing DC (3 streams/GDC, raw data mode).
[Figure: mStreamRecorder structure on a GDC host. The Event Builder (EB buffer) feeds mStreamRecorder, which consists of disp and stream (1), stream (2), ..., stream (N). Labels: event descriptors via SimpleFifo; events via EB Consumer API; fatal condition signals (interrupts); dispatching algorithm; (optional) raw/ROOT transformation; file destination n assigned to stream n; reporting to infoLogger/Stat; recorder shared memory (internal logging etc.); config file.]
mStreamRecorder: features
• The recorder consists of disp + N × stream processes, running on a GDC host.
• The EB is configured for online recording (recording device “:” in the run parameters).
• Dispatching process disp:
  • is started by the DATE run control;
  • reads and interprets the configuration file mStreamRecorder.config;
  • forks and configures n identical stream processes;
  • distributes event descriptors from the EB between the streams according to a specified dispatching algorithm: “equal-load”, “first-available” or “special” (trigger mask etc.; not yet implemented);
  • handles interrupts from the streams.
• Recording stream process:
  • is totally independent of the other streams; it communicates only with disp, via a FIFO (event descriptors), signals (fatal conditions), shared memory (status reporting) and a pipe (configuration);
  • retrieves the events from the EB buffer, using the eventBuilder API;
  • (optionally) transforms the events into ROOT objects;
  • opens, writes and closes files at a specified destination: local disk or CASTOR (RFIO or ROOTd);
  • reports its status to disp via shared memory.
• disp and the streams send log messages via the DATE infoLogger.
• Both disp and the streams use polling with timeout/sleep (10 ms): very low CPU load, good response.
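The two implemented dispatching methods could look roughly like the Python sketch below. The slides name the methods but do not define them, so the semantics here are assumptions: "equal-load" is taken to mean "give the descriptor to the stream that has received the fewest so far", and "first-available" to mean "give it to the first stream that is not blocked". The `Stream` class and all function names are hypothetical and do not reflect the actual DATE code.

```python
class Stream:
    """Hypothetical stand-in for one recording stream process."""

    def __init__(self, index):
        self.index = index
        self.blocked = False   # e.g. waiting on a slow file open
        self.received = []     # event descriptors handed to this stream


def equal_load(streams):
    """Assumed meaning: the stream that has received the fewest descriptors."""
    return min(streams, key=lambda s: len(s.received))


def first_available(streams):
    """Assumed meaning: the first stream that is not blocked, else None."""
    for stream in streams:
        if not stream.blocked:
            return stream
    return None


def dispatch(descriptors, streams, pick):
    """Hand each descriptor to the stream chosen by `pick`; return leftovers."""
    leftover = []
    for descriptor in descriptors:
        stream = pick(streams)
        if stream is None:
            leftover.append(descriptor)   # a real dispatcher would retry later
        else:
            stream.received.append(descriptor)
    return leftover


# Equal-load: six descriptors spread evenly over three streams.
streams = [Stream(i) for i in range(3)]
dispatch(range(6), streams, equal_load)
print([s.received for s in streams])

# First-available: stream 0 is blocked, so stream 1 takes over.
blocked_first = [Stream(i) for i in range(3)]
blocked_first[0].blocked = True
dispatch(range(4), blocked_first, first_available)
print([s.received for s in blocked_first])
```

In this sketch the equal-load choice ignores the blocked flag, while first-available skips blocked streams; that difference is where a latency in one stream would show up.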
mStreamRecorder: configuration
• Flexible, easily scalable configuration, currently based on a traditional ASCII config file.
• Simple case: the recorders for all GDCs have the same configuration:
  • 3 identical streams;
  • “first-available” dispatching method;
  • all streams have the same properties (destinations, file sizes, polling regimes, buffering).
• A slightly more sophisticated configuration:
  • separate configurations for different GDCs (e.g., a different number of streams per GDC);
  • a variety of streams with different destinations.
Example configuration (DC6):

>COMMON method=2 Nstreams=3 loglevel=1
>RECORDERS
default_rec stream=default_str
>OSTREAMS
default_str sleep=1 fsize=1024 mxrecl=0 \
  pool=alimdc6 stager=lxs5007 \
  path=/castor/cern.ch/alice/mdc6/tapes
Example configuration (demo):

>COMMON loglevel=1 method=1 dump=1
>RECORDERS
default_rec stream=default_str !! 1 stream
tbed0084gdc Nstreams=2 !! 2 (default) streams
tbed0021gdc stream=default_str stream=public \
  stream=local !! 3 streams
>OSTREAMS
default_str sleep=1 fsize=1024 mxrecl=0 \
  pool=alimdc6 stager=lxs5007 \
  path=/castor/cern.ch/alice/mdc6/tapes
public sleep=2 \
  path=/castor/cern.ch/user/m/makhlyui \
  pool=public stager=stagepublic fsize=128
local path=/tmp =public !! write to local /tmp
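A reader for a file of this shape might be sketched in Python as follows. The syntax rules are inferred from the examples only and are assumptions, not the documented format: a line starting with ">" opens a section, "!!" starts a comment, a trailing backslash continues a line, and each entry is an optional name followed by key=value tokens. All function names are hypothetical; the real mStreamRecorder parser may behave differently.

```python
from collections import defaultdict


def logical_lines(text):
    """Yield lines with '!!' comments removed and '\\' continuations joined."""
    pending = ""
    for raw in text.splitlines():
        line = raw.split("!!", 1)[0].strip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if line:
            yield line
    if pending.strip():
        yield pending.strip()


def parse_entry(tokens):
    """Return (name, {key: [values]}); name is None if no bare token is present."""
    name = None
    params = defaultdict(list)
    for token in tokens:
        if "=" in token:
            # A token such as '=public' lands under the empty key, uninterpreted.
            key, value = token.split("=", 1)
            params[key].append(value)
        elif name is None:
            name = token
    return name, dict(params)


def parse_config(text):
    """Return {section: {entry name: {key: [values]}}}."""
    config = {}
    section = None
    for line in logical_lines(text):
        tokens = line.split()
        if tokens[0].startswith(">"):
            section = config.setdefault(tokens[0][1:], {})
            tokens = tokens[1:]   # parameters may follow the section header
            if not tokens:
                continue
        if section is None:
            continue              # ignore anything before the first section
        name, params = parse_entry(tokens)
        section.setdefault(name, {}).update(params)
    return config


SAMPLE = """\
>COMMON method=2 Nstreams=3 loglevel=1
>RECORDERS
default_rec stream=default_str   !! default recorder
>OSTREAMS
default_str sleep=1 fsize=1024 mxrecl=0 \\
    pool=alimdc6 stager=lxs5007 \\
    path=/castor/cern.ch/alice/mdc6/tapes
"""

config = parse_config(SAMPLE)
print(config["OSTREAMS"]["default_str"]["path"])
```

Values are kept as lists so that repeated keys, such as several stream= tokens on one recorder line, are all preserved.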
Recorder performance
• A rough start, with the transition of CASTOR to the new platform → DC delays. E.g. (end of January): very long file-open latency (from seconds to ~1 hour!).
• The multi-stream feature of the DATE recorder helped to push the CASTOR load to the extreme.
• Numerous modifications were made to mStreamRecorder (protections, error recovery, internal communication, reporting, bug fixes) → a more robust and better performing code.
• Ongoing DC configuration: 15 LDCs, 40 GDCs and 3 streams/GDC. Very fresh results: an average data rate of ~300 MB/s, with large momentary fluctuations (CASTOR is being debugged in flight!).
• Initial debugging and timing were done using local disks and the old (“production version”) CASTOR.
  • Smooth operation with up to 100 streams/GDC and 1-2 GDCs.
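For scale, the quoted DC figures can be turned into rough per-GDC and per-stream averages. The sketch below assumes the ~300 MB/s is spread evenly over all GDCs and streams, which the slides do not state; given the large fluctuations mentioned, the results are only order-of-magnitude values.

```python
N_GDCS = 40                 # from the ongoing DC configuration
STREAMS_PER_GDC = 3         # from the ongoing DC configuration
AVERAGE_RATE_MB_S = 300.0   # approximate average rate quoted for the DC

total_streams = N_GDCS * STREAMS_PER_GDC

# Assumption: the load is spread evenly over GDCs and streams.
rate_per_gdc = AVERAGE_RATE_MB_S / N_GDCS
rate_per_stream = AVERAGE_RATE_MB_S / total_streams

print(f"{total_streams} streams in total")
print(f"about {rate_per_gdc} MB/s per GDC, {rate_per_stream} MB/s per stream")
```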
Plans for near-term future
• ROOT recording commissioning: in progress
• Transition to DB-based configuration (retaining the ASCII option): to be done
• Full integration in the DATE Run Control (SOR.command → SMI object): to be done
• Reporting to the DATE status display: in progress
• A detailed study of the overall recording performance as a function of the number of streams per GDC: during the DC
Long-term
[Photo: Iguazu Falls (Argentina). © www.eTravelPhotos.com]
Thanks and Credits
• R. Divia, for his help with the recorder debugging, the event builder API and other aspects of DATE
• P. Vande Vyvre, for numerous discussions
• B. Panzer, for the idea of multiple stream recording
• J-D Durand, O. Barring and other members of IT/ADC, for help in struggling with CASTOR
• K. Schossmaier and U. Fuchs, for help with DATE and Linux installation
• T. Kuhr and C. Cheshkov of the ALICE offline group, for the ROOT recording codes and related material
• Photographs from WWW sites:
  • http://www.etravelphotos.com/argentina/
  • http://www.amnews.com/mt/kb/archives/000766.html
  • http://wallpapers.graphicfreebies.com (stream)
  • http://FreeFoto.com (Waterfalls)
The end
Spare slide
[Figure: LDCs → GDC pool → multiple recording streams → CASTOR. Label: Recommended.]