
Martin L Purschke, Computing meeting 5-4-1999

ONCS Computing in the ER

• Success stories of “online monitoring in the counting house”

• Offline and Online computing environment consolidation efforts

• ONCS software highlights

• Objectivity news

• Time classes

• New manuals

• ROOT multithreading news

• Worries and concerns


Success Stories (DCH)

• We went the “whole length” to set the DCH folks up with their monitoring in the Counting House

• Before that they would take data, transfer it to RCF, fire up STAF, look at the data,…

• Now: Take data, look at the results right away.

• Doesn’t sound too tricky, we have done something like that, well, in April 1997, but here...

• we have wrapped STAF modules, PHOOL, PhHistogramFactory, basically the whole offline environment

• shared library versions have to match

• both sides (RCF and ONCS) have their own set of version constraints (ROOT version, egcs versions, libstdc++, and so on)

• wrapped STAF and “big-event” DD are true memory hogs (had to tweak the machine setups, 0.5 GB virtual memory)

• The routine operation is more like near-line monitoring, analysis is done from a file from a very short run

• Tassilo and I funneled those data through a DD pool, no problem.

• No difference whether you read from file or from a DD pool.
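The point that analysis code need not care whether events come from a file or from a pool can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `EventSource`, `FileSource`, `PoolSource`, and the integer "events" are stand-ins chosen for illustration, not the actual ONCS Event library or DD interface.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Iterable, Optional


class EventSource(ABC):
    """Hypothetical common interface: hand out events one at a time."""

    @abstractmethod
    def next_event(self) -> Optional[int]:
        """Return the next event, or None when the source is exhausted."""


class FileSource(EventSource):
    """Stand-in for reading events from a data file (here: any iterable)."""

    def __init__(self, events: Iterable[int]) -> None:
        self._it = iter(events)

    def next_event(self) -> Optional[int]:
        return next(self._it, None)


class PoolSource(EventSource):
    """Stand-in for reading events from a pool (here: an in-memory queue)."""

    def __init__(self) -> None:
        self._queue = deque()

    def put(self, event: int) -> None:
        self._queue.append(event)

    def next_event(self) -> Optional[int]:
        return self._queue.popleft() if self._queue else None


def count_events(source: EventSource) -> int:
    """Analysis code sees only the interface, never the concrete source."""
    n = 0
    while source.next_event() is not None:
        n += 1
    return n
```

With this shape, `count_events(FileSource(...))` and `count_events(pool)` run the same analysis loop; only the object handed in differs.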


DCH monitoring

Drift time distributions for X and stereo (UV) wires...

…taken online.


Online and Offline environment (DPM & MLP)

• We want to arrive at a state where the user can run his or her programs in RCF and the Counting house without re-linking.

• We give the “general login script” another shot

• several failures to get that in the past years

• stop the proliferation of account-specific “do-it-all” .login files which are hard to maintain

• source one script which sets up the environment for you, anywhere (*)

• centrally maintained, changes picked up by everyone right away, no more stale paths

• allow a standard Redhat Linux box without root access (but with afs) to use this

• in the counting house we will use local copies of most software, no hard dependence on AFS

• script will adapt to local software, use AFS distribution else

• after executing that script, you should be able to run the analysis software no matter where you are.

(*) we concentrate on Linux and Solaris for now
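The "adapt to local software, use AFS distribution else" rule amounts to a first-match search over candidate locations. The Python sketch below shows only that selection logic; the real script would be a shell login script, and the directory names and the `bin`/`lib` layout are assumptions made up for illustration.

```python
import os
from typing import Dict, Optional, Sequence


def pick_software_root(candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate directory that exists, or None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def build_environment(root: str) -> Dict[str, str]:
    """Derive search paths from one root (the bin/lib layout is an assumption)."""
    return {
        "PATH": os.path.join(root, "bin"),
        "LD_LIBRARY_PATH": os.path.join(root, "lib"),
    }


# Hypothetical locations: local copy first, AFS distribution as the fallback.
CANDIDATES = ["/opt/example/local", "/afs/example.org/software"]
```

Because the order of `CANDIDATES` encodes the preference, a counting-house machine with a local copy never touches AFS, while a plain Linux box falls through to the AFS path.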


ONCS Software highlights

• ONCS “Run Control” is now the de-facto standard for taking data in the counting house

• gives you “one-window operation” of DAQ, timing, DD system, start/stop

• gives you fancy scripting capabilities

• has become very robust over the past couple of weeks

• DD pool can handle “huge events” (more a sysadmin thing)

• DD pool interface is now the right “policy” by default (right for online monitoring)

• new account “phnxoncs” to run standard DAQ, phoncs account used to develop DAQ

• lower the chance to screw up

• give the right (today’s) environment for the not-so-proficient user

• Some updates in Event library

• NEW/PRO scheme a la CERNLIB in place


DD pools as I like them

[Diagram: data-flow chains connecting data files (including an MDC2 data file), dpipe, DD Pools, ddEventiterator, and testEventiterator; one chain is labeled “New utility”.]

Worked very reliably, shuffled 10GB through


Objectivity news

• We (DPM & MLP) successfully set up a federated database with two “autonomous partitions” spanning RCF and the counting house.

• An autonomous partition is a part of the federation which is independent from other partitions.

• It has its own lock server

• It has its own host machine(s)

• In normal running, the partitioning is invisible (unless you want to know).

• In case of a failure, the partitions will continue to function for processes only accessing data within one partition

• This is the case for most DAQ processes (run control, etc), so if RCF is down, it won’t affect the data taking (and vice versa, but when are our machines ever down?)

It’s a major step forward towards doing away with a lot of ASCII-based configuration files.


Database time stamps

We have talked about time stamps for database entries (and the time tags in the Event headers) in the past. So far we didn’t have a good common solution.

A tag in the database should give you the time an entry was made somewhat accurately, but time isn’t good enough to identify individual events from the DAQ. You will need that capability to drop individual events from the analysis later, or to maintain a “hit list” of your favorite events.

We will have a “composite” of time tag and run/event number for identifying events, and time only for things like HV readings, temperature readings, the time an FPGA code was used, and so on.

Our “time” supports comparisons, time windows, <, >, ==, +, so it’s easy to find out whether an “event” is within a certain time window (such as validity range for a calibration) or not.

And the best: Offline and online use the same time format!

We will use that on Unix, NT, and VxWorks.
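The comparison and time-window operations described above can be sketched in a few lines of Python. This is an illustration only: the names `TimeStamp`, `ValidityRange`, and `EventId`, and the choice of a half-open window, are assumptions and not the actual ONCS time classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TimeStamp:
    """Hypothetical time stamp wrapping a single integer tick count."""
    ticks: int

    def __add__(self, delta_ticks: int) -> "TimeStamp":
        # "+" shifts a time stamp by a number of ticks.
        return TimeStamp(self.ticks + delta_ticks)


@dataclass(frozen=True)
class ValidityRange:
    """Half-open window [start, end), e.g. the validity of a calibration."""
    start: TimeStamp
    end: TimeStamp

    def contains(self, t: TimeStamp) -> bool:
        return self.start <= t < self.end


@dataclass(frozen=True, order=True)
class EventId:
    """Composite key: time tag plus run/event number."""
    time: TimeStamp
    run: int
    event: int
```

Since `EventId` is hashable, a "hit list" of favorite events could simply be a `set` of `EventId` values, while slow-control readings would carry a bare `TimeStamp`.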


And the winner is: the VMS time format

VMS date (CH)              ticks                Unix date (+timezone)

26-APR-1999 14:51:50.00    44318551100000000    Mon Apr 26 10:51:50 1999

6-AUG-2034 14:51:50.00     55452055100000000    Sun Aug 6 10:51:50 2034

Why VMS? It is a well-known, widely-used (also outside VMS) format, and at least one of us is a VMS nostalgic :-)

100 ns granularity, 64 bit tick counter since “Smithsonian base date” 00:00 17-Nov-1858

Unix (Posix): granularity 1s (Unix date, etc)

We won’t come close to making use of the 100 ns granularity now, but we might be able to do better (NT clock, correlate accelerator ticks with time ticks, a common time base, something) in the future and will be ready

64 bit integer ready to be fed to conversion routines, and then to ctime, etc
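As a sketch of such a conversion routine, the Python below turns a 64-bit tick count into Unix seconds and back. The offset of 40587 days between 17-Nov-1858 and 1-Jan-1970 is a calendar fact; the function names are made up here, and treating the tick count as UTC is an assumption (the four-hour gap between the two date columns in the sample rows is consistent with it, but the slides do not say so).

```python
from datetime import datetime, timedelta, timezone

TICKS_PER_SECOND = 10_000_000            # one tick = 100 ns
EPOCH_OFFSET_DAYS = 40_587               # 17-Nov-1858 -> 1-Jan-1970
EPOCH_OFFSET_SECONDS = EPOCH_OFFSET_DAYS * 86_400
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ticks_to_unix(ticks: int) -> int:
    """Convert 100 ns ticks since 17-Nov-1858 to whole Unix seconds."""
    return ticks // TICKS_PER_SECOND - EPOCH_OFFSET_SECONDS


def unix_to_ticks(seconds: int) -> int:
    """Convert whole Unix seconds to 100 ns ticks since 17-Nov-1858."""
    return (seconds + EPOCH_OFFSET_SECONDS) * TICKS_PER_SECOND


def ticks_to_utc(ticks: int) -> datetime:
    """Interpret the tick count as UTC (an assumption, see the lead-in)."""
    return UNIX_EPOCH + timedelta(seconds=ticks_to_unix(ticks))
```

Feeding in the two tick values from the table gives 14:51:50 on 26-Apr-1999 and on 6-Aug-2034 respectively, matching the VMS date column.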


Y10K and Y31K bugs

COMPONENT: SYSTEM TIME OP/SYS: VMS, Version 4.n

LAST TECHNICAL REVIEW: 06-APR-1988

SOURCE: Customer Support Center/Colorado Springs

This base time of Nov. 17, 1858 has since been used by TOPS-10, TOPS-20, and VAX/VMS. Given this base date, the 100 nanosecond granularity implemented within VAX/VMS, and the 63-bit absolute time representation (the sign bit must be clear), VMS should have no trouble with time until:

31-JUL-31086 02:48:05.47

At this time, all clocks and time-keeping operations within VMS will suddenly stop, as system time values go negative.

Note that all time display and manipulation routines within VMS allow for only 4 digits within the 'YEAR' field. We expect this to be corrected in a future release of VAX/VMS sometime prior to 31-DEC-9999.
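The quoted overflow date can be checked with a little arithmetic: 2^63 ticks of 100 ns, counted from 17-Nov-1858. Python's `datetime` stops at year 9999, so the sketch below uses a standard days-to-civil-date conversion for the proleptic Gregorian calendar instead. The function names are made up for this example.

```python
from typing import Tuple

TICKS_PER_SECOND = 10_000_000
TICKS_PER_DAY = 86_400 * TICKS_PER_SECOND
MJD_OF_UNIX_EPOCH = 40_587   # days from 17-Nov-1858 to 1-Jan-1970


def civil_from_days(z: int) -> Tuple[int, int, int]:
    """Days since 1970-01-01 -> (year, month, day), proleptic Gregorian."""
    z += 719_468                               # shift epoch to 0000-03-01
    era, doe = divmod(z, 146_097)              # 400-year eras
    yoe = (doe - doe // 1460 + doe // 36_524 - doe // 146_096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)
    mp = (5 * doy + 2) // 153                  # month index, March = 0
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def overflow_moment() -> Tuple[Tuple[int, int, int], Tuple[int, int, int]]:
    """Date and time of day at which the 63-bit tick counter runs out."""
    days, rem_ticks = divmod(2 ** 63, TICKS_PER_DAY)
    seconds = rem_ticks // TICKS_PER_SECOND    # fraction of a second dropped
    hh, mm, ss = seconds // 3600, seconds % 3600 // 60, seconds % 60
    return civil_from_days(days - MJD_OF_UNIX_EPOCH), (hh, mm, ss)
```

`overflow_moment()` yields the date (31086, 7, 31) and the time (2, 48, 5), in agreement with the 31-JUL-31086 02:48:05.47 quoted above.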


New Manuals

Over the past few weeks, we have beefed up the manuals for the DD, Message system, and, believe it or not, the Event library. HTML and PostScript, mostly automatically generated. Fun reading.

They have some real-life examples and in-depth reference information, a must for everybody analyzing data!

We now return to our presentation.


Root status (multithreading)

At the FNAL workshop I agreed to re-visit the “multi-threaded Root” issue.

Some progress: I was able to run ROOT with 2 threads.

• only for compiled code (that’s ok)

• heavy use of mutexes because most ROOT classes are not thread-safe

• this is just the beginning, far from a working system

• MLP2’s socket classes and shared memory are fallback solutions, not very efficient.
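The mutex pattern mentioned in the list above, serializing every access to an object that is not thread-safe, looks roughly like this. The sketch is in Python rather than compiled ROOT code, and `Histogram` is a hypothetical stand-in, not a ROOT class.

```python
import threading


class Histogram:
    """Hypothetical stand-in for a histogram class that is not thread-safe."""

    def __init__(self, nbins: int) -> None:
        self.bins = [0] * nbins

    def fill(self, bin_index: int) -> None:
        # Read-modify-write: unsafe if two threads interleave here.
        self.bins[bin_index] += 1


def worker(hist: Histogram, lock: threading.Lock, n_fills: int) -> None:
    for i in range(n_fills):
        with lock:                     # serialize access to the shared object
            hist.fill(i % len(hist.bins))


def run_two_threads(n_fills: int) -> int:
    """Fill one shared histogram from two threads; return the total entries."""
    hist = Histogram(nbins=4)
    lock = threading.Lock()            # one mutex guarding every use of hist
    threads = [
        threading.Thread(target=worker, args=(hist, lock, n_fills))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(hist.bins)
```

The price of this approach is that the threads spend much of their time waiting on the lock, which is one reason heavy mutex use is only a starting point.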

I see people roll their own -- better help us! I had to put that project on the back burner for the past two weeks.


Worries and concerns

• Too little coordination with the online monitoring efforts. The “tool sharing”/common solution concept doesn’t catch on (what little there is is all by the same author).

• We see people roll their own monolithic solutions -- that will hurt us badly soon. Instead, better use your time to work with us and make something which more than one subsystem can use.

• Very little time contingency: when things like the hacker incident happen, other projects are delayed.

• Number of machines: It’s crowded, but far from being enough CPU power. We need more over time.

• Space: where do we put more terminals?

• Memory: most Linux machines have 64 MB. Offline sees the minimum at 256 MB, better 512 MB. We should upgrade as soon as we can. Institutional contributions, anyone?
