Martin L Purschke, Computing meeting 5-4-1999
ONCS Computing in the ER
• Success stories of “online monitoring in the counting house”
• Offline and Online computing environment consolidation efforts
• ONCS software highlights
• Objectivity news
• Time classes
• New manuals
• ROOT multithreading news
• Worries and concerns
Success Stories (DCH)
• We went the “whole length” to set the DCH folks up with their monitoring in the Counting House
• Before that they would take data, transfer it to RCF, fire up STAF, look at the data, …
• Now: take data, look at the results right away.
• Doesn’t sound too tricky, we have done something like that, well, in April 1997, but here...
• we have wrapped STAF modules, PHOOL, PhHistogramFactory, basically the whole offline environment
• shared library versions have to match
• both sides (RCF and ONCS) have their own set of version constraints (ROOT version, egcs versions, libstdc++, and so on)
• wrapped STAF and “big-event” DD are true memory hogs (we had to tweak the machine setups, 0.5 GB virtual memory)
• The routine operation is more like near-line monitoring; analysis is done from a file from a very short run
• Tassilo and I funneled those data through a DD pool, no problem.
• No difference whether you read from a file or from a DD pool.
DCH monitoring
Drift time distributions for X and stereo (UV) wires...
…taken online.
Online and Offline environment (DPM & MLP)
• We want to arrive at a state where the user can run his or her programs at RCF and in the Counting House without re-linking.
• We give the “general login script” another shot
• there were several failures to get that in place in the past years
• stop the proliferation of account-specific “do-it-all” .login files which are hard to maintain
• source one script which sets up the environment for you, anywhere (*)
• centrally maintained, changes picked up by everyone right away, no more stale paths
• allow a standard Red Hat Linux box without root access (but with AFS) to use this
• in the Counting House we will use local copies of most software, no hard dependence on AFS
• the script will adapt to local software, and use the AFS distribution otherwise
• after executing that script, you should be able to run the analysis software no matter where you are.
(*) we concentrate on Linux and Solaris for now
ONCS Software highlights
• ONCS “Run Control” is now the de-facto standard for taking data in the counting house
• gives you “one-window operation” of DAQ, timing, DD system, start/stop
• gives you fancy scripting capabilities
• has become very robust over the past couple of weeks
• DD pool can handle “huge events” (more a sysadmin thing)
• DD pool interface is now the right “policy” by default (right for online monitoring)
• new account “phnxoncs” to run standard DAQ, phoncs account used to develop DAQ
• lower the chance to screw up
• give the right (today’s) environment for the not-so-proficient user
• Some updates in Event library
• NEW/PRO scheme a la CERNLIB in place
DD pools as I like them
[Diagram: data files feed DD pools through dpipe; ddEventiterators (and a testEventiterator reading a file directly) consume the events. A new utility injects an MDC2 data file into a DD pool, which dpipes fan out into further DD pools, each read by its own ddEventiterator.]
This worked very reliably; we shuffled 10 GB through.
Objectivity news
• We (DPM & MLP) successfully set up a federated database with two “autonomous partitions” spanning RCF and the counting house.
• An autonomous partition is a part of the federation which is independent of the other partitions.
• It has its own lock server
• It has its own host machine(s)
• In normal running, the partitioning is invisible (unless you want to know).
• In case of a failure, the partitions will continue to function for processes only accessing data within one partition
• This is the case for most DAQ processes (run control, etc.), so if RCF is down, it won’t affect the data taking (and vice versa, but when are our machines ever down?)
It’s a major step towards doing away with a lot of ASCII-based configuration files.
Database time stamps
We have talked about time stamps for database entries (and the time tags in the Event headers) in the past. So far we haven’t had a good common solution.
A tag in the database should give you the time an entry was made somewhat accurately, but time isn’t good enough to identify individual events from the DAQ. You will need that capability to drop individual events from the analysis later, or to maintain a “hit list” of your favorite events.
We will have a “composite” of time tag and run/event number for identifying events, and time only for things like HV readings, temperature readings, time a FPGA code was used, and so on.
Our “time” supports comparisons, time windows, <, >, ==, +, so it’s easy to find out whether an “event” is within a certain time window (such as validity range for a calibration) or not.
And the best: Offline and online use the same time format!
We will use that on Unix, NT, and VxWorks.
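The operations listed above are easy to model. A minimal Python sketch of such a time class, storing 100 ns VMS-style ticks (the real time classes are C++ and shared by online and offline; the names here are illustrative):

```python
# Minimal model of a time class supporting <, >, ==, + and
# window tests, stored as 100 ns ticks (VMS style).
# Illustrative only; the real PHENIX time classes are C++.
from functools import total_ordering

TICKS_PER_SECOND = 10_000_000   # 100 ns granularity

@total_ordering
class PhTime:
    def __init__(self, ticks):
        self.ticks = ticks
    def __eq__(self, other):
        return self.ticks == other.ticks
    def __lt__(self, other):
        return self.ticks < other.ticks
    def __add__(self, seconds):
        """Shift a time stamp by a number of seconds."""
        return PhTime(self.ticks + seconds * TICKS_PER_SECOND)
    def in_window(self, start, end):
        """True if this time lies in [start, end],
        e.g. the validity range of a calibration."""
        return start <= self <= end
```

With comparisons and addition in place, “is this event inside this calibration’s validity range?” becomes a one-line test.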
And the winner is: the VMS time format
VMS date (CH)              ticks                Unix date (+timezone)
26-APR-1999 14:51:50.00    44318551100000000    Mon Apr 26 10:51:50 1999
 6-AUG-2034 14:51:50.00    55452055100000000    Sun Aug  6 10:51:50 2034
Why VMS? It is a well-known format, widely used also outside VMS, and at least one of us is a VMS nostalgic :-)
100 ns granularity, 64-bit tick counter since the “Smithsonian base date”, 00:00 17-Nov-1858
Unix (POSIX): granularity 1 s (Unix date, etc.)
We won’t come close to making use of the 100 ns granularity now,
but we might be able to do better in the future (NT clock, correlate accelerator ticks with time ticks, a common time base, something) and will be ready.
The tick count is a 64-bit integer ready to be fed to conversion routines, and then to ctime, etc.
Y10K and Y31K bugs
COMPONENT: SYSTEM TIME OP/SYS: VMS, Version 4.n
LAST TECHNICAL REVIEW: 06-APR-1988
SOURCE: Customer Support Center/Colorado Springs
This base time of Nov. 17, 1858 has since been used by TOPS-10, TOPS-20, and VAX/VMS. Given this base date, the 100 nanosecond granularity implemented within VAX/VMS, and the 63-bit absolute time representation (the sign bit must be clear), VMS should have no trouble with time until:
31-JUL-31086 02:48:05.47
At this time, all clocks and time-keeping operations within VMS will suddenly stop, as system time values go negative.
Note that all time display and manipulation routines within VMS allow for only 4 digits within the 'YEAR' field. We expect this to be corrected in a future release of VAX/VMS sometime prior to 31-DEC-9999.
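The quoted rollover date is easy to check: 2^63 ticks of 100 ns span roughly 29,228 years from the 1858 base date. A back-of-the-envelope verification:

```python
# Back-of-the-envelope check of the VMS rollover year: the sign bit
# must stay clear, so the counter holds 2**63 ticks of 100 ns each.
TICKS = 2 ** 63
SECONDS = TICKS / 10_000_000             # 100 ns per tick
YEARS = SECONDS / (365.2425 * 86400)     # mean Gregorian year
# counting from the base date, 17-Nov-1858 (~ year 1858.88):
ROLLOVER_YEAR = 1858.88 + YEARS
```

The result lands in mid-31086, consistent with the quoted 31-JUL-31086.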
New Manuals
Over the past few weeks, we have beefed up the manuals for the DD, the Message system, and, believe it or not, the Event library. HTML and PostScript, mostly automatically generated. Fun reading.
They have some real-life examples and in-depth reference information; a must for everybody analyzing data!
We now return to our presentation.
Root status (multithreading)
At the FNAL workshop I agreed to re-visit the “multi-threaded Root” issue.
Some progress: I was able to run ROOT with 2 threads.
• only for compiled code (that’s OK)
• heavy use of mutexes, because most ROOT classes are not thread-safe
• this is just the beginning, far from a working system
• MLP2’s socket classes and shared memory are fallback solutions, but not very efficient.
I see people rolling their own; better to help us instead! I had to put that project on the back burner for the past two weeks.
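The mutex point is the crux: when two threads fill the same thread-unsafe object (say, a ROOT histogram), every access has to be serialized. A toy illustration of that locking pattern in Python (the actual work used C++ ROOT classes and native threads; everything here is a stand-in):

```python
# Toy illustration of the locking pattern needed when several
# threads fill one thread-unsafe object (e.g. a ROOT histogram):
# every access is serialized through a single mutex.
import threading

class Histogram:
    """Stand-in for a thread-unsafe histogram: a plain counter dict."""
    def __init__(self):
        self.bins = {}
    def fill(self, x):
        self.bins[x] = self.bins.get(x, 0) + 1

hist = Histogram()
hist_mutex = threading.Lock()

def fill_many(values):
    for x in values:
        with hist_mutex:       # serialize all access to the histogram
            hist.fill(x)

threads = [threading.Thread(target=fill_many, args=([i] * 1000,))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With one mutex guarding every fill, the histogram contents are correct regardless of how the two threads interleave; the cost is that the threads spend time waiting on the lock, which is why heavy mutex use was noted as a drawback.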
Worries and concerns
• Too little coordination with the online monitoring efforts. The “tool sharing”/common-solution concept doesn’t catch on (what little there is is all by the same author).
• We see people roll their own monolithic solutions; that will hurt us badly soon. Instead, better use your time to work with us and make something which more than one subsystem can use.
• Very little time contingency: when things like the hacker incident happen, other projects are delayed.
• Number of machines: it’s crowded, but far from being enough CPU power. We need more over time.
• Space: where do we put more terminals?
• Memory: most Linux machines have 64 MB. Offline sees the minimum at 256 MB, better 512 MB. We should upgrade as soon as we can. Institutional contributions, anyone?