Setup/RunControl
Andrei and Giovanna
Setup Requirements (Feb 2002, by Sergei)
• DAQ Active state: DAQ is ready to accept RC commands from Operator
• DAQ Inactive state (assumed that all h/w is installed and properly connected, PCs loaded etc.)
• Setup and Shutdown – procedures to move DAQ from Active to Inactive state and back.
Setup requirements - general
• UR001-004 – Use of Configuration DB for: description of a partition, dependencies, setup/recovery actions and error conditions.
• UR005,006 – initialize and shutdown a partition
• UR007 – operate on several partitions in parallel
• UR008 – initialize and shutdown a single DAQ component
• UR009 – verify h/w used by the partition
• UR010 – quiet, trace and profiling 'output modes'
• UR011 – interactive and automatic operational mode
• UR012,013 – user GUI, reflecting the state of partition
• UR014 – GUI for browsing log information produced by a partition
• UR015 – GUI for editing setup/shutdown dependencies
User Requirements - recover
• UR016 – recognize and report failure with a DAQ component
• UR017 – try to recover from a failure, if a recovery procedure is defined
• UR018 – use DVS to diagnose the problem if a recovery procedure is not defined or does not work
• UR019 – in case of an unrecoverable failure, setup should stop and ask the user for a shutdown option
User Reqs: logging and security
• UR020 – log information produced by DAQ components
• UR021 – log all performed setup/shutdown actions, qualified with timestamps
• UR022 – log all failures and recovery actions, including DVS output
• UR023 – use of user access control
• UR024 – setup should deny initialization of resources allocated from another partition
• UR025 – editing the setup configuration must be protected by user access rights
Controller Requirements
ID | Requirement | Status
UR001 | 1-to-1 with Segment | Ok
UR002 | Test before using | Ok in Setup
UR004 | Distribute commands | Ok
UR005 | Refuse commands | Ok
UR006 | Interrupt commands | To be done (except shutdown…)
UR007 | Queue commands | Ok in LC
UR011 | Run interactively | Partially ok
UR012 | Lock resources | Not implemented.
UR101-105 | Process management | Ok, shared on setup, DSA and LC
UR201-204 | State Transitions and states | Ok, extra synchronization to be implemented
UR301-307 | Error reception and handling | Many hooks, no uniform approach
UR401-405 | Configuration and configuration changes | Limited reconfiguration possible today. Quality of service concept missing
RR001 | Controller failure no effect on controlled items | Partially ok
PR001-003 | Performance requirements | Difficult to measure…
IR001 | Skeleton | Ok
IR002 | Interface to be imported in all apps | Ok
IR004 | Customizable core | Ok
Setup: implementation
Implemented functionality
• replacement of play_daq, based on DVS and ConfDB
• can make partial setup and re-setup (“recover” setup) if a Segment is attached
• restarting of failed applications
• IDL interface, used from IGUI
• IGUI embeds setup panel which reflects status of infrastructure
Setup IDL
  boolean status ();
  void get_components (out Component comp);
  long subscribe (in callback cb);
  void unsubscribe (in long handler);
  ComponentStatus setup_component (in string component_name);
  void reset_component (in string component_name);
  void abort ();
  void shutdown_component (in string component_name);
  ComponentStatus setup_partition ();
  boolean shutdown_partition ();
Setup logic
Setup:setupComponent()
DVS:verifyComponent()
PMG:startApplication()
- shares verification rules with DVS
- extends verification with setup/recovery via PMG (DVS core + PMG Client + more rules)
- implemented on the same CLIPS framework as DVS
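The call chain above (setupComponent, verifyComponent, startApplication) can be read as: verify first, start the application only if the test fails, then verify again. A minimal C++ sketch of that reading follows. The `Status` enum, the `Services` struct and the retry-once policy are assumptions made for illustration; they are not the actual Setup, DVS or PMG interfaces.

```cpp
#include <functional>
#include <string>

// Hypothetical result type; the real Setup IDL returns a ComponentStatus
// whose values are not shown in these slides.
enum class Status { Ok, Recovered, Failed };

// Hypothetical stand-ins for a DVS-like test and a PMG-like launch call.
struct Services {
    std::function<bool(const std::string&)> verify;  // true if component passes its test
    std::function<bool(const std::string&)> start;   // true if the launch was accepted
};

// Verify first; only if the test fails, try to start the application
// and verify once more.
Status setup_component(const Services& s, const std::string& name) {
    if (s.verify(name)) return Status::Ok;       // already in a good state
    if (!s.start(name)) return Status::Failed;   // launch refused
    return s.verify(name) ? Status::Recovered : Status::Failed;
}
```

Passing the two services as callables keeps the decision logic testable without a running partition.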
Setup sequence
• setup script:
– verify database
– launch setup_server for partition
– launch IGUI for partition
• setup server (implementing IDL):
– verify/setup global infrastructure
– verify/setup partition IPC server
– if Setup is already registered in the partition, exit
– register in partition IPC, wait for commands from IGUI
– on command, verify/setup/shutdown infrastructure
The DSA
Diagnostic Supervisor Application/Agent… Did you know it?
DSA: the history
[Diagram: Diagnostics System, Verification Component, Supervision Component, DAQ Supervisor, DSA, ?]
Scope? Not completely clear but that’s what it does…
• Process management of all controlled applications (NOT online servers) defined in configuration database for a Partition, taking into account:
– When to start/stop them
– Dependencies in start/stop sequence
– If they shall be automatically restarted when they fail/die
Implementation
• Basically an “intelligent” PMG client:
– Navigate through Partition configuration
– Subscribe to changes in configuration for Partition, Segments, Applications, Resources, RunControlApplications (reload of DB only when in Idle state)
– Build dependency tree of apps to be started/stopped
– Restart apps
– If DSA crashes and is restarted it recovers to previous state (true?)
– Publish state in IS and send messages to MRS
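Building a start/stop sequence from declared dependencies can be sketched as a topological sort. The example below uses Kahn's algorithm, which is one common choice and not necessarily what the DSA rules do. The `DepMap` type and the application names used with it are hypothetical.

```cpp
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// deps[app] lists the applications that must be running before app starts.
// Every application must appear as a key (possibly with no dependencies).
using DepMap = std::map<std::string, std::vector<std::string>>;

// Returns a start order in which each application follows its dependencies
// (Kahn's algorithm). Throws if dependencies are cyclic or undeclared.
std::vector<std::string> start_order(const DepMap& deps) {
    std::map<std::string, int> pending;                     // unmet deps per app
    std::map<std::string, std::vector<std::string>> users;  // reverse edges
    for (const auto& entry : deps) {
        pending[entry.first] = static_cast<int>(entry.second.size());
        for (const auto& needed : entry.second) users[needed].push_back(entry.first);
    }
    std::queue<std::string> ready;
    for (const auto& p : pending)
        if (p.second == 0) ready.push(p.first);

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string app = ready.front();
        ready.pop();
        order.push_back(app);
        for (const auto& user : users[app])
            if (--pending[user] == 0) ready.push(user);
    }
    if (order.size() != deps.size())
        throw std::runtime_error("cyclic or undeclared dependency");
    return order;
}

// Stopping uses the reverse of the start order.
std::vector<std::string> stop_order(const DepMap& deps) {
    std::vector<std::string> order = start_order(deps);
    return std::vector<std::string>(order.rbegin(), order.rend());
}
```

For a map in which "is" needs "ipc" and "rc" needs both, `start_order` yields ipc, is, rc and `stop_order` yields the reverse.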
The Online Run Control
Scope
• Model system according to a finite state machine
• Distribute commands in the control tree (in sequence or in parallel)
• Propagate error state in control tree
• Provide a C++ interface to subsystems to implement transition commands
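Command distribution and error propagation in a control tree can be sketched as below. This is an illustration under assumptions: the `Controller` struct, the `State` values and the `fail_on_command` hook are hypothetical and are not the Online RC API. Only sequential distribution is shown; parallel dispatch is left out.

```cpp
#include <memory>
#include <string>
#include <vector>

enum class State { Initial, Configured, Running, Error };

// Hypothetical controller node in the control tree.
struct Controller {
    std::string name;
    State state = State::Initial;
    std::vector<std::unique_ptr<Controller>> children;
    bool fail_on_command = false;  // hook simulating a failing subsystem

    // Distribute the command to the children in sequence, then apply it
    // locally. A child ending in Error puts this controller in Error too,
    // so the error state travels up to the root.
    State command(State target) {
        bool child_error = false;
        for (auto& child : children)
            if (child->command(target) == State::Error) child_error = true;
        state = (child_error || fail_on_command) ? State::Error : target;
        return state;
    }
};
```

A failing leaf marks every controller on its path to the root as Error, while sibling branches still reach the target state.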
Implementation
• Basically an intelligent command dispatcher:
– Builds up list of children from configuration DB
– Keeps track if child left/joined control tree
– Publishes state in IS
– Refuses illegal transition commands and all commands while busy
– Root Controller extension updates run parameters, interprets “end of run” message, etc…
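The "refuse illegal transition commands and all commands while busy" behaviour can be sketched with a small table-driven state machine. The state names, command names and transition table here are assumptions chosen for illustration, not the actual run control state model.

```cpp
#include <map>
#include <string>
#include <utility>

enum class RcState { Initial, Configured, Running };

// Hypothetical state machine with an illustrative transition table.
class StateMachine {
public:
    // Returns false when the command is refused: either a previous
    // transition is still in progress, or the command is not legal
    // in the current state.
    bool begin(const std::string& cmd) {
        if (busy_) return false;
        auto it = table_.find({state_, cmd});
        if (it == table_.end()) return false;
        next_ = it->second;
        busy_ = true;
        return true;
    }

    // Called when the transition work is done.
    void finish() {
        if (busy_) {
            state_ = next_;
            busy_ = false;
        }
    }

    RcState state() const { return state_; }
    bool busy() const { return busy_; }

private:
    RcState state_ = RcState::Initial;
    RcState next_ = RcState::Initial;
    bool busy_ = false;
    const std::map<std::pair<RcState, std::string>, RcState> table_{
        {{RcState::Initial, "configure"}, RcState::Configured},
        {{RcState::Configured, "start"}, RcState::Running},
        {{RcState::Running, "stop"}, RcState::Configured},
        {{RcState::Configured, "unconfigure"}, RcState::Initial},
    };
};
```

Splitting a transition into `begin` and `finish` makes the busy window explicit, which is where commands are refused.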
Implementation of DSA and RC
• Very similar, based on the same core
• Mixture of C++/Java and the CLIPS expert system
• 3 layer approach:
– Language bindings
– Proxy
– Rules
Language Bindings
• This layer is concerned with the integration to other Online Software components.
• Functionality provided by the components' client APIs is wrapped and made accessible to the CLIPS language. The language bindings are grouped in external shared libraries. They can be loaded dynamically and provide their features on demand.
• It is foreseen to develop a comprehensive set of language bindings that encompass the main Online Software functionality. Development of such language bindings requires detailed knowledge of the Online Software and of the mechanism of CLIPS language bindings.
Proxy
• In this layer external entities, such as applications, run-controllers or other entities of the TDAQ system, are modelled as objects.
• Typically these objects are proxy objects: on one hand they forward commands to an external application and on the other hand they mirror the state of the external application by subscribing to the information it publishes. This feature makes the state of an external entity available to the expert system.
• It is foreseen to develop a set of prepared proxy objects for common uses in the online software. Development of specialised objects requires knowledge of the online software concepts and some knowledge of the CLIPS language.
Rules
• In this layer the rules are defined which provide the logic for the application.
• The rules steer the interaction between the objects and drive the application.
• Development of specialised rules requires knowledge of the proxy objects and some knowledge of the CLIPS language. It is expected that a detector or subsystem expert could customise actions on this layer to perform some system specific tasks.
• Example:
  (defrule do-something-on-controller-error
    (object (is-a RUN-CONTROLLER) (NAME ?name) (STATUS ERROR))
    =>
    (printout t "Controller " ?name " is in error state" crlf)
  )
Assessment
Good
• Layering of “service tools” and “intelligence”
• Rules and facts can be changed without (re)-compilation
• IPC communication mechanism
Not very Good
• Separation of process management and run control (-> synchronization between DSA and RC states)
• No clear way of receiving errors and handling them
Open for discussion:
• How many rules can be added before a system becomes unmanageable?
• Can we really expect users to play around with an expert system?
• If we go for an expert system we will probably need to define a way for users to specify rules and facts which is independent of the expert system syntax. (if this happens, do that…)
Local Controller
Scope
• Extend Online RC (in 2001 there was only a fixed c++ API to implement commands; expert system appeared in 2003) for ROS
• Serve as:
– Configuration server
– Process manager (very similar to DSA)
– Command dispatcher (rc commands, operational monitoring, event sampling)
– Error handler
• Unique Interface between online software and ROS
• Minimal constraints on controlled items (simple, lightweight interfaces)
Historical Evolution
• UR to have apps not using online software tools removed (evolution of platforms, better comprehension of system)
• Local Controller has become general purpose leaf controller (DF, HLT, ROD Crate DAQ)
• UR to control directly some hardware items instead of apps not used (DF, HLT and detectors prefer to work on their own apps, for easier debugging, standalone running, etc…)
Implementation
• Extends rc interface
• C++ based, with runtime plug-in library for error handling
• Extended use of RO infrastructure libraries (threads, factory for dynamic loading of libs, smart pointers, queues, mutexes, conditional variables, error format/catcher/handler, …)
• Uses IPC for communication between apps and controller, PMG for process management, DB for configuration
• One direction for commands
• Opposite direction for errors
• Provides main loop for controlled apps
• Controlled apps can run in interactive mode with special command-line option
Controlled App
  main {
    MySubSystemItem pippo(string name);
    ItemCtrl lucy(*pippo, interactiveMode,…);
    lucy->run();
    exit(0);
  }
• ItemCtrl sets up path to receive commands and to send errors.
• Errors are all DFError objects and carry information for later handling.
• Failure of synchronous commands is reported by throwing an exception (DFError created from exception and handled like other errors).
Assessment
Good
• Direct process management of controlled apps
• Easily customizable hooks for error handling
• Interface to controlled apps allows running them interactively
Bad/Obsolete
• Heavier than necessary (done to satisfy initial requirements)
• Synchronous implementation of communication
• Within one transition, a command can only be dispatched to all controlled items in parallel
Summary
• Large part of functionality requested is there, though scattered
• Clear need to rationalize
– Interface to subsystems for:
  • Error reporting and handling
  • Command distribution
  • Process management
– Configuration changes
– Use of expert system
– Use of DVS for tests