TCS Events, the Data Dictionary, and Alarms (oh my)
Michele, Chris, and Doug
Version 1B
25 April 2014
Overview
This is not a presentation on the architecture or design of an alarm handling system. It discusses the state of our system today with respect to events, how the ECS is being cleaned up, and how it uses the data dictionary for alarms. The ECS can serve as a model for where we see ourselves going in the future in terms of alarm handling.
TCS Events – Overview and purpose
Data Dictionary – Overview and purpose
Alarm Handler – How can we get there with what we already have at our disposal?
Software Group Presentation 2
25 April 2014
High-level Review of Events
Events are indicators which can be used as alerts in real-time or as tracers for post-mortem analysis of telescope actions.
In real time, events are designed to inform the Telescope Operator (via the LSSGUI and the message boxes on all the TCS GUIs) and the Observing Astronomer (via the synchronous return information for telescope commands issued by the instrument software) about a situation. The latter is only true if the event is packaged in the command return object.
For post-mortem analysis, events represent clues as to the state of the system at a particular instant.
Event Feedback in TCS GUIs
Event Feedback in Text File
High-level Review of Events
Every TCS subsystem defines its own events.
Events can be pre-defined in XML or instantiated at run time as needed.
Existing events can be modified at run time (i.e., by updating the LogString and associated parameters).
All client commands should have “bookend” events: started, followed by complete, warning, or failed (e.g., psf.command.setZernikes.started).
Single-shot events are issued for circumstances of particular interest (e.g., pcs.command.setNSEphemerisTarget.extrapolation).
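The bookend convention can be sketched as a Python context manager that always closes a started event with exactly one of complete, warning, or failed. This is a minimal illustration only: the in-memory `events` list, `issue()`, and `Outcome` are hypothetical stand-ins for the TCS event subsystem, and the priorities used (5 for started/complete, 2 for warning, 1 for failed) are an assumption drawn from the priority table and the tentative proposal on the Event Clean Up slide, not settled TCS values.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the TCS event subsystem: (name, priority, LogString).
events = []

def issue(name, priority, log_string=""):
    events.append((name, priority, log_string))

class Outcome:
    """Lets the command body downgrade a success to a warning."""
    def __init__(self):
        self.warning = None

    def warn(self, message):
        self.warning = message

@contextmanager
def bookend(subsystem, command):
    """Issue <subsystem>.command.<command>.started, then exactly one of
    .complete, .warning or .failed when the command body finishes."""
    base = f"{subsystem}.command.{command}"
    issue(base + ".started", 5)                   # assumed priority for started
    outcome = Outcome()
    try:
        yield outcome
    except Exception as exc:
        issue(base + ".failed", 1, str(exc))      # 1 = red / error
        raise                                     # the caller still sees the error
    if outcome.warning is not None:
        issue(base + ".warning", 2, outcome.warning)  # 2 = yellow / warning
    else:
        issue(base + ".complete", 5)              # assumed priority for complete

with bookend("psf", "setZernikes"):
    pass  # the real command work would go here
```

Because the failed event is issued in the `except` clause and the exception is re-raised, the bookend does not swallow errors; it only guarantees that a closing event is always logged.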
Event Definition Example in XML
Event Definition Example in Code
Invocation of call:
Supporting method:
The “LogString” is built on the fly with all the necessary parameters, and the default priority is OK (5). Because these events are created in code, there is no easy way to know their names, or how many events the GCS will generate.
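A minimal sketch of a run-time event whose LogString is rebuilt from parameters and whose priority defaults to OK (5). The `Event` class, its `update()` method, and the event name `gcs.guider.starLost` are hypothetical, not the GCS code on the slide. The sketch also shows why such events are hard to enumerate: the name only exists once this code runs.

```python
from dataclasses import dataclass, field

OK = 5  # default priority, per the slide

@dataclass
class Event:
    """Hypothetical run-time event: name, LogString, priority, parameters."""
    name: str
    log_string: str = ""
    priority: int = OK
    params: dict = field(default_factory=dict)

    def update(self, template, **params):
        """Rebuild the LogString on the fly from a template and its parameters."""
        self.params = dict(params)
        self.log_string = template.format(**params)
        return self

ev = Event("gcs.guider.starLost").update(
    "Guide star lost after {frames} frames (flux {flux:.1f})", frames=3, flux=12.345)
print(ev.priority, ev.log_string)
```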
Event Logging
Event Characteristics
Essentially isolated messages
Not coupled except by convention (e.g., started/complete)
Do not maintain any state (i.e., do not “latch”)
Typically issued to indicate an unexpected or negative transaction (i.e., there is no “I am happy again” counter-event)
Event Issues
Not fully implemented across all TCS subsystems
Inconsistent in implementation (XML vs instantiation in code)
Inconsistent across subsystems in terms of priority settings and associated meaning
Priority Color Meaning
1 red error
2 yellow warning
3 green Ok?
4 cyan Ok?
5 white Ok?
Event Clean Up
Implement in all subsystems
Does it matter if some events are defined in XML and others are generated on the fly?
Priorities 3–5 need to be better defined for use (5 = started/complete, 4 = informational, 3 = OK, or ???)
Priority/color will need to be reconciled with the data dictionary severity scheme (discussed later)
Clarify the wording, since any particular event may be packaged as a response to an observer command
Event Information
Reference document: 481s505
Presentation: wiki.lbto.org/bin/view/SoftwareProducts/EventSubsystem
Data Dictionary
The data dictionary is a collection of variables representing the state of the TCS at a particular moment in time.
The TCS GUIs mine the data dictionary for the values represented on the GUI.
Variable datatypes are: bit, bool, char, uchar, short, ushort, int, uint, long, ulong, float, double, and string.
Every TCS subsystem defines its own variables.
Only the TCS subsystem that owns a variable can write to that variable. Any TCS subsystem can read any variable.
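The ownership rule amounts to a write check keyed on the owning subsystem, with no check on reads. This is an illustrative sketch only, not the actual TCS implementation: the `DataDictionary` class and its `define`/`write`/`read` methods are hypothetical.

```python
class DataDictionary:
    """Hypothetical DD model: one owner per variable, anyone may read."""

    def __init__(self):
        self._values = {}
        self._owners = {}

    def define(self, owner, name, value):
        if name in self._owners:
            raise KeyError(f"{name} is already defined by {self._owners[name]}")
        self._owners[name] = owner
        self._values[name] = value

    def write(self, caller, name, value):
        owner = self._owners[name]
        if caller != owner:
            raise PermissionError(f"{caller} may not write {name} (owned by {owner})")
        self._values[name] = value

    def read(self, name):
        # Any subsystem may read any variable, so there is no caller check here.
        return self._values[name]

dd = DataDictionary()
dd.define("ecs", "ecs.mv.severity", 3)
dd.write("ecs", "ecs.mv.severity", 2)   # allowed: the ECS owns the variable
print(dd.read("ecs.mv.severity"))       # any subsystem may read it
```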
DD realization in DDViewer
DD Definition in XML
DD Accessing Variables
There are two ways to access DD variables: Gtype objects and SetValueInterface objects. I will only discuss the Gtype objects here. Below is an example of getting a DD value, getting the values associated with the lower and upper limits, and setting a DD value.
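The three operations described above (get a value, get its lower and upper limits, set a value) can be sketched in Python, together with a caller-side limit check. `DDVariable`, its methods, and the variable name are hypothetical stand-ins; they are not the Gtype API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DDVariable:
    """Hypothetical Gtype-like DD variable (not the real TCS API)."""
    name: str
    value: float
    lower_limit: Optional[float] = None  # not every DD item has limits in the XML
    upper_limit: Optional[float] = None

    def get(self):
        return self.value

    def get_limits(self):
        return self.lower_limit, self.upper_limit

    def set(self, new_value):
        self.value = new_value

def within_limits(var, candidate):
    """Caller-side check; a missing limit is treated as unbounded."""
    lower, upper = var.get_limits()
    if lower is not None and candidate < lower:
        return False
    if upper is not None and candidate > upper:
        return False
    return True

temp = DDVariable("ecs.mv.hx0401.temperature", 12.0, lower_limit=-5.0, upper_limit=30.0)
if within_limits(temp, 35.0):
    temp.set(35.0)
print(temp.get())  # still 12.0: 35.0 exceeds the upper limit
```

Treating a missing limit as unbounded is a choice made for the sketch; since not all DD items have limits defined, a subsystem might instead want to flag such items.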
DD Characteristics
Each DD variable is an independent entity: there is no concept of a set of related information (e.g., the coordinates RA and Dec), though arrays are supported.
For entries that are updated at a high rate, as in this example, the Dec value may not be from the same timestamp as the RA value.
There is no timestamp associated with each entry, but some subsystems have “grouped” data, and there is a timestamp for the group.
Assuming the subsystems keep their DD values up to date, the variables always reflect the current state.
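The RA/Dec hazard can be simulated deterministically: a reader that runs between two independent writes sees RA from the new sample and Dec from the old one, while a grouped entry replaced in one step is either entirely old or entirely new and carries its own timestamp. The `PointingGroup` type, the dictionary keys, and the values are hypothetical, and real concurrency is replaced by an explicit reader callback.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PointingGroup:
    """Hypothetical grouped entry: RA and Dec share one timestamp."""
    timestamp: float
    ra_deg: float
    dec_deg: float

independent = {"ra": 10.0, "dec": 20.0}                 # two separate DD entries
grouped = {"pointing": PointingGroup(0.0, 10.0, 20.0)}  # one grouped entry

def read_both():
    return (independent["ra"], independent["dec"]), grouped["pointing"]

def publish(t, ra, dec, reader):
    """Write a new sample; `reader` simulates a client running between the RA and Dec writes."""
    independent["ra"] = ra
    seen = reader()
    independent["dec"] = dec
    grouped["pointing"] = PointingGroup(t, ra, dec)  # replaced as a whole
    return seen

pair, group = publish(1.0, 11.0, 21.0, read_both)
print(pair)   # (11.0, 20.0): mixed samples, with nothing to reveal it
print(group)  # the previous sample, self-consistent and timestamped
```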
DD Clean up
Not all DD items have associated limits defined in the XML.
Whether or not the limits of a variable are defined in the XML, not all subsystems use the mechanism for obtaining these values and then applying them for limit checking in the subsystem code.
Should the above items be addressed, with the understanding that a subsystem may need greater flexibility for limit checking than these static values can provide?
DD Setting Variables
Reference document: 481s504
Presentation: wiki.lbto.org/bin/view/SoftwareProducts/ReflectiveMemory
Leveraging What We Have
After some discussion (Doug, Chris, and Michele), and because it holds state information, the data dictionary best lends itself to the idea of an annunciator panel (at the least) or an alarm handler (at best).
Using the ECS as a prototype
Leveraging the data dictionary for alarms:
Allows a better and more robust implementation of the breadcrumb and rollup currently done by the ECSGUI
Provides a model (the ECS subsystem) for the remaining TCS subsystems
The above steps allow us to produce an annunciator panel for the TCS. We can go further and export the data dictionary items (as is done for FACSUM via the DDS) to an external system.
ECS as an Example
The ECS is composed of a number of subcomponents, which are in turn composed of subcomponents, and so on. This subsystem is naturally represented hierarchically.
ECS as an Example
The mirror ventilation itself is composed of a number of subcomponents, depicted here down to the lowest-level “device”. Each device has an associated severity flag that depends on its PLC state. The next higher-level group also has a severity flag, equal to the worst severity among its constituents (numerically the lowest flag value, since error = 1). This rollup continues until the top of the hierarchy is reached.
The current severity flag values are: error = 1, warning = 2, ok = 3, info = 4, debug = 5, and unknown = 6.
This scheme is easily exploited by the ECSGUI to color its navigation buttons and “eyebrow” in order to create the breadcrumb trail.
We should reconcile the DD severity flag levels (and colors used by the GUIs) with the event priorities as both facilities will be used.
ECS as an Example
ecs.severity = 1
ecs.mv.severity = 2
ecs.mv.heatExchangers.severity = 2
ecs.mv.heatExchangers.hx0401.severity = 2
ecs.mv.heatExchangers.hx0402.severity = 2
ecs.mv.heatExchangers.hx0403.severity = 3
ecs.mv.heatExchangers.hx0404.severity = 3
ecs.dampers.severity = 1
ecs.dampers.dp0405.severity = 1
ecs.dampers.dp0406.severity = 3
ecs.dampers.dp0407.severity = 2
ecs.dampers.dp0408.severity = 3
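The worst-of rollup can be sketched in a few lines and checked against the example values above. This is a minimal Python sketch, not ECS code: it assumes every leaf device is named `<group path>.<device>.severity`, derives each ancestor group from the dotted name, and gives each group the numerically lowest (worst) flag beneath it. How `unknown = 6` should roll up is not settled; with a plain minimum it never outranks a known state.

```python
ERROR, WARNING, OK, INFO, DEBUG, UNKNOWN = 1, 2, 3, 4, 5, 6  # slide values

SUFFIX = ".severity"

def rollup(leaves):
    """Return the leaves plus a '<group>.severity' entry for every ancestor
    group, equal to the worst (lowest-numbered) flag among its descendants."""
    result = dict(leaves)
    for name, sev in leaves.items():
        parts = name[: -len(SUFFIX)].split(".")
        # 'ecs.mv.heatExchangers.hx0401' -> ecs.mv.heatExchangers, ecs.mv, ecs
        for i in range(len(parts) - 1, 0, -1):
            key = ".".join(parts[:i]) + SUFFIX
            result[key] = min(result.get(key, UNKNOWN), sev)
    return result

leaves = {
    "ecs.mv.heatExchangers.hx0401.severity": 2,
    "ecs.mv.heatExchangers.hx0402.severity": 2,
    "ecs.mv.heatExchangers.hx0403.severity": 3,
    "ecs.mv.heatExchangers.hx0404.severity": 3,
    "ecs.dampers.dp0405.severity": 1,
    "ecs.dampers.dp0406.severity": 3,
    "ecs.dampers.dp0407.severity": 2,
    "ecs.dampers.dp0408.severity": 3,
}
flags = rollup(leaves)
```

Each leaf is pushed up to all of its ancestors, so a group's value is the minimum over its whole subtree, which is the same as the minimum over its direct children's rolled-up values.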
Considerations
The ECS example only addressed states of error, warning, and OK as it was based upon hardware. What about analog values and their associated lower and upper limits?
Should the third-party package determine when limits have been violated?
What if the limits are not firm, but rather they are based upon some real-time computation?
To keep the specific TCS GUI and the alarm handler synchronized (which is a must), and to retain flexibility when the lower and upper limits must be computed in real time from some system variable, the subsystem itself should determine when a limit has been violated. This means providing the third-party package only with severity flags: all the real intelligence is retained in the subsystem (at least for the TCS).
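A sketch of that division of labor: the subsystem computes its limits in real time, converts the analog value to a severity flag, and exports only the flag. The coolant/ambient relationship, the 10% warning band, and all function names here are hypothetical choices for illustration.

```python
ERROR, WARNING, OK = 1, 2, 3  # subset of the slide's severity flag values

def severity_for(value, lower, upper, warn_fraction=0.1):
    """ERROR outside [lower, upper]; WARNING within warn_fraction of the span
    from either limit; OK otherwise."""
    if not lower < upper:
        raise ValueError("lower limit must be below upper limit")
    margin = warn_fraction * (upper - lower)
    if value < lower or value > upper:
        return ERROR
    if value < lower + margin or value > upper - margin:
        return WARNING
    return OK

def coolant_limits(ambient_c):
    """Hypothetical real-time limits: coolant must stay 2 to 8 C below ambient."""
    return ambient_c - 8.0, ambient_c - 2.0

def publish_severity(coolant_c, ambient_c):
    lower, upper = coolant_limits(ambient_c)
    # Only this flag would go to the alarm handler; the limits stay in the subsystem.
    return severity_for(coolant_c, lower, upper)
```

The same coolant reading can be OK at one ambient temperature and an ERROR at another, which a package holding only static limits could not reproduce.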
ECS as an Example
Special states suggested by users:
Hardware deliberately put into a non-working state: the technicians want to know about it at the low level, but do not want the condition to propagate to the high level (may be temporary)
Hardware deliberately and permanently put into a non-standard state (manual vs automatic)
There may be other special cases
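One possible treatment of the first special state is a mask: a masked device keeps its own severity for the technicians, but can never make its ancestors worse than OK. This is a hypothetical sketch; the mask set and the choice to cap the contribution at OK (rather than drop the device from the rollup entirely) are assumptions, and the permanent manual-mode case may need a different flag altogether.

```python
ERROR, WARNING, OK, UNKNOWN = 1, 2, 3, 6  # subset of the slide's flag values
SUFFIX = ".severity"

def rollup_with_mask(leaves, masked=frozenset()):
    """Worst-of rollup in which masked devices (given as dotted device paths)
    keep their own flag but contribute at worst OK to every ancestor group."""
    result = dict(leaves)
    for name, sev in leaves.items():
        path = name[: -len(SUFFIX)]
        # max() caps a masked device's contribution at OK (higher number = milder).
        contribution = max(sev, OK) if path in masked else sev
        parts = path.split(".")
        for i in range(len(parts) - 1, 0, -1):
            key = ".".join(parts[:i]) + SUFFIX
            result[key] = min(result.get(key, UNKNOWN), contribution)
    return result

leaves = {
    "ecs.dampers.dp0405.severity": ERROR,    # deliberately taken out of service
    "ecs.dampers.dp0406.severity": OK,
    "ecs.dampers.dp0407.severity": WARNING,
}
flags = rollup_with_mask(leaves, masked={"ecs.dampers.dp0405"})
```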
Severity Flags
If all TCS subsystems implement this scheme, we will have not only a robust annunciator panel, but also a path to exploit existing alarm handler software packages. Admittedly, at a minimum, code is needed to convert our data dictionary information into the format preferred by the third-party package.
Characteristics of Alarm Handlers
Brings the issue to the attention of Mountain personnel
Allows acknowledgement of an alarm (basically, someone has taken ownership of the alarm – fuzzy to me, enforcement?)
Provides guidance on actions to pursue via pop-up or URL
Allows processes to be triggered when a particular transaction occurs (open a subsystem GUI?)
Accommodates suppression via filters (e.g., alarms held off in the handler until the entity has been in the alarm state for a specified duration)
Provides logging of alarms and display of the log history (for TCS this is covered by the event log)
Presents the alarms in a graphical, hierarchical view (logical grouping) for easy digestion
EPICS
The EPICS alarm handler can deal with literally thousands of alarms at a granular level. Since we will already have created a detailed set of severity flags in the TCS subsystems in support of the TCS GUIs, we have the flexibility to control how “deep” any portion of the alarm handler should be.
There are effectively four levels of alarm: invalid, major, minor, and no_alarm (though I also see error as the worst level).
Sound can be associated with the raising of an alarm; it looks like only one sound is accommodated (default: beep).
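Reconciling the TCS flags with these levels needs an explicit mapping. The table below is only a hypothetical proposal; the EPICS values (NO_ALARM = 0, MINOR = 1, MAJOR = 2, INVALID = 3) follow the EPICS alarm severity ordering, in which a larger number is worse, the opposite of the TCS convention.

```python
from enum import IntEnum

class TcsSeverity(IntEnum):
    ERROR = 1
    WARNING = 2
    OK = 3
    INFO = 4
    DEBUG = 5
    UNKNOWN = 6

class EpicsSeverity(IntEnum):
    NO_ALARM = 0
    MINOR = 1
    MAJOR = 2
    INVALID = 3

# Hypothetical mapping; the choices for INFO, DEBUG and UNKNOWN are open questions.
TO_EPICS = {
    TcsSeverity.ERROR: EpicsSeverity.MAJOR,
    TcsSeverity.WARNING: EpicsSeverity.MINOR,
    TcsSeverity.OK: EpicsSeverity.NO_ALARM,
    TcsSeverity.INFO: EpicsSeverity.NO_ALARM,
    TcsSeverity.DEBUG: EpicsSeverity.NO_ALARM,
    TcsSeverity.UNKNOWN: EpicsSeverity.INVALID,
}

def to_epics(flag):
    """Convert a TCS severity flag (1..6) to an EPICS alarm severity."""
    return TO_EPICS[TcsSeverity(flag)]
```

For error, warning, and OK the worst-of rule is preserved (a TCS minimum maps to an EPICS maximum). With UNKNOWN mapped to INVALID, however, an unknown device is the mildest input to a TCS rollup but the worst in EPICS, which is exactly the kind of mismatch the reconciliation has to settle.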
EPICS - Filtering
Filtering, or the use of masks, controls:
whether an alarm is displayed only after a device has been in alarm for more than N seconds
whether an alarm is displayed only after a device has entered an alarm state from a no_alarm state more than M times in N seconds
whether an alarm is displayed on the handler at all (though it may still be logged)
whether an alarm must be acknowledged by the Telescope Operator
whether an alarm is logged
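The two time-based filters can be illustrated as small state machines fed with (time, in_alarm) samples. These classes are a hypothetical sketch of the behavior described above, not the EPICS alarm handler's implementation or configuration syntax.

```python
from collections import deque

class DurationFilter:
    """Display only once the device has been in alarm for more than n_seconds."""
    def __init__(self, n_seconds):
        self.n = n_seconds
        self.since = None  # time the current alarm episode began

    def update(self, t, in_alarm):
        if not in_alarm:
            self.since = None
            return False
        if self.since is None:
            self.since = t
        return t - self.since > self.n

class FrequencyFilter:
    """Display while in alarm once the device has entered alarm from no_alarm
    more than m times within the last n_seconds."""
    def __init__(self, m, n_seconds):
        self.m, self.n = m, n_seconds
        self.entries = deque()  # times of no_alarm -> alarm transitions
        self.prev = False

    def update(self, t, in_alarm):
        if in_alarm and not self.prev:
            self.entries.append(t)
        self.prev = in_alarm
        while self.entries and t - self.entries[0] > self.n:
            self.entries.popleft()  # forget transitions outside the window
        return in_alarm and len(self.entries) > self.m
```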
Near term
Clarify the meaning of the event priorities
Establish a firm set of severity flags
Reconcile severity flag levels with event priorities
Complete the implementation of severity flags in the ECS
Clean up the ECSGUI to use the severity flags
Rethink the ECSGUI message box, which has always been more of an alarm box (address the event LogStrings?)
Determine the best way to map data dictionary items into the ASCII database needed for EPICS – and do it!
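As a possible starting point for that mapping, the sketch below turns dotted DD severity names into GROUP and CHANNEL lines in the style of an EPICS alarm handler (ALH) configuration file. Assumptions: only leaf severity flags become channels (the hierarchy becomes groups, left to the handler's own rollup); each dot is replaced by a colon, because a dot in a Channel Access name separates the record name from a field name; and some server is already publishing these names as PVs. The exact file syntax should be checked against the ALH documentation.

```python
SUFFIX = ".severity"

def alh_config(leaf_names):
    """Build ALH-style GROUP/CHANNEL lines from dotted names such as
    'ecs.mv.heatExchangers.hx0401.severity' (hypothetical naming scheme)."""
    lines, declared = [], set()
    for name in sorted(leaf_names):
        parts = name[: -len(SUFFIX)].split(".")
        parent = "NULL"  # the top-level group has no parent
        for i in range(1, len(parts)):
            group = ":".join(parts[:i])
            if group not in declared:
                lines.append(f"GROUP    {parent:<24} {group}")
                declared.add(group)
            parent = group
        pv = name.replace(".", ":")  # '.' would be parsed as record.FIELD
        lines.append(f"CHANNEL  {parent:<24} {pv}")
    return "\n".join(lines)

print(alh_config([
    "ecs.mv.heatExchangers.hx0401.severity",
    "ecs.dampers.dp0405.severity",
]))
```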