7/30/2019 2011 04 05 Safety and Reliability Patterns
About safety and reliability
Reliability: a measure of up-time, availability or the probability of
successful computation
Often measured with, for example, MTBF
Safety: a safe system does not incur too much risk to persons or
equipment.
Safety is distinct from reliability; however, safety systems must be
reliable. Patterns that are actually related to reliability are often
called safety patterns.
Accidents occur because of errors or failures.
Errors are systematic and always present, but may not be visible all
the time.
Failures include, for example, bit flips and breaking of hardware, so
they are not always present in the system.
4/6/2011
About errors and failures
Different means to handle errors and failures
Failures (random): homogeneous redundancy (copies)
Errors (systematic): heterogeneous redundancy
Author: a safety-critical system must contain and properly manage
redundancy.
What can be achieved with redundancy is reliability:
Ability to detect failures and enter safe-state.
Ability to continue providing the service even in the presence of failures.
The safety and reliability patterns presented in the book are all
quite well-known and presented in numerous publications.
The patterns to be workshopped are more related to developing
safety systems:
Division of responsibilities between safety and basic control systems.
Ones complement pattern
Problem: how to detect corruption of a small set of data, caused
by, for example, EMI or heat.
Solution: the data is stored twice, once in normal format and
once in ones complement format. When the data is read, the
ones complement copy is inverted back to normal and
compared to the original.
If the values do not match, error processing can be initiated.
- No way to decide which one is correct
- Uses twice as much memory for storage
- Small performance hit.
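The store-twice-and-compare idea can be sketched in a few lines. This is only an illustration, not code from the book: the class name OnesComplementCell, the 16-bit width in MASK and the use of ValueError for error processing are all assumptions made for the example.

```python
MASK = 0xFFFF  # assumption: the protected value is 16 bits wide


class OnesComplementCell:
    """Hypothetical holder that keeps a value and its bitwise inverse."""

    def __init__(self, value):
        self.write(value)

    def write(self, value):
        # Store the data twice: once as-is, once bit-inverted.
        self.normal = value & MASK
        self.inverted = ~value & MASK

    def read(self):
        # Invert the complement copy back and compare with the normal copy.
        if self.normal != (~self.inverted & MASK):
            # Assumed form of error processing: raise to the caller.
            raise ValueError("data corruption detected")
        return self.normal


cell = OnesComplementCell(0x1234)
print(hex(cell.read()))
```

A mismatch only says that one of the two copies changed. As the slide notes, it cannot tell which copy is the correct one.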
CRC pattern
Problem: how to detect corruption of large data sets, caused
by, for example, EMI or heat.
Solution: a CRC value is calculated from the data and stored in
addition to the actual data. When the data is read, the CRC value
can be re-calculated and compared to the original CRC.
If the values do not match, error processing can be initiated.
- Good detection of single and multiple bit errors
- Small performance hit.
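A minimal sketch of the calculate, store and re-check cycle, assuming CRC-32 as provided by Python's standard zlib module. The slides do not say which CRC polynomial or width is meant, and the helper names store_with_crc and is_intact are made up for the example.

```python
import zlib


def store_with_crc(data):
    """Return the data together with its CRC-32 (assumed CRC variant)."""
    return data, zlib.crc32(data)


def is_intact(data, stored_crc):
    """Re-calculate the CRC on read and compare it with the stored one."""
    return zlib.crc32(data) == stored_crc


payload, crc = store_with_crc(b"calibration table v1")
print(is_intact(payload, crc))

# Simulate a single bit flip in the first byte of the stored data.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
print(is_intact(corrupted, crc))
```

Unlike the ones complement approach, the extra storage here is a fixed-size checksum rather than a full second copy, which is why the pattern suits larger data sets.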
Smart Data pattern
Channel pattern
A channel is an architectural structure that processes data from
raw acquisition through a series of processing steps to physical
actuation. (end-to-end processing)
A basis for a set of patterns including, for example, the protected
single channel pattern, the dual channel pattern and more.
In the book, the author does not state an explicit problem or
consequences because the channel is treated as a base for the
other channel-related patterns.
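One way to picture a channel is as an ordered chain of processing steps between the raw sensor value and the actuator command. The sketch below is an assumption-laden illustration: the 10-bit ADC, the 5000 mV reference and the function names are hypothetical and do not come from the slides.

```python
def counts_to_millivolts(counts):
    # Hypothetical step: 10-bit ADC reading scaled to a 5000 mV reference.
    return counts * 5000 // 1023


def millivolts_to_percent(millivolts):
    # Hypothetical step: voltage mapped to an actuator command in percent.
    return millivolts * 100 // 5000


def run_channel(raw, steps):
    """Pass the raw acquisition through every step, end to end."""
    value = raw
    for step in steps:
        value = step(value)
    return value


steps = [counts_to_millivolts, millivolts_to_percent]
print(run_channel(512, steps))
```

The other channel patterns can then be read as variations on this chain: adding checks between steps, or running several chains side by side.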
Protected single channel pattern
Problem: How to improve reliability without having to increase
development and hardware costs as much as with real
redundancy.
Solution: a single channel is used to process the data from
sensors to actuators. The reliability is enhanced through the addition
of checks at key points in the channel.
- Not able to continue functioning in the presence of faults (but
may be able to detect faults and enter safe-state).
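A rough sketch of a single channel with checks at its input and output. The range limits, the SAFE_OUTPUT value and all names are assumptions chosen for illustration; a real system would take its checks and its safe state from the hazard analysis.

```python
SAFE_OUTPUT = 0  # assumption: the safe state is "actuator off"


class ChannelFault(Exception):
    """Raised when a check inside the channel fails."""


def check_range(value, low, high, what):
    if not low <= value <= high:
        raise ChannelFault(f"{what} out of range: {value}")
    return value


def protected_channel(raw_counts):
    """Return (actuator command, channel_ok)."""
    try:
        # Check at the input: a 10-bit ADC cannot deliver more than 1023.
        counts = check_range(raw_counts, 0, 1023, "sensor reading")
        percent = counts * 100 // 1023
        # Check at the output, just before actuation.
        return check_range(percent, 0, 100, "actuator command"), True
    except ChannelFault:
        # No second channel to fall back on: enter the safe state.
        return SAFE_OUTPUT, False


print(protected_channel(512))
print(protected_channel(5000))
```

The second call shows the limitation named on the slide: the fault is detected and the safe state is entered, but the service itself is not continued.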
Dual channel pattern
Problem: How to improve reliability and provide protection
against single-point faults.
Solution: reliability can be improved by offering multiple channels.
If the channels are identical (homogeneous redundancy), the
pattern can address random faults. If the channels use different
designs or implementations, the pattern can address both random
and systematic faults. Depending on which pattern is used, the system may
enter safe-state or switch to another channel when a fault is
detected.
- Logic is needed to manage the channels and to determine which
will be active.
- (The logic may be a single point of failure?)
- Costs
- Costs
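The channel-management logic mentioned above can be sketched as a simple switch-over: try the primary channel, fall back to the backup, and enter the safe state if both report a fault. Everything here is hypothetical (make_channel, dual_channel, the healthy flag that stands in for a real self-check), and the two channels are identical, so the sketch corresponds to the homogeneous case.

```python
SAFE_OUTPUT = 0  # assumption: the safe state is "actuator off"


class ChannelFault(Exception):
    """Raised by a channel that has detected a fault in itself."""


def make_channel(healthy=True):
    # Hypothetical channel factory; 'healthy' simulates a self-check result.
    def channel(raw_counts):
        if not healthy:
            raise ChannelFault("channel self-check failed")
        return raw_counts * 100 // 1023
    return channel


def dual_channel(raw_counts, channels):
    """Use the first channel that works; return (output, channel name)."""
    for name, channel in channels:
        try:
            return channel(raw_counts), name
        except ChannelFault:
            continue  # switch to the next channel
    return SAFE_OUTPUT, "safe-state"


channels = [("primary", make_channel(healthy=False)),
            ("backup", make_channel())]
print(dual_channel(1023, channels))
```

Note that dual_channel itself runs only once, outside both channels, which is exactly the possible single point of failure the slide raises.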
More channel patterns
Homogeneous redundancy pattern
Uses two identical channels
Heterogeneous redundancy pattern
Uses dual channels of different designs or implementations
Triple modular redundancy (TMR) pattern
Uses three channels of typically identical designs and a voting
mechanism to decide the actual output.
Can continue to deliver services in the presence of a fault, provided
that the fault is isolated within a channel.
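The voting mechanism can be illustrated with a plain majority vote over the channel outputs. This is a sketch under assumptions: the function name vote and the choice to raise RuntimeError when no majority exists are not from the slides, and real voters often compare values within a tolerance rather than for exact equality.

```python
from collections import Counter


def vote(outputs):
    """Return the value produced by a strict majority of the channels."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # Assumed handling: without a majority the output cannot be trusted.
        raise RuntimeError("no majority among channel outputs")
    return value


# One faulty channel out of three is outvoted by the other two.
print(vote([42, 42, 7]))
```

With three channels, a single faulty channel is masked, which is what allows the service to continue as long as the fault stays isolated within that channel.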