UAS Risk Analysis using Bayesian Belief Networks: An Application to the Virginia Tech ESPAARO
Christopher G. Kevorkian
Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Master of Science in
Aerospace Engineering
Craig Woosley, Chair
James Luxhoj, Co-Chair
Pradeep Raj
August 5, 2016
Blacksburg, Virginia
Keywords: ESPAARO, UAS, Bayesian Networks, Risk Analysis
Abstract
Small Unmanned Aerial Vehicles (SUAVs) are rapidly being adopted in the National Airspace System (NAS) but experience a much higher failure rate than traditional aircraft. These SUAVs are quickly becoming complex enough to warrant investigating alternative methods of failure analysis. This thesis proposes a method of expanding the Fault Tree Analysis (FTA) method into a Bayesian Belief Network (BBN) model. FTA is demonstrated to be a special case of BBN, and a BBN can allow for more complex interactions between nodes than FTA allows. A model can be investigated to determine the components to which failure is most sensitive and to allow for redundancies or mitigations against those failures. The introduced method is then applied to the Virginia Tech ESPAARO SUAV.
Contents
1 Introduction 1
1.1 Introduction of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Research Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Method Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 OOBN Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.3 FMECA Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Overview of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Literature Review 8
2.1 Fault Tree Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Basic Probability of Failure Calculations . . . . . . . . . . . . . . . . 10
2.1.2 Branching out on the Fault Tree . . . . . . . . . . . . . . . . . . . . . 13
2.1.3 Minimum Cut Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Event Tree Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Failure Mode and Effects Analysis . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Failure Mode, Effects, and Criticality Analysis . . . . . . . . . . . . . . . . . 22
2.5 Bayesian Belief Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5.1 Basic BBN Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5.2 Software and Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5.3 Object-Oriented Bayesian Networks . . . . . . . . . . . . . . . . . . . 33
2.5.4 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 Summary and Comparison of Methods . . . . . . . . . . . . . . . . . . . . . 36
3 Bayesian Approach to Risk Analysis 39
3.1 Current Research using Bayesian Methods . . . . . . . . . . . . . . . . . . . 39
3.2 Fault Tree as a Special Case of Bayesian Methods . . . . . . . . . . . . . . . 41
3.3 Fault Tree Comparison to Bayesian Network . . . . . . . . . . . . . . . . . . 41
3.4 Identifying Mitigations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5 Criticality Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6 FMECA representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 ESPAARO Case Example 54
4.1 Introduction to the ESPAARO . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Model Construction and Probability Elicitation . . . . . . . . . . . . . . . . 57
4.3 Conversion to an OOBN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4 Identifying Mitigations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5 Application and Effects of Mitigations . . . . . . . . . . . . . . . . . . . . . 66
5 Conclusions and Future Work 67
5.1 BBN for ESPAARO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2 OOBN for ESPAARO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.3 Sensitivity Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
List of Figures
1.2.1 Bayesian Augmentation Approach . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 (a) AND gate (b) OR gate . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.2 Fault Tree Outline (Source: adapted from [16]) . . . . . . . . . . . . . . . . . 14
2.1.3 Fault Tree example (Source: adapted from [34]) . . . . . . . . . . . . . . . . 16
2.1.4 Minimum Cut Set Example (Source: adapted from [34]) . . . . . . . . . . . . 17
2.1.5 Communication network (Source: adapted from [18]) . . . . . . . . . . . . . 18
2.1.6 Communication Network Fault Tree, Denote Mirror Blocks (Source: adapted from [18]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 Event Tree Example (Source: adapted from [8]) . . . . . . . . . . . . . . . . 20
2.4.1 Risk Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5.1 Simple Bayesian Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5.2 Serial Bayesian Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5.3 Influence Diagram Structures . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5.4 Two Node Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5.5 Probability Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.5.6 Conditional Probability of Influence Structures (Source: Adapted from [8]) . 31
2.5.7 Wet Grass Example (Source: adapted from [26]) . . . . . . . . . . . . . . . 32
2.5.8 OOBN engine example as constructed in Hugin . . . . . . . . . . . . . . . . 35
3.2.1 Bayesian Expansion of Fault Tree . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.1 Fault Tree Example (Source: [27]) . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.2 Bayesian Net Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.3 Two node system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.4 Probability Tree of Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.5 CPTs of BBN example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.6 System Remodeled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4.1 Non-Boolean Conditional Probabilities . . . . . . . . . . . . . . . . . . . . . 47
3.4.2 Graph of Likelihood Multipliers (LMs) . . . . . . . . . . . . . . . . . . . . . 49
3.5.1 Bayesian Example with Mitigation . . . . . . . . . . . . . . . . . . . . . . . 50
3.6.1 Modified NAVAIR Risk Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.2 Example FMECA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.0.1 KEAS Lab Runway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1.1 ESPAARO Drawing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1.2 ESPAARO Strut Assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1.3 ESPAARO on the runway . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3.1 Control Failure Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.2 ESPAARO Aircraft OOBN . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.4.1 Likelihood Multiplier Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.1.1 ESPAARO BBN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2.1 Autopilot Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2.2 Control Surface Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2.3 Propulsion System Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.4 Control System Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.5 Strut Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2.6 Tail Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2.7 Wing Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.2.8 Aircraft Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.2.9 Camera Subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
List of Tables
2.3.1 Example FMEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.1 Conditional Probability Table for Motor . . . . . . . . . . . . . . . . . . . . 29
2.6.1 Risk Analysis Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.1 Component Probabilities (Source: [27]) . . . . . . . . . . . . . . . . . . . . . 42
3.4.1 Sensitivity to Parameter Variation . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.2 Likelihood Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4.1 Abbreviated Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4.2 Likelihood Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.5.1 Risk Mitigators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3.1 Sensitivity Analysis Results for ESPAARO . . . . . . . . . . . . . . . . . . 76
Chapter 1
Introduction
Unmanned Aircraft Systems (UAS) have become a vital part of the U.S. military arsenal
and are quickly growing in commercial popularity. Unfortunately, all but the most advanced
UAS suffer from remarkably high rates of failure. Decade-old data show that UAS
experience a failure rate nearly 100 times greater than manned military aircraft [35]. Even
though UAS suffer such high rates of failure, they have flown more sorties than
manned aircraft in recent conflicts [1]. Small UAS designs tend to be very focused on
maintaining low cost, which drives the design to be particularly failure prone. Currently
many airframes are deemed ‘expendable’ or ‘frangible’ when utilized on closed ranges, and a
detailed system safety review may be unneeded if the system poses little risk to personnel or
equipment on the ground. With the recent proliferation of small UAS and the widespread
interest in the commercial sector, system safety must be addressed, as these systems may
now operate in civil airspace with people and property under their flight path and with civil
and commercial aircraft sharing the airway.
Large aircraft manufacturers invest substantial time and resources to address system
reliability; however, small UAS designers cannot afford to commit similar time and resources.
New techniques need to be developed to assist the designers of smaller UAS to assess their
airworthiness and address possible problems before they occur.
One approach to evaluating an aircraft’s reliability is to assess the probability of failure
through fault tree analysis (FTA). A fault tree is a directed acyclic graph consisting of
boolean blocks representing components interacting in a system. In failure analysis, the
“top event” of the fault tree is an overall system failure (e.g., loss of an aircraft or mission
failure). Failures of underlying subsystems cascade up the tree and may or may not result
in the top event, depending on the fault tree structure (reflecting component redundancy,
etc.). Fault tree analysis is an effective tool for predicting system unreliability; however, it
requires knowledge of component failure rates. This information is very rarely known for
small UAS, making FTA less effective for assessing safety for this class of aircraft. Also, the
tree structure is restrictive; for example, it can be difficult to represent interdependencies
between “parallel” events in the structure.
1.1 Introduction of the Problem
The parts and processes used to manufacture many small-scale UAS platforms have been
spun off from the hobby community. While many of these processes are widespread and have
been iterated in a cost-competitive market to find effective solutions to problems,
little reliability data exists for such products. The emphasis on low prices and quick time
to market has led to questions about the reliability and quality of the products. The sparsely
available data makes any consistent reliability analysis difficult [5]. US military UAS are
an exception; the military requires that components meet MIL-STD-100C. In practice, it is
often desired to keep the cost of UAS low and this focus on cost reduction has predictable
consequences for system safety and reliability. The need to keep cost low, together with the
diversity and short product life cycle of UAS designs, has created a unique problem for
system safety and reliability. This problem is highlighted by the fact that many UAS
are seen as disposable, which goes against conventional reliability practices. Unlike those of
manned aircraft, UAS life cycles may last only a few months. Aircraft certification requires a very
well tested and known platform. A certification process for UAS is still developing for the commercial
industry, although it has begun to be outlined in FAA Part 107 [4]. As these systems begin
to be utilized in the National Airspace System (NAS), such standards will begin to solidify.
A standardized and adaptable system must be found to ascertain, and then improve, the
airworthiness of small UAS at reasonable cost.
UAS suffer from a relatively high mishap rate compared to manned aircraft, which is one
reason the FAA has been reluctant to allow their use, or at least their commercial use, in
civil airspace. At this time it is unreasonable to expect that small UAS will reach a level of safety
equal to that of large and expensive aircraft, but an equivalent level of safety must be determined
[12]. Once those new thresholds are determined, techniques that are appropriate to small,
lower-cost aircraft must be developed and applied to ensure safety.
The current approaches to system safety commonly utilized by air safety authorities, such as
the Federal Aviation Administration (FAA) and NAVAIR, are very comprehensive for manned
aircraft. For instance, the FAA has adopted the Safety Management System (SMS), which is a
formal, top-down, systematic approach to identifying risk, including the necessary organizational
structures, policies, and procedures [6]. The SMS entails four main components: Safety
Policy, Safety Assurance, Safety Risk Management, and Safety Promotion. Of particular
interest in this work is the Safety Risk Management section, which briefly describes various
techniques of identifying and evaluating hazards. The most widely used techniques include
Failure Mode and Effects Analysis (FMEA), Failure Mode, Effects, and Criticality Analysis
(FMECA), and Fault Tree Analysis (FTA). These techniques are discussed in more detail
in Chapter 2.
The Navy introduced guidelines similar to the FAA's for managing risk in its NAVAIR
Risk Assessment Handbook [9]. These guidelines provide a focus not only for assessing and
managing the risk to the system, but also provide a thorough framework for risk reporting
and mitigation. A supplement to the Risk Assessment Handbook, NAVAIRINST 5000.21A,
provides more specifics on implementation of risk analysis as it pertains to the acquisition
process. These guidelines still base their system risk assessment on techniques such as FTA
and FMECA. Similarly, the FAA has an FTA- and FMECA-based approach to risk assessment.
The most commonly used method for system-level risk analysis is Fault Tree Analysis.
This method is very well understood and has historically served very well. However, the
application of this method imposes a series of restrictions on how systems are modeled.
Many systems have complex interactions between hazards and their effects. A fault tree
approach to modeling these hazards requires that all hazards have unique effects, be
statistically independent of one another, and have only two condition states [28].
1.2 Research Objectives
To overcome the limitations of the fault tree, a Bayesian Belief Network (BBN) is used. A
BBN is a graphical representation of a probabilistic dependency model and can be seen as
a generalized form of a fault tree [16]. A direct comparison is performed between a fault tree
and a BBN. A further comparison is made with an expanded BBN to investigate the features of
a Bayesian approach to system risk assessment and their benefits.
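The sense in which a BBN generalizes a fault tree can be sketched for a single two-input node. In the sketch below, a conditional probability table (CPT) whose entries are all 0 or 1 reproduces a Boolean gate, while a CPT with intermediate entries expresses an interaction that no gate can. The function name, the component probabilities, and the "soft" CPT values are hypothetical and chosen only for illustration; A and B are assumed to be independent root nodes.

```python
from itertools import product

def top_event_probability(p_a, p_b, cpt):
    """P(T fails) = sum over parent states of P(T | A, B) * P(A) * P(B).

    States are coded 1 = failed, 0 = working; A and B are assumed independent.
    """
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        prior = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        total += cpt[(a, b)] * prior
    return total

# Deterministic CPTs: each entry is 0 or 1, so the node behaves as a Boolean gate.
AND_CPT = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
OR_CPT = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0}

# Hypothetical non-Boolean CPT: a single failure causes the top event only some of the time.
SOFT_CPT = {(0, 0): 0.01, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 1.0}

p_a, p_b = 0.05, 0.02  # hypothetical component failure probabilities
p_and = top_event_probability(p_a, p_b, AND_CPT)    # equals p_a * p_b
p_or = top_event_probability(p_a, p_b, OR_CPT)      # equals p_a + p_b - p_a * p_b
p_soft = top_event_probability(p_a, p_b, SOFT_CPT)  # no fault tree gate gives this value

print(p_and, p_or, p_soft)
```

With the deterministic tables the result matches the usual fault tree gate formulas, which is the special-case relationship the comparison relies on.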
Bayesian networks can consist of subnetworks represented as objects to create an Object-Oriented
Bayesian Network (OOBN). Using an object-oriented approach yields system representations
that are more accessible to a user or decision-maker. Object-oriented representations
also encourage modularity, which supports reuse in future development efforts.
The process of building a Bayesian network is similar to an FMECA/FMEA in that it analyzes
failure modes and the higher-level effects of those failures. The criticality analysis of FMECA
can be augmented with the BBN to provide realistic estimates of the system risk reduction
achieved by mitigations, as well as cost/benefit information for those decisions. The approach
is illustrated in Figure 1.2.1.
Figure 1.2.1: Bayesian Augmentation Approach
1.3 Research Tasks
1.3.1 Method Comparison
After the methods of risk assessment are introduced, the BBN fundamentals are explained.
A comparative assessment between BBN and FTA is employed to demonstrate how FTA can
be recovered with a BBN approach to the same problem. Both are directed acyclic graphs
which makes their comparison natural.
1.3.2 OOBN Representation
Following the comparative assessment between FTA and BBN, a broader structure of Bayesian
networks is used to show how a large and complex system can be fully captured without
the top-level diagram exposing this complexity. An OOBN is used to model a UAS
developed at Virginia Tech. An OOBN allows subnets to be created that represent individual
subsystems which are fed into the top level system.
1.3.3 FMECA Augmentation
A prototype method is introduced to support decisions made to mitigate the risks of the
system. This method illustrates the use of an OOBN on an actual UAS and will augment
the current FMECA approach. The natural method of creating an FMECA and determining
the severity of hazards lends itself to the Bayesian approach, which can also show the effects
of hazards and of any mitigation that may be introduced to reduce them.
1.4 Overview of the Thesis
Chapter 2 provides a literature review on the current methods and motivations for small
UAS risk analysis as well as their advantages and limitations. Chapter 3 progresses to
a detailed introduction of the method referenced in Figure 1.2.1 using a simple fault tree
analysis introduced by Murtha [27]. Chapter 4 includes a case study using a Virginia Tech
UAS to populate an OOBN and then perform risk analysis. Chapter 5 summarizes this work
and suggests future areas of research.
Chapter 2
Literature Review
The purpose of risk analysis is to determine likelihood and consequences of undesirable events
in a system. Knowledge of a system may be obtained by analysis or experimentation, but
knowledge of the system will always remain uncertain. In the vast majority of cases it is
not possible to acquire all relevant information to remove all elements of system uncertainty.
However, risk analysis must be performed even if all relevant information about failure
probabilities is not available. Vesely [34] observes that to make a
decision, a perception of reality must be created, and that this perception must be as close
to actual reality as possible in order to make the best decision possible. More information
can always be learned about a system through analysis or experimentation, but the method used to
represent it must be as accurate as possible.
There are a variety of methods to assess risk of an undesired event in a system, both quan-
titative and qualitative. Qualitative risk assessment uses expert judgment to evaluate the
probability and consequence of given events. If subject matter experts are available this is
often viewed as a sufficient method to analyze risk, albeit a subjective one. A quantitative
approach to risk analysis relies on probabilistic and statistical methods, including databases
that have identified probabilistic information for failure rates and their effects. A quantitative
approach is generally regarded as a more detailed approach if the information is available
[8].
The choice of method for creating a system risk model is determined largely by the
availability of probabilistic data and the level of analysis required. Qualitative methods offer
analysis based on a subjective process, which may result in large errors. A quantitative, or
data-driven, analysis generally depends less on individual judgement but requires
high-quality data for accurate results. Usually a combination of both quantitative and
qualitative analyses is utilized [8].
When it comes to selecting a means of safety analysis, the SMS [6] recommends considering:
information available, timeliness of the information, time required for analysis, and tools
that will provide the appropriate approach for fully identifying the hazards and their effects.
Section 3.3.4.3 of the SMS [6] recommends that the analyst “select the methodology that is most
appropriate for the type of system being evaluated”. This statement allows the method that
is best able to capture hazards and effects to be chosen.
2.1 Fault Tree Analysis
Fault Tree Analysis (FTA) is a method of performing quantitative risk analysis and was first
introduced as a safety assessment tool for the Minuteman Intercontinental Ballistic Missile
systems by Bell Labs in the 1960s. Fault trees are graphical models created by a deductive
top-down process combining events and occurrences, tracing from some top-level failure
event back to its causal factors. It is typical for only top-level events that are deemed to
be catastrophic to be included. Fault trees are often constructed by identifying a top-level
failure and then deducing its cause as a combination of factors. Root failures can combine
to result in a higher-level failure. This cause-and-effect logic is represented graphically by
a series of standardized symbols introduced by the Nuclear Regulatory Commission [34]. It
is important to note that the fault tree approach is not an exhaustive list of factors leading
to failure, but only those failures that are determined to be pivotal in a particular case of
failure [31].
2.1.1 Basic Probability of Failure Calculations
Fault trees are built up using a standardized system of symbols. Boolean AND and OR
gates aggregate lower-level events and connect them to their higher-level effects. Figure
2.1.1 shows an example of an AND gate (a) and an OR gate (b). The top event is represented
as T, while A and B are both base components. In Figure 2.1.1a, T occurs
only if both A and B occur. Applied to risk analysis, this figure represents a redundant system
for which the failure of a single component (A or B) does not result in a failure of the system
(T). The failure probability of this simple system can be represented as
$p_T = p_A \cdot p_B$ (2.1.1)
where $p_A$ and $p_B$ are the failure probabilities of the corresponding components, assuming
the two failures are independent.
Figure 2.1.1: (a) AND gate (b) OR gate
Figure 2.1.1b shows an OR gate that represents the outcome T if A or B occurs. It is worth
noting that an OR gate can be represented in two ways, depending on whether A and B are
mutually exclusive. If two events are mutually exclusive then either A or B could occur, but
not both. As an example, a coin cannot be both heads and tails in a single toss. In this case,
the probability of the system is represented as
$p_T = p_A + p_B$ (2.1.2)
When A and B are not mutually exclusive the system can be represented as
$p_T = p_A + p_B - (p_A \cdot p_B)$ (2.1.3)
The extra term, $p_A \cdot p_B$, in Equation (2.1.3) accounts for the chance that both A and B
fail simultaneously. Without this term the joint event would be counted twice.
According to the NASA FTA Handbook [31], if the probabilities of the A and B terms
are below 0.1 it is acceptable for the product term in Equation (2.1.3) to be ignored. This
simplifies the OR gate to Equation (2.1.2), and is known as the rare event approximation
[31]. The rare event approximation is certainly not needed for the simple example given
above. For analysis of a complicated system model, however, the approximation can save
considerable computation time.
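As a quick check of the 0.1 threshold, the following worked example compares Equations (2.1.3) and (2.1.2) at the edge of the stated range. The values $p_A = p_B = 0.1$ are chosen only for illustration.

```latex
\begin{align*}
p_T^{\text{exact}}  &= p_A + p_B - p_A \cdot p_B = 0.1 + 0.1 - 0.01 = 0.19 \\
p_T^{\text{approx}} &= p_A + p_B = 0.1 + 0.1 = 0.20
\end{align*}
```

The approximation overestimates the failure probability by 0.01, roughly 5% of the exact value, so it errs on the conservative side. For $p_A = p_B = 0.01$ the same comparison gives 0.0199 against 0.02, a difference of about 0.5%.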
A system of n components connected through an AND gate is described by Equation (2.1.4),
where $C_i$ is the i-th cause.
$p_T = \prod_{i=1}^{n} p(C_i)$ (2.1.4)
Equation (2.1.5) represents an OR gate of larger size than a two component system.
$p_T = 1 - \prod_{i=1}^{n} \left(1 - p(C_i)\right)$ (2.1.5)
The rare event approximation normally allows Equation (2.1.5) to be simplified down to
Equation (2.1.6) [13].
$p_T = \sum_{i=1}^{n} p(C_i)$ (2.1.6)
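Equations (2.1.4) through (2.1.6) translate directly into a few lines of code. The sketch below is one possible rendering; the function names and the three cause probabilities are hypothetical, and the causes are assumed to be statistically independent, as the product forms require.

```python
from math import prod

def and_gate(probs):
    """Equation (2.1.4): all causes must occur (independent causes assumed)."""
    return prod(probs)

def or_gate(probs):
    """Equation (2.1.5): at least one cause occurs (independent causes assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)

def or_gate_rare_event(probs):
    """Equation (2.1.6): rare event approximation of the OR gate."""
    return sum(probs)

causes = [0.01, 0.02, 0.005]  # hypothetical cause probabilities

p_and = and_gate(causes)
p_or_exact = or_gate(causes)
p_or_approx = or_gate_rare_event(causes)

print(f"AND gate:            {p_and:.3e}")
print(f"OR gate (exact):     {p_or_exact:.6f}")
print(f"OR gate (rare event): {p_or_approx:.6f}")
```

For small probabilities the approximate OR value is slightly larger than the exact one, so using Equation (2.1.6) in place of Equation (2.1.5) overstates rather than understates the failure probability.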
2.1.2 Branching out on the Fault Tree
It is common for fault trees to be built up using aggregates called “contributors” that introduce
terms similar to $p_T$ derived from Equations (2.1.4) or (2.1.6). This probability,
$p_T$, can then be fed into the next calculation iteratively instead of forming a long algebraic
equation to represent the entire system [16]. This method of substitution makes the
contributor probabilities of failure explicit, providing subsystem probabilities of failure.
Figure 2.1.2 shows an outline of a generic fault tree, from which it can be determined that
$p_A = p_1 \cdot p_2$ and that $p_B = p_4 + p_5$ so long as $p_4$ and $p_5$ are less than 0.1.
Therefore the equation that is represented by the fault tree of Figure 2.1.2 can be expressed as
$p_F = p_A + p_3 + p_B$
instead of the longer, fully expanded form
$p_F = (p_1 \cdot p_2) + p_3 + p_4 + p_5$
Thus, one may decompose the computation of the top-level failure probability into summable
elements. Equations (2.1.6) and (2.1.4) are used in a series of steps to compute the overall
failure probabilities.
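The substitution can be sketched step by step. The example below assumes the structure implied by the equations above: contributor A is an AND gate over events 1 and 2, contributor B is an OR gate over events 4 and 5, and the top event F is an OR gate over A, event 3, and B. The numerical probabilities are hypothetical, and the events are assumed independent.

```python
# Hypothetical basic event probabilities for events 1 through 5.
p1, p2, p3, p4, p5 = 0.05, 0.04, 0.001, 0.002, 0.003

# Contributors, computed first and then substituted upward.
p_a = p1 * p2          # AND gate, Equation (2.1.4)
p_b = p4 + p5          # OR gate with the rare event approximation, Equation (2.1.6)
p_f = p_a + p3 + p_b   # top-level OR gate, again Equation (2.1.6)

# Same tree with exact OR gates, Equation (2.1.5), for comparison.
p_b_exact = 1 - (1 - p4) * (1 - p5)
p_f_exact = 1 - (1 - p_a) * (1 - p3) * (1 - p_b_exact)

print(f"p_A = {p_a:.6f}, p_B = {p_b:.6f}")
print(f"p_F (rare event) = {p_f:.6f}")
print(f"p_F (exact)      = {p_f_exact:.6f}")
```

Keeping `p_a` and `p_b` as named intermediate values is what provides the subsystem failure probabilities mentioned above; with these inputs the approximate and exact top-level values differ only in the fifth decimal place.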
Figure 2.1.2: Fault Tree Outline (Source: adapted from [16])
2.1.3 Minimum Cut Set
Another common technique used to reduce the size and complexity of a fault tree is the minimum
cut set. An illustration of a minimum cut set is shown in Figure 2.1.4. A cut set is
a set of basic events that together cause the higher-level event to occur. A minimal
cut set is the smallest combination of component failures that, if they occur, cause the top
event to occur [34]. By definition this implies the “smallest” combination of failures, which
could be a single component failure or two failures comprising a double component failure. It is often
desirable to determine the minimum cut sets from a fault tree to best ascertain the effects
and interactions of basic failures. The Nuclear Regulatory Commission Fault Tree Handbook
[34] has an example of this process using Figure 2.1.3, where T is the top-level event,
A, B, and C are basic events, and the Ei are contributors. Such a tree is unnecessarily complex
and has multiple instances of the same basic events, which adds unnecessary mathematics.
One way to reduce the complexity is through this minimum cut set method.
This fault tree is an example of a dual pumping system for a nuclear reactor and represents
its operation. Representing this fault tree algebraically yields the following equation.
$T = (A \cdot C) + (B \cdot C) + (C \cdot C) + (A \cdot B) \cdot A + (A \cdot B) \cdot B + (A \cdot B) \cdot C$
After applying the rules of Boolean algebra, and without the rare event approximation, it can
be reduced to the simple and equivalent form
$T = C + A \cdot B$
Figure 2.1.3: Fault Tree example (Source: adapted from [34])
This represents the minimal cut sets for the system, which comprise a single component failure (C)
and a double failure (A and B) [16]. This allows an equivalent fault tree to be constructed, as seen
in Figure 2.1.4. Both trees have the same minimal cut sets and are numerically equivalent,
having the same system-level rate of failure.
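The equivalence of the long and reduced Boolean expressions can be checked by brute force, since three basic events give only eight combinations. The sketch below does this with a truth table; the function names are hypothetical.

```python
from itertools import product

def original_tree(a, b, c):
    """Top event of the full fault tree, written term by term (AND = *, OR = +)."""
    return ((a and c) or (b and c) or (c and c)
            or ((a and b) and a) or ((a and b) and b) or ((a and b) and c))

def reduced_tree(a, b, c):
    """Top event expressed through the minimal cut sets {C} and {A, B}."""
    return c or (a and b)

# Compare the two expressions for every combination of basic event states.
equivalent = all(
    bool(original_tree(a, b, c)) == bool(reduced_tree(a, b, c))
    for a, b, c in product((False, True), repeat=3)
)

print("Expressions are equivalent:", equivalent)
```

Exhaustive enumeration is practical only for small trees, which is one reason algebraic reduction to minimal cut sets is preferred for larger ones.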
Figure 2.1.4: Minimum Cut Set Example (Source: adapted from [34])
The minimal cut set is a useful tool for determining if a combination of events will cause
the top level event to occur and for reducing the complexity of a fault tree. It especially
highlights the most undesirable combinations; however, in the presence of a large number of
AND and OR gates the algebra can become complicated [31]. A common problem with this
approach is that certain events have multiple effects, which will be seen not only in the original
fault tree but also in any cut set of that same fault tree. For instance, examine Figure 2.1.5,
which represents a communication network presented by Kececioglu [18]. Event C has two
effects, and event A has three. If this network were represented in a fault tree, such events would
have to be presented as multiple nodes in the fault tree.
Figure 2.1.5: Communication network (Source: adapted from [18])
The minimal cut set approach would be used to simplify any fault tree that was presented,
with the cut sets being: {A}; {G}; {E, F}; {B, C, F}; {B, C, D}; {C, D, E}. A fault tree
built from these cut sets clearly has multiple instances of the same nodes and overcomplicates
a relatively simple network, as seen in Figure 2.1.6. These multiple instances create
identical causal nodes at different locations in the fault tree; these nodes
represent events with a common cause of failure and multiple effects. This problem
occurs because the AND/OR gates in the fault tree only allow series or parallel interactions
of components. The multiple instances of these nodes are identical and dependent on one
another, not separate events. Kececioglu introduces Mirror Blocks to handle this situation [18].
These Mirror Blocks are shown in Figure 2.1.6 with a small black box next to them as
designation. This cosmetic fix does little to actually simplify the design of the fault tree
and, by adding more nomenclature, arguably complicates it further.
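The effect of repeated events can be made concrete with the cut sets listed above. The sketch below counts how often each component appears, then compares the exact top-event probability (found by enumerating every component state) with the simple sum of cut set products, which treats each repeated instance as a separate event. The uniform failure probability of 0.01 is hypothetical, and component failures are assumed independent.

```python
from collections import Counter
from itertools import product
from math import prod

CUT_SETS = [{"A"}, {"G"}, {"E", "F"}, {"B", "C", "F"}, {"B", "C", "D"}, {"C", "D", "E"}]
P_FAIL = {name: 0.01 for name in "ABCDEFG"}  # hypothetical failure probabilities

# How many cut sets each component appears in (repeated nodes in the fault tree).
occurrences = Counter(name for cut_set in CUT_SETS for name in cut_set)

def exact_top_probability(cut_sets, p_fail):
    """Sum the probability of every component state in which some cut set has fully failed."""
    names = sorted(p_fail)
    total = 0.0
    for states in product((False, True), repeat=len(names)):
        failed = {n for n, s in zip(names, states) if s}
        if any(cut_set <= failed for cut_set in cut_sets):
            total += prod(p_fail[n] if n in failed else 1 - p_fail[n] for n in names)
    return total

def cut_set_sum(cut_sets, p_fail):
    """Sum of cut set probabilities; ignores overlap between cut sets."""
    return sum(prod(p_fail[n] for n in cut_set) for cut_set in cut_sets)

exact = exact_top_probability(CUT_SETS, P_FAIL)
bound = cut_set_sum(CUT_SETS, P_FAIL)

print("Occurrences per component:", dict(occurrences))
print(f"Exact top-event probability: {exact:.6f}")
print(f"Sum over cut sets:           {bound:.6f}")
```

The cut set sum is an upper bound on the exact value because shared components such as C are counted once per cut set, which is the dependency that Mirror Blocks mark but do not remove.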
Fault Trees have some deficiencies but are very easily applied to a large number of systems.
They have been widely adopted by industry as a way to identify and reduce causes of failure
and to graphically depict components in a system that may be otherwise hard to visualize.
Figure 2.1.6: Communication Network Fault Tree, Denote Mirror Blocks (Source: adapted from [18])
2.2 Event Tree Analysis
Another method used for system safety analysis involves an Event Tree. Analysis of an
event tree is often used to determine successful operation of components that depend on a
chronological set of events [8]. An Event Tree is an inductive method that captures the effects
of an initiating event and works well at showing the progression of sequential events and their
possible outcomes. One event cascades its effects to the next event, which in turn passes its effects
to subsequent events in line. An initiating event typically has a success and a failure condition,
with their probabilities summing to one. A success condition for an event is defined as the
occurrence of that event. This system allows the probabilities to be cascaded down from the
initiating event to the failure events. The main differentiator between an Event Tree and a
Fault Tree is that Event Trees investigate both success and failure scenarios while a Fault
Tree does not. The structure of an event tree is similar to that of a probability tree and can
be seen in Figure 2.2.1.
Figure 2.2.1: Event Tree Example (Source: adapted from [8])
Each chance node is associated with a probability of operation given the previous event.
Bilal [8] introduces an example of a pump operated to extinguish fires. Suppose that the
probability of the pump operating correctly (PO) is given as 90% and its complement (PO*) is
10%. Given that the pump is operating, the probability of the fire being extinguished (FE)
is 40%. If a fire occurs the probability that the fire is extinguished can be represented as the
product of these nodes.
$$ P_{PS} = P_{PO} \cdot P_{FE} = 0.90 \cdot 0.40 = 0.36 $$
Therefore the probability that the fire is extinguished is 36%. This provides a very quick
method for investigating simple systems and analyzing their sequence of events and how
possible hazards are affected by initiating events. Such trees can be created with different
initiators to show different outcomes or consequences [8].
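The pump example can be sketched in a few lines of Python. This is only an illustration: the two probabilities come from the example above, while the assumption that the fire cannot be extinguished when the pump fails is introduced here so that every branch of the tree has a value.

```python
# Event-tree sketch for the pump example: each branch probability is
# conditional on the branch before it, and a path probability is the product.
p_pump_operates = 0.90            # P(PO), from the example
p_extinguished_given_pump = 0.40  # P(FE | PO), from the example

# Assumption (not stated in the example): with no pump, the fire
# cannot be extinguished.
p_extinguished_given_no_pump = 0.0

paths = {
    ("pump operates", "fire extinguished"):
        p_pump_operates * p_extinguished_given_pump,
    ("pump operates", "fire not extinguished"):
        p_pump_operates * (1 - p_extinguished_given_pump),
    ("pump fails", "fire extinguished"):
        (1 - p_pump_operates) * p_extinguished_given_no_pump,
    ("pump fails", "fire not extinguished"):
        (1 - p_pump_operates) * (1 - p_extinguished_given_no_pump),
}

for path, probability in paths.items():
    print(" -> ".join(path), round(probability, 4))

# The leaf probabilities of an event tree sum to one.
total = sum(paths.values())
```

The first path reproduces the 36% result, and the sum over all leaves is a quick consistency check when a tree is built by hand.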
Event trees are widely used in many fields and their popularity is due in part to their
simplicity and the holistic picture of risk and reward association [18]. As a system grows in
components so do the events, which can cause Event Trees to become confusing.
2.3 Failure Mode and Effects Analysis
Another common tool to evaluate a system’s reliability and failure modes is Failure Mode
and Effects Analysis (FMEA). This is a deductive method of assessing a system’s reliability
as a whole to determine what the effect of each failure mode is and how damaging it can be.
The process for performing a FMEA was introduced by the Department of Defense in the
1970s in MIL-STD-1629A [29]. The FMEA has become an integral approach to analyzing
system reliability; however, it tends to focus on the much more severe failure modes [8].
These potential failures are ranked in terms of importance and ability for a corrective action
to be implemented [18]. Failure modes in this context usually refer to a specific event or
state that will have damaging effects on the system, and in this process the worst
possible effects are considered. The failure effects are those negative events that occur when
the failure mode occurs. If these failure modes can be identified, failure detection
methods and corrective actions should be put in place. The FMEA process provides
a documentation method for including all of these steps in the form of a worksheet, as
exemplified in Table 2.3.1, which follows the MIL-STD-1629A [29] format.
An assumption in the FMEA process is that a failure mode may or may not be the root cause
of a higher level failure, root cause being the base failure event. This implies that the higher
level failure does not necessarily have to occur if the root cause occurs. This assumption is
a departure from some other failure analysis methods, which imply that if a failure mode
| ID Number | Item | Function | Failure Mode and Cause | Mission Phase | Failure Effects |
|---|---|---|---|---|---|
| Example | Engine | Propulsion | Engine Fire | All | Fire condition, damage to components, loss of propulsion |

| Failure Detection Method | Compensating Provision | Severity Class | Remarks |
|---|---|---|---|
| Fire detection system | Fire Suppression system | 1 | |

Table 2.3.1: Example FMEA
occurs, its higher level contributors also have to occur [8]. The opposite can also be stated,
that a failure effect does not have to occur from the root cause identified, but could have a
separate root cause that still needs to be identified. Not all root causes of higher level effects
may be identified; however, the most likely causes are known and listed in the FMEA
worksheet.
The FMEA process provides a methodology to identify failure modes and a process for
applying corrective action while examining the larger system effects [18]. The process is very
broad and has been adopted in many different ways to improve product quality and safety.
2.4 Failure Mode, Effects, and Criticality Analysis
A Failure Mode, Effects, and Criticality Analysis (FMECA) is a more in-depth model of
a FMEA in which some form of evaluation is performed on the risk associated with the
potential problems that have been identified. Two common methods are the Risk Priority
Number (RPN) and a Criticality Analysis, both adding what can be seen as a risk level
for potential failures. The RPN is the product of three elements: the severity, the likelihood of
occurrence, and the likelihood of detection. It can then be used to compare failure events and
prioritize them for mitigation.
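As a minimal sketch of how an RPN ranking might be computed, the following Python fragment multiplies the three ratings and sorts the failure modes. The 1 to 10 rating scales, the failure mode names, and every rating value are hypothetical and chosen only for illustration; they are not taken from any worksheet in this thesis.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (remote) to 10 (frequent)
    detection: int   # assumed scale: 1 (easily detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        # RPN is the product of the three ratings.
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and ratings.
modes = [
    FailureMode("Engine fire", severity=10, occurrence=2, detection=3),
    FailureMode("Servo jam", severity=7, occurrence=4, detection=5),
    FailureMode("Lost link", severity=5, occurrence=6, detection=2),
]

# Highest RPN first: these are the candidates to mitigate first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(mode.name, mode.rpn)
```

With these made-up ratings the most severe mode is not the one ranked first, which illustrates why the RPN is used for prioritization rather than severity alone.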
The Criticality Analysis, outlined in MIL-STD-1629A, Section 102 [29], is very similar to an
RPN analysis and describes both a quantitative and a qualitative method for performing the
analysis. The quantitative approach of a criticality analysis requires that, for each failure
mode, three things be determined: (1) the probability of failure of the component, λ_P,
(2) the probability that the failure mode being investigated is actually the cause of mission
failure, α, and (3) the probability that this failure mode will result in mission failure, β,
commonly known as severity. The criticality for that specific failure mode is denoted as C_m
and is given as
$$ C_m = \beta \, \alpha \, \lambda_P $$
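The criticality number is a plain product of the three quantities, so a short sketch is enough to show how failure modes of one component might be compared. The function name and all numerical values below are hypothetical.

```python
def mode_criticality(beta: float, alpha: float, lambda_p: float) -> float:
    """Criticality of one failure mode as the product of the three terms.

    beta     -- probability that the failure mode results in mission failure
    alpha    -- probability that this mode is the cause of the failure
    lambda_p -- probability of failure of the component
    """
    return beta * alpha * lambda_p


# Hypothetical failure modes of a single servo: (beta, alpha, lambda_p).
servo_modes = {
    "jammed": (0.5, 0.3, 1e-3),
    "open circuit": (1.0, 0.7, 1e-3),
}

criticality = {
    name: mode_criticality(*terms) for name, terms in servo_modes.items()
}
most_critical = max(criticality, key=criticality.get)
print(criticality, most_critical)
```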
The severity classification and probability levels are typically defined as categories from
either verbal or numerical charts. Many of the category classifications used for the U.S.
Federal Aviation Administration, NASA, or the U.S. Military are derived from MIL-STD-
882 [30, 29, 9].
The qualitative approach to evaluate risk involves both rating the severity of potential effects
of a failure, and the likelihood of occurrence for each potential failure mode. These two factors
are then compared in a risk matrix with severity on the horizontal axis and likelihood on
the vertical axis as can be seen in Figure 2.4.1. This approach is much less involved and it is
common to see it with slight variations. This approach is recommended for risk assessment
in the NAVAIR Risk Assessment Handbook [9].
Figure 2.4.1: Risk Matrix
Once the criticality analysis is performed, columns for the likelihood, severity and criticality
number are added to the FMEA table seen in Table 2.3.1 to generate a FMECA. The
FMECA process can be easily tailored to specific applications, making it a popular approach
for risk analysis. It is also relatively simple to implement and can be as simple as an Excel
spreadsheet [18].
2.5 Bayesian Belief Networks
Bayesian Belief Networks (BBNs) are a form of probabilistic modeling that represent a
system as a series of random variables and the dependencies among them [8]. BBNs are
related to Fault Trees in that both are directed acyclic graphs. Many methods cannot handle
complex dependencies or uncertainty. Jensen explains how BBNs excel when expert opinion
is ambiguous, incomplete or uncertain [17]. The prior methods discussed in this chapter
were either deductive or inductive; a BBN is abductive. Abductive reasoning can be seen as
the “inference to the best explanation” [2]. The previous methods were also qualitative or
quantitative while a BBN can be either [7]. A qualitative approach to BBNs will construct
a Causal Network to graphically represent variables and the relationships between them. A
BBN as a quantitative method will use a form of probability calculus, Bayesian calculus, to
represent variables and their interconnections.
As mentioned briefly above, Bayesian Belief Networks are graphical structures that use
probabilistic reasoning to ascertain information about the unknown. BBN variables are
often called nodes and the relationships that connect them are termed arcs. An arc between
two nodes implies those two are conditionally dependent on one another, while the absence
of that arc implies conditional independence [8]. This definition indicates that if two nodes,
from A to B, are connected by an arc, A is seen to cause B and this informs us that they
are conditionally dependent. There is a similar interpretation if B influences C. However, if
there is no arc connecting two nodes, as is the case in Figure 2.5.1, A and C are conditionally
independent. If the variables A and C are conditionally independent given B, as in Equation
(2.5.1), this means that if B is known there is no knowledge of C that will alter the probability
of A [17]. The term P(A|B) represents the probability of A given that B is known to have
occurred. If evidence of B is given, B is said to be instantiated. The directed nature of
the models attaches certain deterministic information to the model that will update the
probabilities as information is elicited [26].
$$ P(A \mid B) = P(A \mid B, C) \tag{2.5.1} $$
Figure 2.5.1: Simple Bayesian Net
The same interpretation of conditional independence still applies between the variables A
and C given B in a serial connection as seen in Figure 2.5.2. These links signify the direct or
causal dependence and the influence between variables, for instance P_A → P_B, where P_A is
the probability of A occurring. When there is a direct link from one variable to the next,
P_i → P_j, P_i is called a parent and P_j is the child of P_i. These connections seen in Figures
2.5.1 and 2.5.2 create influence diagrams.
Figure 2.5.2: Serial Bayesian Net
2.5.1 Basic BBN Methods
The causal network is represented graphically as an influence diagram, with a series of
connected variables. How those variables are connected can vary depending on the needs
of the model. Influence diagrams can have four basic structures: diverging, converging,
serial, or a hybrid of these. An example of these can be seen in Figure 2.5.3. A diverging
connection, also seen in Figure 2.5.1, shows the parent directly influencing both child
nodes. If A is known, or instantiated, B and C cannot affect one another because they are
conditionally independent given A. The same can be said of a serial connection; if B is
instantiated, A and C cannot influence one another. In a converging connection, as seen in
Figure 2.5.3, the parents are initially independent of one another. If the child C is
instantiated, the parents become dependent on one another given C.
Figure 2.5.3: Influence Diagram Structures
A BBN allows for non boolean events to occur, which allows a simple method to account
for unknowns and deal with numerical uncertainty. Bayesian methods provide a way of
reasoning about partial beliefs under conditions of uncertainty [32]. A random variable,
designated in a BBN as a node, can be discrete, continuous, or mixed. The states of these
nodes must be mutually exclusive in order for uncertainty to be determined [7]. When the
nodes of a BBN are defined as a discrete random variable and have a finite set of possible
values, the BBN is called discrete.
In probability calculus, a conditional probability is used to achieve updated probabilities
when events are instantiated. For instance, “the probability of event A given B is x” or
P(A|B), which can in turn be written mathematically as P(A|B) = x [7]. If B is deterministic
then this will become P(A|B) = P(A), which shows us that A and B are independent.
If P(A|B) = P(A|B,C), then A and C are said to be conditionally independent given B.
This is introduced graphically in Figure 2.5.1 [32]. Conditional probability is defined in Equation
(2.5.2), where P(A,B) denotes the joint probability of A and B, and P(B) ≠ 0.
$$ P(A \mid B) = \frac{P(A, B)}{P(B)} \tag{2.5.2} $$
Bayes’ theorem is derived from this definition of conditional probability and is seen in Equation
(2.5.3). P(A) and P(B) are the marginal probabilities of A and B; P(A|B) is the probability of
A when B is instantiated, and P(B|A) is the probability of B given that A is instantiated
[32]. P(A) is known as a prior probability and P(A|B) is known as a posterior probability.
$$ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} \tag{2.5.3} $$
Unlike in a fault tree, a variable can have more than two states. If variable A has n
mutually exclusive states a_1, a_2, ..., a_n and variable B has m mutually exclusive states
b_1, b_2, ..., b_m, then P(A|B) can be represented by an n × m matrix of entries P(a_i|b_j)
whose columns sum to one, $\sum_{i=1}^{n} P(a_i \mid b_j) = 1$ for j = 1, ..., m [7]. This
matrix is termed the conditional probability distribution (CPD) of variable A. Multiplying
each entry of the CPD matrix by the corresponding P(b_j) gives the joint probability
P(a_i|b_j) P(b_j) = P(a_i, b_j), where P(b_j) is the probability of B being in the state b_j [7].
It is helpful to examine this graphically with a simple two node system, that can exemplify
nodal interactions on a larger scale. A two node example is introduced in Figure 2.5.4.
Figure 2.5.4: Two Node Example
Node B represents a motor’s operational state and node A a thermal state [21]. There is
a 75% likelihood that it is running at peak efficiency so a 25% chance that it is not. If
the motor is running efficiently there is a 30% likelihood that it is overheating; if it is not
running efficiently there is an 80% likelihood it has overheated. In this example the prior
probability that the motor is running at peak efficiency is 75%. However, if it is observed that
the motor has overheated, we can update this via the posterior probability. A Conditional
Probability Table (CPT), as seen in Table 2.5.1, shows the likelihood of each scenario. The
joint probability of each condition is simply the product of the corresponding CPT entry with
the parent node’s probability. A CPT is a representation of the conditional probability
distribution when the variables represented are discrete [26].
| Peak Efficiency | Efficient | Not Efficient |
|---|---|---|
| Overheat | 0.3 | 0.8 |
| Not Overheat | 0.7 | 0.2 |

Table 2.5.1: Conditional Probability Table for Motor
A graphical way of visualizing these scenarios is a probability tree, as seen in Figure 2.5.5.
The posterior probability is the probability of B given it is known that A has occurred. This
allows the model probability to be updated. In the case of this example, if it is known that
the motor has overheated, there is a 53% likelihood that it is running efficiently.
$$ P(\text{Running Efficiently} \mid \text{Overheated}) = \frac{0.75 \cdot 0.3}{(0.75 \cdot 0.3) + (0.25 \cdot 0.8)} = 0.53 $$
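The motor example can be reproduced numerically as a check on the arithmetic. The sketch below uses only the prior and the CPT values quoted above; the dictionary layout and variable names are illustrative choices rather than anything prescribed by the method.

```python
# Prior on the parent node B (operational state of the motor).
prior = {"efficient": 0.75, "not_efficient": 0.25}

# CPT of the child node A (thermal state) given B, from Table 2.5.1.
cpt = {
    "efficient": {"overheat": 0.3, "not_overheat": 0.7},
    "not_efficient": {"overheat": 0.8, "not_overheat": 0.2},
}

# Joint probability of each scenario: CPT entry times the parent probability.
joint = {(b, a): prior[b] * cpt[b][a] for b in prior for a in cpt[b]}

# Marginal probability of the evidence (the motor has overheated).
p_overheat = sum(p for (b, a), p in joint.items() if a == "overheat")

# Bayes' theorem: posterior probability that the motor is running efficiently.
posterior_efficient = joint[("efficient", "overheat")] / p_overheat

print(round(p_overheat, 3), round(posterior_efficient, 2))
```

The denominator is the total probability of overheating, 0.225 + 0.200 = 0.425, which is why observing an overheat lowers the belief in efficient operation from 75% to about 53%.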
Figure 2.5.5: Probability Tree
The above example is only for a simple two node serial interaction and the four basic nodal
interactions that were introduced earlier can be expanded upon to show the dependencies
between parents and children [8]. Figure 2.5.6 represents this extension and shows the
equations that are necessary to build the conditional probabilities for each type of structure.
To expand from a simple system, Murphy introduces a four node system [26]. Each of the
four nodes in this system is either True or False, denoted by T or F. The Conditional
Probability Distribution is shown as a CPT for each node, listing the probability that each
child node takes each of its values for every combination of the values of its parents.
Figure 2.5.7 shows an example of why the grass is wet. The event of the grass being wet can
either be True or False, as the grass will either be wet or it will not. Two events could cause
the grass to be wet, as the sprinkler can be on (S=True) or it is raining (R=True), or it can
Figure 2.5.6: Conditional Probability of Influence Structures (Source: Adapted from [8])
also be both. It is clear from the CPT for wet grass that if either the sprinkler is on or it is
raining then the grass will likely be wet.
If nothing is known about this system it is straightforward to go through the calculations
introduced earlier in this section to obtain a likelihood that the grass will be wet.
With no evidence given, the likelihood of the grass being wet is 72.9%. If it is observed
that the grass is wet this will update the posterior probabilities of the rest. With the grass
being known to be wet, the likelihood of it being cloudy increases to 53.3% and there is a
63.4% chance that it has rained, while the sprinkler on has only a 51.6% likelihood. If it
can also be observed that the sky is not cloudy yet the grass is also observed to be wet, we
can update the posterior probabilities further and say that there is an 86.7% likelihood that
the sprinkler was on. The act of observing data to eliminate other data points is a method
known as Explaining Away [26].
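Explaining away can be illustrated with a brute-force enumeration of a four node network of this shape. The CPT values below are the ones commonly quoted for Murphy's sprinkler example and are assumed here for illustration; they may differ from the values in Figure 2.5.7, so the printed numbers need not match the percentages quoted above.

```python
from itertools import product

# Assumed CPT values (commonly quoted for the sprinkler example).
P_CLOUDY = 0.5
P_SPRINKLER_GIVEN_CLOUDY = {True: 0.1, False: 0.5}
P_RAIN_GIVEN_CLOUDY = {True: 0.8, False: 0.2}
P_WET_GIVEN_SPRINKLER_RAIN = {
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.9, (False, False): 0.0,
}


def bernoulli(p_true, value):
    """Probability of a Boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """Chain-rule factorisation: P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (bernoulli(P_CLOUDY, c)
            * bernoulli(P_SPRINKLER_GIVEN_CLOUDY[c], s)
            * bernoulli(P_RAIN_GIVEN_CLOUDY[c], r)
            * bernoulli(P_WET_GIVEN_SPRINKLER_RAIN[(s, r)], w))


def query(target, evidence):
    """P(target = True | evidence) by summing the joint over all worlds."""
    names = ("C", "S", "R", "W")
    numerator = denominator = 0.0
    for values in product([True, False], repeat=4):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


p_wet = query("W", {})
p_sprinkler_given_wet = query("S", {"W": True})
p_sprinkler_given_wet_and_rain = query("S", {"W": True, "R": True})
p_sprinkler_given_wet_not_cloudy = query("S", {"W": True, "C": False})
print(p_wet, p_sprinkler_given_wet,
      p_sprinkler_given_wet_and_rain, p_sprinkler_given_wet_not_cloudy)
```

Under these assumed values, learning that it rained lowers the belief that the sprinkler was on, while learning that the sky was not cloudy raises it. Enumeration is exponential in the number of nodes, so it is only practical for very small networks.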
Figure 2.5.7: Wet Grass Example (Source: adapted from [26])
2.5.2 Software and Algorithm
For simple systems, it is convenient to analytically expand the mathematical equations and
nodal interactions. As a system becomes larger and more complex it is advantageous to
leverage computing power to populate the probabilities of the BBN. The Hugin Expert BBN
software uses a graphical user interface to show nodal interactions, one that will be used
later in this thesis. Hugin Expert BBN software allows the construction of these BBNs to
combine data and subject matter expert knowledge [25]. The tool also allows for parameter
estimation and analysis from generated BBNs. The networks are constructed as probabilistic
models and influence diagrams which were discussed in the previous section.
The Hugin software uses an embedded algorithm developed by Lauritzen and Spiegelhalter
[19]. This algorithm provides local computations with probabilities on graphical structures
and allows their application to custom systems [19]. An expert system is a computer pro-
gram intended to make reasoned judgments about a complex system with minimal outside
judgment [19]. The algorithm provides an efficient method of exact probability inference
in an arbitrary BBN [24]. The Lauritzen-Spiegelhalter algorithm works in two steps. First,
a tree of cliques is created from the original Bayes network, a clique being an undirected
graph subset with every two of its nodes connected by an arc. Then the probabilities for the
cliques are computed during message propagation, and the individual node probabilities are
calculated from these clique probabilities [24].
2.5.3 Object-Oriented Bayesian Networks
Bayesian networks do well at depicting complex relationships between events by simplifying
their representation. However, as systems become larger with an increasing number of arcs
and nodes it is often desirable to reduce the complexity of the system without losing depth of
information. A common technique used in programming is to create Classes or Objects that
encapsulate a set of attributes. BBNs can capitalize on a similar technique,
using subnets to capture a set of logic gates and variables in a simple representation of
an object that outputs to a higher level network. This approach, introduced by Koller [11],
is called Object-Oriented Bayesian Networks (OOBNs). These
OOBNs provide a framework for large and complex systems to be represented clearly and
effectively.
The basic element of an OOBN is the object, with the most basic object being a standard
random variable [11]. An OOBN requires certain nodes to be represented as Inputs, which
would come from other models, or subnets, that have outputs. An object in an OOBN can
represent many variables and can be seen as its own Bayesian network. For example a fuel
injection system could be represented as a Bayesian network and then later be pulled into
a larger OOBN that represents the engine. The engine network and fuel injection network
would both be their own networks, and would be linked together by the input and output. A
graphic of this OOBN can be seen in Figure 2.5.8. Expanding on the fuel injection system,
an engine may have more than just one fuel injector, and the output of that system may be
led to multiple inputs on the engine OOBN. This approach organizes the information into
layers which may be visualized more easily.
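The fuel injection idea can be mimicked loosely with ordinary classes. The following is only a sketch of the concept and not how Hugin implements OOBNs: the class names, the single internal "clogged" node, the deterministic OR used in the engine, the independence of the injector instances, and all probabilities are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FuelInjectorNet:
    """A tiny sub-network: Clogged -> InjectorFails (the output node)."""
    p_clogged: float
    p_fail_given_clogged: float
    p_fail_given_clear: float

    def output_p_fail(self) -> float:
        # Marginalise the internal node so only the output is exposed.
        return (self.p_clogged * self.p_fail_given_clogged
                + (1 - self.p_clogged) * self.p_fail_given_clear)


@dataclass
class EngineNet:
    """Higher level network whose input nodes are injector outputs."""
    injectors: list

    def p_fuel_delivery_fails(self) -> float:
        # Assumption: injector instances are independent, and fuel delivery
        # fails if any single injector fails (deterministic OR).
        p_all_ok = 1.0
        for injector in self.injectors:
            p_all_ok *= 1 - injector.output_p_fail()
        return 1 - p_all_ok


# Four instances of the same injector class, with hypothetical probabilities.
injector = FuelInjectorNet(p_clogged=0.02,
                           p_fail_given_clogged=0.9,
                           p_fail_given_clear=0.001)
engine = EngineNet(injectors=[injector] * 4)
print(injector.output_p_fail(), engine.p_fuel_delivery_fails())
```

The point of the sketch is the reuse: the injector sub-network is defined once and instantiated several times, and the engine network only sees each instance through its output.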
2.5.4 Sensitivity Analysis
An advantage of BBNs is the propagation of evidence outlined in the Lauritzen-Spiegelhalter
algorithm [19]. This allows the following question to be answered about a given BBN: How
sensitive is the outcome to variations of the node? This question can be answered to better
Figure 2.5.8: OOBN engine example as constructed in Hugin
determine the effects of a given parameter on other events of the network or even a top level
event. Many of these parameters are imprecisely specified in a model and can be a possible
source of error. The most influential parameters should be identified and effort should be
directed towards reducing their effects. The process of identifying these parameters and
analyzing their negative effects on other probabilities in a model is known as a sensitivity
analysis [15].
For example, consider a two node section of a BBN, between a parent node B and a discrete
child node A. To investigate using sensitivity analysis, let a be the state of A of interest. With
evidence ε, the probability P(A = a|ε) can be seen as a function of the conditional probabilities
in the CPTs of the two nodes. Let B be a discrete node in state b whose conditional probability
is the parameter x under study, which can be seen as an input parameter. From van der Gaag
[20], the probability of evidence ε is a linear function of x,
P(ε)(x) = γx + δ. Similarly, the joint probability of the evidence and the event “A = a” is
also a linear function of x, P(A = a, ε)(x) = αx + β. The sensitivity function is defined as
follows:
$$ P(A = a \mid \varepsilon)(x) = \frac{\alpha x + \beta}{\gamma x + \delta} \tag{2.5.4} $$
The parameters of this function (α, β, γ, δ) are determined from assessment of the parameters
that are not varied. These constants can be feasibly determined from the network by
computing the probability of interest for a small number of values for the parameter under
study and solving the resulting system of equations [20].
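The procedure of evaluating the network at a few parameter values and solving for the constants can be sketched with the two node motor example from Section 2.5.1. In this illustration the parameter under study, x, is taken to be the prior probability that the motor is running efficiently, the evidence is that the motor has overheated, and the query is the posterior probability of efficient operation. That choice of parameter and the helper names are assumptions made here, not part of the cited method.

```python
P_OVERHEAT_GIVEN_EFFICIENT = 0.3
P_OVERHEAT_GIVEN_NOT_EFFICIENT = 0.8


def p_evidence(x):
    """P(overheat) when P(efficient) = x."""
    return (P_OVERHEAT_GIVEN_EFFICIENT * x
            + P_OVERHEAT_GIVEN_NOT_EFFICIENT * (1 - x))


def p_joint(x):
    """P(efficient, overheat) when P(efficient) = x."""
    return P_OVERHEAT_GIVEN_EFFICIENT * x


def fit_line(f, x0=0.0, x1=1.0):
    """Recover slope and intercept of a linear function from two evaluations."""
    slope = (f(x1) - f(x0)) / (x1 - x0)
    intercept = f(x0) - slope * x0
    return slope, intercept


# Two network evaluations per line are enough to fix the four constants.
alpha, beta = fit_line(p_joint)
gamma, delta = fit_line(p_evidence)


def sensitivity(x):
    """Sensitivity function (alpha x + beta) / (gamma x + delta)."""
    return (alpha * x + beta) / (gamma * x + delta)


# Check the fitted function against a direct calculation at a third point.
direct = p_joint(0.75) / p_evidence(0.75)
print(alpha, beta, gamma, delta, sensitivity(0.75), direct)
```

For this small network the constants could be read off by hand. In a larger network the two functions would be replaced by calls to an inference engine, but the fitting step would be the same.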
2.6 Summary and Comparison of Methods
There are many different methods to determine the risk of a system and perform risk analysis.
Only a few of the more popular methods were discussed in this chapter. Table 2.6.1 provides
a brief comparison of these methods.
Clearly identifying failures and their effects involves analyzing the system as a whole. As
systems become larger and more complex it is easier to overlook hazards or how they can
affect a system. It is not only crucial to identify all of the potential hazards of a system, but
also their effects. The goal of a risk analysis study is to provide the best possible picture
or understanding of the potential hazards. It is sometimes ideal to combine methods of risk
analysis in order to provide this picture, as each method has its strengths and weaknesses.
All of the methodologies discussed provide significant contributions to risk management;
however, BBNs are underutilized. They provide a means of dealing with two problems that
commonly occur, uncertainty and complexity [7]. BBN influence diagrams provide an objective
and compact visible way to show the interactions of effects for decision making in risk
| Analysis Tool | Advantages | Limitations |
|---|---|---|
| Fault Tree Analysis (FTA) | Easily determined probabilities; easily determines undesired events | Limited event interactions; events statistically independent; all failure events must be known; boolean failure events |
| Event Tree Analysis (ETA) | Easily determined probabilities; multiple results analyzed | Implies serial event interaction; events statistically independent; all failure events must be known; boolean failure events |
| Bayesian Belief Networks (BBN) | Allows for numerous causal interactions; abductive model; numerous event states; propagation of evidence | Requires CPTs; complex nodal interactions; math can be complex without computers |
| Failure Mode and Effects Analysis (FMEA) | Useful documentation of system effects; examines every component | Rigorous detail; does not include multiple failures; expansive for large systems |
| Failure Mode and Effects Criticality Analysis (FMECA) | Expands FMEA; useful for mitigation effects | More analysis required; labor intensive |

Table 2.6.1: Risk Analysis Methods
management. They also provide a way for updating models when new evidence is introduced,
thus addressing epistemic uncertainty and propagating that evidence.
Chapter 3
Bayesian Approach to Risk Analysis
3.1 Current Research using Bayesian Methods
While FTA is an effective tool for predicting system unreliability, it requires knowledge of
component failure rates. Such information is not always available to small UAV designers and
may be prohibitively expensive to acquire. For example, component failures may occur so
rarely as to defy easy experimental quantification. The current disadvantages of the FTA are
being explored and supplemented with a BBN approach to better capture inter-connectivity
of failures and incorrect probabilities.
There has been a persistent need to develop advanced risk analysis tools to move beyond
simply identifying risk factors. As systems grow there is a need for analyzing and interpreting
the complex interactions of various system risk factors. Luxhoj reports on the development
of an Aviation System Risk Model (ASRM) to use the underlying probabilistic methodology
of BBNs and their influence diagrams to graphically portray these complex interactions [22].
A strength of the BBN approach is to simplify large systems and illustrate them as OOBNs.
This approach allows a small subsystem to easily be integrated as an object into the larger
system. This object can facilitate a test piece or mission specific scenario without having to
recreate a full system, allowing for a building block style system. An example of this is an
aircraft system where only a specific part of the system is to be investigated, such as
the communications system, without looking at much of the larger network. It could then
be desirable to investigate the communications system for a lost link scenario. Evidence of
the link being lost can be propagated and allow likely causes to be identified, as well as how
those causes will later affect the larger system [23].
The conventional FTA approach only allows for known quantities to be accounted for and
does not allow for inter-connectivity of failures as demonstrated by Murtha [27]. Reason
[33] has been conducting research into the possibility that failure events that were originally
believed to not cause a failure may combine further along with other minor failure events to
cause a top level failure. The eventual combination of failure events was coined the “Swiss
Cheese Model” as failure events that should not propagate forward past one event would
eventually find a small probability, or hole in each event, and move to the next event in
the chain as if passing through holes in multiple layers of Swiss cheese. An application of
this type of model would be difficult to represent with the conventional FTA approach.
These limitations can be overcome using Bayesian Belief Networks. Others have recognized
the limitations of FTA and have suggested the use of BBNs to address these shortcomings.
Janota [16] shows how the potential of FTA can be capitalized on and expanded using BBNs
without the typical limitations of FTA. A traditional FTA can be taken and transformed
into a BBN enabling a risk analyst to preserve and re-use earlier work based on FTA.
3.2 Fault Tree as a Special Case of Bayesian Methods
Fault Tree Analysis is mainly and commonly used in the fields of safety and reliability
engineering to understand how systems can fail at a functional level [16]. It can be applied
to identify the best ways to reduce risk or to determine the event rates for an accident. It
is commonly used to analyze severe failure conditions where information about catastrophic
failure is desired.
As discussed in Section 2.5.1, a BBN is a type of directed acyclic graph showing influence
between each variable or event. A FT is represented in much the same way, with a few
nuances that place limitations on its implementation. For instance, a FT can only have
converging or serial nodes to represent it. This limitation on the type of nodal interaction
does not exist for a BBN. A BBN expands the types of interactions of a FT with the addition
of diverging and hybrid type nodes. These interactions are seen in Figure 3.2.1, with the
BBN type of nodal interactions represented by all four cases. If a BBN is restricted to
converging and serial nodes and its CPT entries are restricted to the boolean values 0 and 1,
it emulates a FT.
3.3 Fault Tree Comparison to Bayesian Network
To show a BBN representation of a classic FT, a FT is first introduced. Research was
performed previously at Virginia Tech by Murtha [27], whose work introduced a FT focused
on a small UAS platform, and this example is expanded upon for the BBN research. Murtha
sought to utilize FTA in order to identify the most likely cause of catastrophic failure and
mitigate against it in order to drive down cost and improve operational reliability of the
Figure 3.2.1: Bayesian Expansion of Fault Tree
platform. Figure 3.3.1 represents a simple aircraft system with the top level event being a
system failure. This system is built from only two types of gates, AND and OR gates. The
OR gate consists of: a Main Battery, an Autopilot; while the AND gate is composed of two
Servos. The probabilities for each of these events were drawn from a subject matter expert
and are shown in Table 3.3.1 [27].
| Component | Probability of Failure |
|---|---|
| Main Battery | 0.05 |
| Autopilot | 0.001 |
| Servo | 0.1 |

Table 3.3.1: Component Probabilities (Source: [27])
The total probability of failure may be calculated using Equations (2.1.4) and (2.1.5), and the
rare event approximation is not used in this case. The probabilities of failure are represented
by: PSF for Probability of System Failure, PMB for the Main Battery, PA the Autopilot, and
Figure 3.3.1: Fault Tree Example (Source: [27])
PS for the Servos.
$$ P_{SF} = 1 - (1 - P_{MB}) (1 - P_A) (1 - P_{S1} P_{S2}) = 0.06044 \tag{3.3.1} $$
Equation (3.3.1) provides the probability of failure for the given system as 6.04% from the
FT modeling approach.
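The fault tree result can be checked with two small gate functions. The sketch assumes statistically independent basic events and uses the exact OR gate expression rather than the rare event approximation; the function names are illustrative.

```python
def and_gate(*probabilities):
    """Output fails only if every input fails (independent inputs)."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def or_gate(*probabilities):
    """Output fails if any input fails (independent inputs, exact form)."""
    p_none_fail = 1.0
    for p in probabilities:
        p_none_fail *= 1 - p
    return 1 - p_none_fail


# Component failure probabilities from Table 3.3.1.
P_MAIN_BATTERY = 0.05
P_AUTOPILOT = 0.001
P_SERVO = 0.1

p_servo_pair = and_gate(P_SERVO, P_SERVO)
p_system_failure = or_gate(P_MAIN_BATTERY, P_AUTOPILOT, p_servo_pair)
print(round(p_system_failure, 5))
```

Because the two servos sit under an AND gate, their combined contribution is only 0.01, and the main battery dominates the top level result.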
The same system shown in Figure 3.3.1 can be represented using a BBN with a few small
differences. A collector node is added to the two servos to form a Servo System, and this
acts as the AND gate to simplify the CPTs to two smaller ones, similar to the divorcing
concept allowing one 3-dimensional CPT to be represented as two 2-dimensional CPTs. This
slightly modified system can be seen in Figure 3.3.2.
The most important difference in the approaches is how the nodal interactions are handled.
While a FT requires boolean logic, a BBN does not; however, a BBN can accommodate
Figure 3.3.2: Bayesian Net Example
Boolean Logic. To achieve the same form of logic, conditional probabilities in the Bayes
formula in Equation (2.5.3) are given as 1 or 0. To illustrate this, a simple two node system
is introduced with node B influencing node A as given in Figure 3.3.3.
Figure 3.3.3: Two node system
In this example the initiating event is node B, and the conditional probabilities are treated as
boolean. If B is known to occur, then A also has to occur, and is represented as P (A|B) = 1.
If B does not occur, then A cannot have occurred. Figure 3.3.4 shows the effects of these
interactions, as well as the lack of sensitivity due to possible variations. Also note that if B
occurs there is no possibility that A does not occur, P(A*|B) = 0, where the complement
A* indicates that A does not occur. These conditional probabilities are used to build the
CPTs that handle more complex nodal interactions. The posterior probabilities indicate the
symbiotic relationship as well, that if A is known to occur, B is also known to occur.
Figure 3.3.4: Probability Tree of Example
These nodal interactions can then be expanded to handle the larger system as seen in Figure
3.3.2 and to build the corresponding CPTs. The conditional probabilities are shown as the
boolean operators columns for each scenario of failure event.
Figure 3.3.5: CPTs of BBN example
The probabilities can then be propagated to give the overall probability of system failure as
0.06044 or 6.044%. This answer is identical to the answer from the FT approach in Equation
(3.3.1), showing the equivalence of the approach. While the FT approach uses the gates to
collect the probabilities, the CPTs perform a very similar role in the BBN method. The
gates are conceptualized in the CPT. For example, the servo system node can be observed
as an AND node in a FT, and a Servo System Failure only occurs if both Servo1 Fails and
Servo2 Fails. Otherwise, it is seen that the system is in a No Servo System Failure condition.
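The equivalence can be demonstrated by writing the gates as CPTs containing only 0 and 1 and summing the joint distribution over every combination of states. This is a brute-force sketch for illustration; it is not the propagation algorithm used by Hugin, and the node and function names are assumptions.

```python
from itertools import product

# Prior failure probabilities of the root nodes, from Table 3.3.1.
PRIOR_FAIL = {"battery": 0.05, "autopilot": 0.001, "servo1": 0.1, "servo2": 0.1}


def cpt_servo_system(servo1_fails, servo2_fails):
    """P(servo system fails | servo1, servo2): Boolean AND as a CPT."""
    return 1.0 if (servo1_fails and servo2_fails) else 0.0


def cpt_system(battery_fails, autopilot_fails, servo_system_fails):
    """P(system fails | parents): Boolean OR as a CPT."""
    return 1.0 if (battery_fails or autopilot_fails or servo_system_fails) else 0.0


def p_system_failure():
    """Marginal P(system fails), summing the joint over all node states."""
    names = list(PRIOR_FAIL)
    total = 0.0
    for states in product([True, False], repeat=len(names)):
        world = dict(zip(names, states))
        p_roots = 1.0
        for name in names:
            p_roots *= PRIOR_FAIL[name] if world[name] else 1 - PRIOR_FAIL[name]
        # Sum over both states of the collector node.
        for servo_system_fails in (True, False):
            p_fail = cpt_servo_system(world["servo1"], world["servo2"])
            p_collector = p_fail if servo_system_fails else 1 - p_fail
            total += (p_roots * p_collector
                      * cpt_system(world["battery"], world["autopilot"],
                                   servo_system_fails))
    return total


print(round(p_system_failure(), 5))
```

Replacing any of the 0 or 1 entries in the two CPT functions with intermediate values is all that is needed to leave the fault tree special case.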
Converting a FT to a BBN is a useful application of Bayesian modeling. However, doing
so limits the model to the accuracy that is achievable under the constraints of the
FT. Removing those constraints may enable the model to better represent the system. For
instance, at first glance the model in Figure 3.3.1 seems to represent the system well.
Upon close examination, deficiencies can be seen. For example, the failure of the main battery
does not directly cause the system to fail. It is in fact a primary failure event of both the
Servo System and the Autopilot System failures. The system can be remodeled as seen in
Figure 3.3.6 to better capture the system interactions, which would not be allowed in a FT
approach without mirror blocks.
Figure 3.3.6: System Remodeled.
This type of systematic modeling error can be seen as a form of epistemic uncertainty in
the model. Epistemic uncertainty is systematic uncertainty due to a discrepancy between
known theory and practical application [3]. In this case by modeling the system in a way
that best captures the interactions and dependencies of events, the epistemic, or modeling,
uncertainty can be reduced.
3.4 Identifying Mitigations
Unlike FTs, BBNs do not require the conditional probabilities to be boolean. Non-boolean
conditional probabilities were first introduced in Chapter 2, and using them relaxes the
boolean condition imposed in Section 3.3. If the two node system in Figure 3.3.3 is
re-imagined such that the conditional probability is not 1, it can be said that if B occurs,
then A is only likely to occur, with some non-zero probability. It is instructive to build a
probability tree to show this, as seen in Figure 3.4.1. The conditional probabilities are
relaxed by 5% to show a near certainty correlation between the failure events of A given B,
P(A|B) = 0.95. This allows for the aforementioned leak probabilities and the possibility that
failure of A could be caused by an unknown outside influence or that B could occur and not
trigger event A.
Figure 3.4.1: Non-Boolean Conditional Probabilities.
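The probability tree for the relaxed two node system can be sketched as follows. Only P(A|B) = 0.95 comes from the text; the prior on B and the leak probability P(A|not B) are hypothetical values chosen for illustration. The sketch enumerates the four branches of the tree, sums the branches in which A occurs, and applies Bayes' rule to obtain the posterior probability of B given A.

```python
P_A_GIVEN_B = 0.95       # from the text: relaxed conditional probability
P_B = 0.10               # hypothetical prior for the initiating event B
P_A_GIVEN_NOT_B = 0.02   # hypothetical leak probability (A without B)

# Each branch of the probability tree is a joint probability.
joint = {
    ("B", "A"): P_B * P_A_GIVEN_B,
    ("B", "not A"): P_B * (1.0 - P_A_GIVEN_B),
    ("not B", "A"): (1.0 - P_B) * P_A_GIVEN_NOT_B,
    ("not B", "not A"): (1.0 - P_B) * (1.0 - P_A_GIVEN_NOT_B),
}

# Marginal probability of A: sum of the branches that end in A.
p_a = joint[("B", "A")] + joint[("not B", "A")]

# Posterior probability of B once A is observed (Bayes' rule).
p_b_given_a = joint[("B", "A")] / p_a

print(f"P(A)   = {p_a:.4f}")
print(f"P(B|A) = {p_b_given_a:.4f}")
```

With non-boolean entries, observing A no longer implies B with certainty; the posterior falls below 1 because the leak branch also contributes to A.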
Section 2.5.4 introduced the sensitivity analysis method that can be performed with
non-boolean conditional probabilities. Returning to the example in Figure 3.3.2, the CPTs can
be varied to provide information about the interdependence of the nodes. First, the CPTs
must be relaxed from boolean operators; the relaxed values are chosen according to subject
matter expert input. To further exemplify the utility of BBNs, a third condition is added to
the battery node to indicate a degraded battery condition. The overall probability of failure
for the system with the modifications is 44.76%. The sensitivity function from Equation
(2.5.4) is used to determine the sensitivity to the varied conditional probability values. For
this example the values were varied by +/- 10% and +/- 20%. The results of this variation
are shown in Table 3.4.1. The results are relatively close between the Battery Failure and
Autopilot Failure; however, Battery Failure is more sensitive to variation.
Hypothesis Variable   Parameter           Sensitivity   0%     10%     -10%    20%     -20%
System Fail           Servo 1             0.04          0.44   0.44    0.436   0.448   0.432
System Fail           Servo 2             0.02          0.44   0.442   0.438   0.444   0.436
System Fail           Battery Failure     0.41          0.43   0.471   0.459   0.512   0.448
System Fail           Battery Degraded    0.2           0.42   0.441   0.4     0.46    0.38
System Fail           Autopilot Failure   0.11          0.45   0.461   0.439   0.472   0.428
Table 3.4.1: Sensitivity to Parameter Variation.
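The idea behind this kind of one-way variation can be illustrated on the two node system. The sketch below is an assumption-laden illustration rather than the thesis model: the function `p_system_fail` and all of its numbers are hypothetical. It shows that when a single conditional probability is shifted, the output probability changes linearly, so the slope obtained from one shift predicts the result of every other shift.

```python
def p_system_fail(p_a_given_b, p_b=0.10, leak=0.02):
    """P(A) for a two node net B -> A; all default values are hypothetical."""
    return p_a_given_b * p_b + leak * (1.0 - p_b)


BASE = 0.80  # hypothetical relaxed CPT entry P(A|B)
f0 = p_system_fail(BASE)

# Slope from one finite difference; exact here because the function is linear.
slope = (p_system_fail(BASE + 0.10) - f0) / 0.10

results = {}
for shift in (0.10, -0.10, 0.20, -0.20):
    predicted = f0 + slope * shift           # linear prediction from the slope
    recomputed = p_system_fail(BASE + shift)  # full recomputation
    results[shift] = (predicted, recomputed)
    print(f"shift {shift:+.2f}: predicted={predicted:.4f} recomputed={recomputed:.4f}")
```

In this toy case the slope equals the prior P(B), which is one way to read a sensitivity value: it measures how strongly the output responds to a unit change in the varied parameter.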
This sensitivity can be shown in an alternative way to further illustrate this important
concept. Evidence can be introduced of certain failure events and the overall system failure
percent increase relative to the baseline can be examined. This is known as a Likelihood
Multiplier and is represented as Equation (3.4.1) [24].
$$\text{Likelihood Multiplier (LM)} = \frac{P(\text{with causal factor evidence})}{P(\text{without causal factor evidence})} \tag{3.4.1}$$
Multiple failure events can be examined and their results tabulated. The LMs are shown in
Table 3.4.2. These results clearly show which failure event has the highest effect on overall
system failure. A system failure is much more sensitive to battery failure, and even to a
degraded battery condition, than to the other failure events.
These values from Table 3.4.2 can be shown graphically in Figure 3.4.2 to further illustrate
this example. Clearly a battery failure condition is an event that needs to be mitigated.
Failure 1          Failure 2           Risk(%)   Likelihood   % Increase   Rank
None               None                44.59     1            0
Autopilot Failure  None                55.58     1.246        19.773
Battery Failure    None                83.58     1.874        46.649       3
Servo1 Failure     None                47.93     1.075        6.968
Servo1 Failure     Servo 2 Failure     48.98     1.098        8.968
Battery Failure    Autopilot Failure   94.36     2.116        52.745       1
Battery Failure    Servo 2 Failure     87.14     1.954        48.829       2
Servo 2 Failure    Battery Failure     59.19     1.327        24.666
Battery Degraded   None                61.27     1.374        27.224
Battery Degraded   Servo 1 Failure     64.76     1.452        31.146
Table 3.4.2: Likelihood Multipliers
Figure 3.4.2: Graph of Likelihood Multipliers (LMs).
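The likelihood multiplier of Equation (3.4.1) can be reproduced from the single-failure rows of Table 3.4.2. The sketch below uses the tabulated risks; the function and variable names are introduced here for illustration. It also computes (risk - baseline) / risk, which appears to match the tabulated "% Increase" column; that reading is inferred from the numbers and is not stated in the text.

```python
BASELINE_RISK = 44.59  # percent, "None / None" row of Table 3.4.2

# System risk (percent) with evidence of a single failure, from Table 3.4.2.
risk_with_evidence = {
    "Autopilot Failure": 55.58,
    "Battery Failure": 83.58,
    "Servo1 Failure": 47.93,
    "Battery Degraded": 61.27,
}


def likelihood_multiplier(risk, baseline):
    """Equation (3.4.1): risk with evidence divided by risk without it."""
    return risk / baseline


lms = {name: likelihood_multiplier(risk, BASELINE_RISK)
       for name, risk in risk_with_evidence.items()}

# Inferred reading of the "% Increase" column: change relative to the new risk.
pct_column = {name: 100.0 * (risk - BASELINE_RISK) / risk
              for name, risk in risk_with_evidence.items()}

ranked = sorted(lms, key=lms.get, reverse=True)
for name in ranked:
    print(f"{name:18s} LM={lms[name]:.3f}  pct={pct_column[name]:.3f}")
```

Sorting by LM places Battery Failure first among the single failures, which agrees with the conclusion drawn from the table.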
3.5 Criticality Analysis
The methods introduced in Section 3.4 to identify areas that may need mitigation in a
system can also be seen as a form of quantitative Criticality Analysis. Common criticality
techniques require identifying both a severity of the failure event as well as its likelihood of
occurrence. If the most susceptible component of a system can be identified, the criticality
numbers could be reduced. To reduce the likelihood of occurrence of an undesirable event,
a more reliable part could replace the current one or a redundant component could be used.
For the system in Figure 3.3.2, this will not have an effect on the severity of the occurrence,
just the likelihood. The system is still dependent on the power provided by the battery to
operate fully. A system with an additional battery can be seen in Figure 3.5.1 with a battery
system collector node.
Figure 3.5.1: Bayesian Example with Mitigation
The mitigation reduces the system risk by about 5 percentage points, from 44.76% to 39.72%.
The relative risk between the mitigated and unmitigated conditions can be computed as a risk
ratio in Equation (3.5.1), as shown by Luxhoj [24].
$$\text{Logic Risk Ratio} = \frac{P(\text{mitigated})}{P(\text{unmitigated})} = \frac{39.72}{44.76} = 0.8873 \tag{3.5.1}$$
A Logic Risk Ratio of 0.8873 means that the mitigation of an additional battery leaves the
likelihood of system failure at about 89% of its unmitigated value, a reduction of roughly 11%.
3.6 FMECA representation
A FMECA is commonly used to represent failures and their higher level effects, as well as the
criticality analysis of such failures. A more in-depth explanation of this method can be found
in Section 2.4. The same process used to develop a FMECA is also useful when developing
a BBN. It is necessary to identify the failure events and what failure modes will occur as a
result, as well as how those failures affect the higher level systems. Given this commonality
between the two approaches and the FMECA's need for a criticality analysis, it is
advantageous to use both in parallel. An example FMECA can be seen in Figure 3.6.2, built
from the working example in this chapter to show the effects of mitigations. To ascertain
the criticality number, a risk matrix was adapted and slightly modified from the NAVAIR
risk matrix [9], as shown in Figure 3.6.1.
Figure 3.6.1: Modified NAVAIR Risk Matrix.
On the Risk Matrix, risks denoted in the red region are events that are highly
undesirable and need to be mitigated if undesirable events are to be avoided. This is
exemplified in the FMECA, where the unmitigated failures of a single servo or a single battery
are ranked below a value of 5. These failures are then mitigated with a redundant component
to bring the probability of occurrence down, increasing the risk number to a more desirable
level. Using the BBN to remodel and recalculate the probability of occurrence can reduce
the complexity of having to recalculate these probabilities for each component.
Figure 3.6.2: Example FMECA.
3.7 Summary
BBNs can address some of the deficiencies suffered by FTA. A BBN is shown to represent a
FT, as a special case, with a numerically equivalent system risk. By applying a sensitivity
analysis to the BBN, areas that may require mitigation can be identified and then a
mitigation applied. This fits directly into the operation of a FMECA and can supplement its
advantages as well. The approach discussed above could further be expanded by attaching
a utility and cost to the mitigation. A cost benefit analysis can then be performed to show
how much the system risk can be reduced for the amount spent.
Chapter 4
ESPAARO Case Example
Methods of Risk Analysis theoretically work well for the applications that they are presented
for. However the end goal of a method is to be applied to actual systems and to reduce the
risk of that system in practice. The method outlined in Chapter 3 was therefore applied
to an internally funded UAS design at Virginia Tech, called the Electric Small Platform for
Autonomous Aerial Research and Observation (ESPAARO). This aircraft is an iteration of
a previous platform and is built as a research test bed to accommodate a wide range of
modifications. For instance, the fuselage support hatches can be modified or replaced, and
the wing can be detached and replaced with a modified wing. Above all, this aircraft
was designed with a low price point and gentle flight characteristics to facilitate flight at the
Kentland Experimental Aerial Systems (KEAS) Lab, shown in Figure 4.0.1.
Figure 4.0.1: KEAS Lab Runway
4.1 Introduction to the ESPAARO
The ESPAARO is designed with modifiability as a key design objective, and for this reason
each part of the airframe can be detached without greatly affecting the other parts.
Everything is attached to a carbon fiber strut assembly that acts as the keystone holding
the airframe together. This allows the tails, the wing, or the fuselage to be switched out.
A dimensioned drawing of this aircraft is displayed in Figure 4.1.1. This
aircraft has a wingspan of 12 feet and a maximum takeoff weight of 45 lbs, which allows large
experimental payloads of up to 10 lbs to be flown. The value of this modularity was
demonstrated with a morphing actuated wing on an otherwise unmodified ESPAARO [14]. A drawing
showing the strut assembly is shown in Figure 4.1.2.
The aircraft was originally prototyped with a fiberglass-wrapped foam fuselage. As the
design was refined, a mold was manufactured by Nextgen Aeronautics to facilitate a hollow
composite sandwich fuselage that greatly increased the usable payload volume and decreased
the fuselage weight. The design of the airframe allows the fuselage to be removed from
the wings and tails so that a different fuselage can replace it. The appendages that attach
to it are intended to be simple, consisting of only one control surface and actuator
per wing section or tail. To facilitate low speed handling performance, the control surfaces
were over-sized and the controller had exponential gains added to prevent pilot induced
oscillation.
Figure 4.1.1: ESPAARO Drawing
Figure 4.1.2: ESPAARO Strut Assembly
Another design driver of the project is to allow custom control algorithms to be developed
and tested on the aircraft. For this reason the 3D Robotics Pixhawk autopilot was chosen,
allowing off-the-shelf usability but also an open source Real Time Operating System for
custom modification. This system allows for numerous control modes, primarily a manual
pass through of control, an augmented stabilized mode, or a fully autonomous flight mode.
In order to generate custom control algorithms it is often desirable to identify a model for
the system that is being controlled. The autopilot has numerous sensors on board in order
to facilitate this but also allows the integration of additional sensors into the system. For
flight testing an air data boom was attached to log angle of attack and sideslip information.
The Nonlinear Systems Lab maintains three ESPAAROs; the finished product can be seen on
the runway at KEAS in Figure 4.1.3. The Mid Atlantic Aviation Partnership also maintains
two.
Figure 4.1.3: ESPAARO on the runway
4.2 Model Construction and Probability Elicitation
For a risk model of the ESPAARO to be constructed, a panel of three Subject Matter
Experts (SMEs) was polled to identify potential failures and their effects. Each SME was
very familiar with the aircraft, its construction, and its operation. For the purpose of this
risk model, it was agreed to identify failures that would result in a mission failure, defined
as the inability of the aircraft to perform its given operational duties. This could range
anywhere from a lost data link to a loose wire. These failures will not necessarily result in
the loss of the aircraft; however, they could. The emphasis is on operational reliability of
the aircraft. It was also agreed to focus on the probability of failure for these events for
any given flight, and not per flight hour.
Potential failure events that would prevent the operation of the ESPAARO platform were
identified with the SMEs. Forty three base failures were identified with multiple effects.
These failure events have probabilities that were elicited from the SMEs and can be seen
in Figure 4.2. To account for the differences among the SMEs' probabilities, the values were
averaged. Another method is to use evidence theory as discussed by Murtha [27]. If values
ranged beyond 10% a discussion was held to better analyze the failure event. On occasion
this would result in a rework of the network structure, with a finer detail of failure events,
such as the case with the control surfaces.
Probability of Failure (per flight)
Parameter SME 1 SME 2 SME 3 Average
Turbulence/Wind Induced 0.0001 0.0001 0.0001 0.0001
Wing Strut/Fuse Attachment 0.001 0.001 0.001 0.001
Wing Spar 0.0001 0.0001 0.0001 0.0001
Sheet Puncture 0.1 0.01 0.01 0.04
Attachment Failure 0.001 0.001 0.001 0.001
Control Flutter 0.001 0.001 0.001 0.001
Control Linkage 0.15 0.1 0.1 0.12
Control Surface Binding 0.001 0.01 0.01 0.007
Servo 0.01 0.001 0.01 0.007
Servo Wire 0.1 0.1 0.1 0.1
Radio Blanketing 0.2 0.01 0.1 0.10
Radio Interference 0.1 0.2 0.2 0.17
Radio Distance Induced Lack of Control 0.01 0.01 0.01 0.01
Improper Wiring 0.1 0.05 0.05 0.07
Autopilot Failure 0.2 0.25 0.2 0.22
CG Shift 0.01 0.01 0.01 0.01
Battery Failure 0.01 0.01 0.01 0.01
Pilot Error 0.25 0.25 0.25 0.25
Shaft Slippage 0.1 0.1 0.1 0.1
Wire Disconnect 0.001 0.001 0.001 0.001
Main Battery Failure 0.02 0.01 0.01 0.013
Voltage Ripple to Motor 0.8 0.5 0.5 0.6
ESC Unarm 0.025 0.05 0.001 0.025
FOD through Propeller 0.05 0.05 0.05 0.05
Crack Propagation 0.01 0.001 0.001 0.004
Prop Hub Bolt 0.05 0.1 0.05 0.067
Fuse Attachment 0.0002 0.0001 0.0001 0.00013
Propulsion Attachment 0.005 0.001 0.01 0.0053
Landing Gear 0.1 0.1 0.1 0.1
Hatch Attachment 0.01 0.005 0.005 0.0067
Fatigue 0.001 0.001 0.001 0.001
Acute Damage 0.1 0.01 0.05 0.053
Programming Error 0.05 0.05 0.05 0.05
Improper Set Up 0.2 0.2 0.2 0.2
Improper Gains 0.1 0.05 0.1 0.083
Pitot Static Error 0.05 0.05 0.05 0.05
AHRS Calibration Error 0.05 0.05 0.05 0.05
GPS Error 0.01 0.01 0.01 0.01
Radio Failure 0.4 0.3 0.4 0.37
Foreign Control Input 0.01 0.01 0.01 0.01
Battery Failure 0.01 0.01 0.01 0.01
Computer Lock Up 0.001 0.0001 0.001 0.0007
Figure 4.2 Failure Probability Data
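The aggregation step can be sketched with a few rows taken from Figure 4.2. This is a minimal illustration under one stated assumption: "ranged beyond 10%" is read here as an absolute spread between the highest and lowest elicited value of more than 0.10, which the text does not define precisely. The names `SPREAD_LIMIT`, `averages`, and `needs_discussion` are introduced only for this example.

```python
from statistics import mean

# (SME 1, SME 2, SME 3) per-flight failure probabilities from Figure 4.2.
elicited = {
    "Sheet Puncture": (0.1, 0.01, 0.01),
    "Control Linkage": (0.15, 0.1, 0.1),
    "Radio Blanketing": (0.2, 0.01, 0.1),
    "Servo Wire": (0.1, 0.1, 0.1),
    "Voltage Ripple to Motor": (0.8, 0.5, 0.5),
}

# Assumption: the 10% threshold is an absolute spread of 0.10.
SPREAD_LIMIT = 0.10

# Aggregate probability per event: simple average of the three elicited values.
averages = {name: mean(values) for name, values in elicited.items()}

# Events whose elicited values disagree by more than the limit.
needs_discussion = [name for name, values in elicited.items()
                    if max(values) - min(values) > SPREAD_LIMIT]

for name, avg in averages.items():
    flag = "discuss" if name in needs_discussion else "ok"
    print(f"{name:26s} average={avg:.3f}  {flag}")
```

Under this reading, Radio Blanketing and Voltage Ripple to Motor would be flagged for further discussion, while the other three rows would be averaged directly.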
The failure data from the SMEs were relatively close in agreement and were then averaged
to obtain an 'aggregate' failure probability. To build the BBN model from these data, each
event's effects also had to be identified. The process to build a BBN is similar to a FMECA
process, as each root effect will affect a higher level failure mode. While the FMECA builds
a table, the BBN builds a directed acyclic graph. In this process each failure is seen
to have effects that will propagate in some way to a higher level effect of a mission failure.
These links are built in the BBN model using the CPTs discussed in Section 2.5. For each
CPT, the failure events that build it were ranked from the highest severity to lowest by
the SMEs. The highest and lowest conditional probabilities of failure were then bracketed
by the SMEs. This bracketing was done by determining the probability of failure when all
failures in a given CPT occur, as well as when none occur. The remaining probabilities in
the CPTs were populated using a tool developed by a team from the Naval Postgraduate
School [10]. The CPTs for this model can be found in the Appendix.
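To make the bracketing idea concrete, the sketch below fills a CPT between two bracketed extremes. It is not the Naval Postgraduate School tool, whose algorithm is not described here; it is a generic weighted interpolation chosen purely for illustration. The function `fill_cpt`, the severity weights, and the two bracket values are all hypothetical.

```python
from itertools import product


def fill_cpt(severity_weights, p_none, p_all):
    """Interpolate P(failure | parent states) between two bracketed extremes.

    severity_weights: one positive weight per parent, larger = more severe.
    p_none: bracketed probability when no parent failure has occurred.
    p_all:  bracketed probability when every parent failure has occurred.
    """
    total_weight = sum(severity_weights)
    cpt = {}
    for states in product([False, True], repeat=len(severity_weights)):
        # Weight of the parent failures that are active in this row.
        active = sum(w for w, s in zip(severity_weights, states) if s)
        cpt[states] = p_none + (p_all - p_none) * active / total_weight
    return cpt


# Hypothetical example: three parents ranked from most to least severe.
cpt = fill_cpt(severity_weights=[3, 2, 1], p_none=0.01, p_all=0.95)

for states, p in sorted(cpt.items(), key=lambda item: item[1]):
    print(states, f"{p:.3f}")
```

The rows for "no failures" and "all failures" reproduce the bracketed values exactly, and every intermediate row falls between them in an order that follows the severity ranking.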
4.3 Conversion to an OOBN
With each failure parameter identified, a model to represent them must be constructed. This
modeling process was also iterated with the SMEs and proved to be a useful way
to discuss the higher level effects of each failure event. As the model was constructed, the
graphical representation provided a useful method to deductively identify further failure
events. Figure 4.3.1 shows a model of a controls failure sub-net. The sub-net indicates that if
certain failure events occur a control failure will be more likely to occur. For instance, if the
radio link is severed between the aircraft and the pilot, it will lead to a manual control loss
and then to a controls failure. However, other events may occur that could cause a manual
control loss, or multiple failures could occur in conjunction to increase the likelihood of a
control loss. This is worth reiterating as a difference from the FT method, where one event
occurs to cause a failure or multiple events combine to cause a failure. In a BBN, one failure
event may cause a system failure but it may also not result in a system failure. If a failure
event occurs the probability of the system failing increases but is not a certainty. Unrelated
events that were not previously thought to combine to cause a failure may end up
doing so. A failure that occurs in this way is an application of the Reason model [33].
Figure 4.3.1: Control Failure Model
As the model was progressively built it became obvious that the complicated interconnections
and size of the model made it difficult to display or analyze. The BBN representation of
the ESPAARO is quite large and the interdependences between systems make it difficult
to follow. The model representation of the ESPAARO can be seen in Figure 5.1.1. This
complexity led to the conversion of the model to an OOBN, which proved exceedingly useful
not only for representation but also for investigating the subsystems individually. The
control system can be seen in Figure 4.3.1 and the top level aircraft in Figure 4.3.2. By
creating sub-nets it is possible to investigate each component separately from the full
Bayesian Network, which also allows a simplified view of the full model without loss of
granularity into possible causes of failure.
Figure 4.3.2: ESPAARO Aircraft OOBN
4.4 Identifying Mitigations
Sensitivity was first introduced in Section 2.5.4 and is used to identify the most sensitive
failure modes from the abbreviated list in Table 4.4.1. These values were varied by +/-
10% and +/- 20% using the Sensitivity Equation to produce the respective sensitivity values.
These values identify which events will have the greatest effect on the overall system risk,
and correspondingly merit the most mitigation. From this analysis it can be seen that a CG
shift, wire disconnect, or main battery failure are among the most sensitive to variation.
Failure Events          0%     10%     -10%    20%     -20%
Main Battery Failure    0.29   0.308   0.272   0.326   0.254
Battery failure         0.29   0.307   0.273   0.324   0.256
Interference            0.29   0.29    0.29    0.29    0.29
Wire Disconnect         0.29   0.311   0.269   0.332   0.248
AHRS Error              0.29   0.301   0.279   0.312   0.268
ESC Error               0.29   0.306   0.274   0.322   0.258
Attachment Failure      0.29   0.313   0.267   0.336   0.244
CG shift                0.29   0.311   0.269   0.332   0.248
Servo Failure           0.29   0.303   0.277   0.316   0.264
Table 4.4.1: Abbreviated Sensitivity Analysis
To further illustrate the effects of sensitivity on the system reliability, the likelihood
multiplier introduced in Section 3.4 (Equation 3.4.1) can be used. This compares the base
system risk to the risk when a particular failure is observed to have occurred; the ratio
between the two is the likelihood multiplier, which is shown in Table 4.4.2. Multiple failures
can also occur together, and these combinations are also shown in Table 4.4.2. The failures
identified in Table 4.4.1 are prevalent; however, a servo failure is the most likely to occur.
A servo failure increases the system risk by 47.7% to a total of 59%. The two methods of
investigation give insight into how different failures affect the system risk. A Sensitivity
Analysis focuses on how damaging to the system any one particular failure can be while
disregarding how likely it is to occur. By applying a likelihood multiplier, it is possible to
investigate further, by analyzing how likely failures are to occur.
Figure 4.4.1: Likelihood Multiplier Graph
Failure 1              Failure 2            System Risk   Likelihood Multiplier   Percent Increase
None                   None                 29.43         1                       0
Main Battery Failure   None                 46.91         1.52                    34.25
Battery failure        None                 46.07         1.49                    33.05
Interference           None                 50.59         1.64                    39.03
Wire Disconnect        None                 49.99         1.62                    38.30
AHRS error             None                 39.57         1.28                    22.06
ESC error              None                 45.4          1.47                    32.07
Attachment Failure     None                 54.08         1.75                    42.97
CG shift               None                 50.54         1.63                    38.97
Servo Failure          None                 59.02         1.91                    47.74
Attachment Failure     Interference         61.84         2.00                    50.12
CG shift               Attachment Failure   68.41         2.21                    54.91
Interference           Servo Failure        66.05         2.14                    53.30
Wire Disconnect        ESC error            51.5          1.66                    40.11
AHRS error             Attachment Failure   60.96         1.97                    49.40
ESC error              AHRS error           52.1          1.69                    40.80
Attachment Failure     Wire Disconnect      66.74         2.16                    53.79
CG shift               Servo Failure        71.99         2.33                    57.16
Servo Failure          AHRS error           65.25         2.11                    52.73
Table 4.4.2: Likelihood Multiplier
4.5 Application and Effects of Mitigations
A sensitivity analysis allows the events that have the most potential to affect system risk
to be identified. With these events identified, the most critical of them can be mitigated
against. For the ESPAARO, failure events can be mitigated against by adding a redundant
system that takes over in the event of a failure. These redundant systems are added to the
overall ESPAARO BBN as discussed in Section 3.3. Each of these failure events was chosen to be
mitigated against with input from the SME panel for realistic implementation. For instance,
it is conceivable to add a second battery to a system.
The results of adding four redundant systems are shown in Table 4.5.1. Adding a second radio
control link has the potential to reduce the overall system risk from 29.43 to 22.49 percent.
Applying all four mitigators has the potential of reducing the overall system risk to 19.8
percent. Application of the Logic Risk Ratio gives a ratio of about 0.67 for the fully
mitigated system, which corresponds to an overall system risk reduction of roughly 33%.
Redundant System                 System Risk
None                             29.43
Second Radio                     22.49
Second Control Battery           29.16
Secondary Servos and Linkages    26.48
Second Main Battery              25.58
All Four Combined                19.8
Table 4.5.1: Risk Mitigators
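The Logic Risk Ratio of Equation (3.5.1) can be applied to each entry of Table 4.5.1. The sketch below uses the tabulated system risks; the label "All four combined" for the 19.8 percent entry follows the surrounding text, and the function and variable names are introduced only for illustration.

```python
UNMITIGATED_RISK = 29.43  # percent, "None" row of Table 4.5.1

# System risk (percent) after each mitigation, from Table 4.5.1.
mitigated_risk = {
    "Second Radio": 22.49,
    "Second Control Battery": 29.16,
    "Secondary Servos and Linkages": 26.48,
    "Second Main Battery": 25.58,
    "All four combined": 19.8,
}


def logic_risk_ratio(mitigated, unmitigated):
    """Equation (3.5.1): P(mitigated) / P(unmitigated)."""
    return mitigated / unmitigated


ratios = {name: logic_risk_ratio(risk, UNMITIGATED_RISK)
          for name, risk in mitigated_risk.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    reduction = 100.0 * (1.0 - ratio)
    print(f"{name:30s} ratio={ratio:.3f}  reduction={reduction:.1f}%")
```

A ratio below 1 indicates a reduction in risk; one minus the ratio gives the relative reduction, so the second control battery alone changes the system risk very little while the second radio gives the largest single improvement.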
Chapter 5
Conclusions and Future Work
Fault Tree Analysis is widely used to calculate a system's probability of failure, using boolean
logic in the form of AND and OR gates. These gates connect the individual components
of a system to form a system “tree” on which analysis can be performed. Fault trees
have been used to investigate risk for more than 50 years and are well suited for use
in critical components. However, they cannot easily capture the complex interactions among
components. Bayesian Networks are beginning to be used to investigate many complex
interactions and failures between components.
Bayesian Belief Networks were shown to allow the formulation of boolean logic similar to
FTA. This comparison allows BBN to be demonstrated as a peer to FTA for what is currently
and commonly utilized in Risk Analysis. Numerical examples demonstrated both methods
generating numerically similar answers when modeled using the same methods. Therefore,
FTA can be shown as a special case of BBNs.
Using BBNs allows a system to be modeled as closely as possible to how it actually functions,
without adhering to the modeling techniques imposed by FTA. BBNs allow different ways
of modeling a system or introducing non-boolean operators. The example used to compare
FTA and BBN had the boolean operators relaxed slightly to allow sensitivity analysis to
be performed. Non-boolean operators allow for the investigation of key components that
would have the highest effect on system level failure. These components and their failure
modes could then be mitigated against.
A practical example of the methods introduced was demonstrated on Virginia Tech's
ESPAARO platform. This allowed a realistic system model to be built using BBNs and
then to be graphically simplified using OOBNs. As this aircraft is currently flown as an
experimental test aircraft, its reliability is always important. The analysis performed
for this thesis was motivated by improving the reliability of that aircraft.
Proposed mitigations were therefore implemented.
The motivation for this thesis is focused on improving the utilization of currently available
risk analysis techniques and applying them to small UAS. A fundamental problem however
is that failure data is not readily available and information has to be inferred from subject
matter experts or from experimental data. Future work could focus on expanding the testing
on certain base level components being used in this application to provide better failure data.
This base level information would then propagate into a more accurate system model. As the
airframe is utilized more, more information will become known.
As UAS become more common and begin to operate in the NAS more information will be
demanded. Characterizing reliability data of certain common components such as motors,
servos, or landing gear would provide useful feedback information for validating reliability
studies in the future. Such components are widely used and very poorly understood in their
current non-military applications.
Attached is a collection of Bayesian Networks that comprise the entire ESPAARO system
as presented in the above work. The Bayesian Network is first presented as a whole BBN
and then broken out to form the subnets comprising the OOBN.
5.1 BBN for ESPAARO
Figure 5.1.1: ESPAARO BBN
5.2 OOBN for ESPAARO
Figure 5.2.1: Autopilot Subnet
Figure 5.2.2: Control Surface Subnet
Figure 5.2.3: Propulsion System Subnet
Figure 5.2.4: Control System Subnet
Figure 5.2.5: Strut Subnet
Figure 5.2.6: Tail Subnet
Figure 5.2.7: Wing Subnet
Figure 5.2.8: Aircraft Subnet
Figure 5.2.9: Camera Subnet
5.3 Sensitivity Results
Parameter                  Sensitivity   α          β      γ   δ
Turbulence                 0.3           0.3        0.29   0   1
Wing Spar                  0.22          0.22       0.29   0   1
Sheet Puncture             0.18          0.18       0.29   0   1
Control Flutter            0.07          0.07       0.29   0   1
Control Surface Binding    0.07          0.07       0.29   0   1
Servo Failure              0.13          0.13       0.29   0   1
Servo Wire                 0.12          0.12       0.29   0   1
Control Linkage            0.09          0.09       0.29   0   1
CG Shift                   0.21          0.21       0.29   0   1
Wing Failure (complete)    3.46E-08      3.46E-08   0.29   0   1
Structural (complete)      3.95E-11      3.95E-11   0.29   0   1
Attachment Failure         0.29          0.23       0.29   0   1
Landing Gear               0.15          0.15       0.28   0   1
Fatigue                    0.18          0.18       0.29   0   1
Acute Damage               0.12          0.12       0.29   0   1
Propulsion Attachment      0.18          0.18       0.29   0   1
Hatch Attachment           0.17          0.17       0.29   0   1
Pilot Error                0.22          0.22       0.27   0   1
Battery Failure            0.17          0.17       0.29   0   1
Improper Wiring            0.14          0.14       0.29   0   1
Interference               0             0          0.29   0   1
Voltage Ripple             0.19          0.19       0.26   0   1
Crack Propagation          0.16          0.16       0.29   0   1
Main Battery Failure       0.18          0.18       0.29   0   1
Wire Disconnect            0.21          0.21       0.29   0   1
ESC Unarm                  0.16          0.16       0.29   0   1
FOD through Prop           0.14          0.14       0.29   0   1
Shaft Slippage             0.22          0.22       0.27   0   1
AHRS Calibration           0.11          0.11       0.29   0   1
Pitot Error                0.08          0.08       0.29   0   1
Foreign Control            0.07          0.07       0.29   0   1
Computer Lock Up           0.18          0.18       0.29   0   1
GPS Error                  0.09          0.09       0.29   0   1
Improper Gains             0.09          0.09       0.29   0   1
Improper Setup             0.1           0.1        0.29   0   1
Programming Error          0.12          0.12       0.29   0   1
Spar Tail Attach           0.25          0.25       0.29   0   1
Table 5.3.1: Sensitivity Analysis Results for ESPAARO
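The four coefficients listed per parameter in Table 5.3.1 can be turned back into varied system risks. The sketch below assumes that they parameterize the common one-way sensitivity function f(x) = (αx + β) / (γx + δ), with x the shift applied to the parameter; the exact form of Equation (2.5.4) is not reproduced in this section, so this form is an assumption. Under that assumption the function reproduces the corresponding rows of Table 4.4.1.

```python
def sensitivity_function(x, alpha, beta, gamma=0.0, delta=1.0):
    """Assumed form f(x) = (alpha*x + beta) / (gamma*x + delta)."""
    return (alpha * x + beta) / (gamma * x + delta)


# (alpha, beta, gamma, delta) taken from Table 5.3.1.
coefficients = {
    "Main Battery Failure": (0.18, 0.29, 0.0, 1.0),
    "Wire Disconnect": (0.21, 0.29, 0.0, 1.0),
    "Attachment Failure": (0.23, 0.29, 0.0, 1.0),
    "Servo Failure": (0.13, 0.29, 0.0, 1.0),
}

SHIFTS = (0.0, 0.10, -0.10, 0.20, -0.20)  # 0%, +/-10%, +/-20%

varied = {
    name: [round(sensitivity_function(x, *coeffs), 3) for x in SHIFTS]
    for name, coeffs in coefficients.items()
}

for name, row in varied.items():
    print(f"{name:22s}", row)
```

With γ = 0 and δ = 1 for every row, the function is linear in the shift, so α acts as the slope and β as the unvaried system risk.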
Bibliography
[1] Unmanned systems integrated roadmap fy2013-2018. http://www.cs.ubc.ca/~murphyk/bayes/bnintro.html.
[2] Abductive reasoning. http://en.wikipedia.org/wiki/abductivereasoning, April 2015.
[3] Uncertainty quantification. Internet, 2015.
[4] Summary of small unmanned aircraft rule (part 107). Technical report, Federal Aviation
Administration, June 2016.
[5] RTO-NATO 2000, editor. Commercial Off-the-Shelf Products in Defence Applications
"The Ruthless Pursuit of COTS". RESEARCH AND TECHNOLOGY ORGANIZATION, April 2000.
[6] Federal Aviation Administration. Safety Management System Manual Version 4.0. Air
Traffic Organization 2014, 2014.
[7] Denise Marie Andres. Development of a post-consequence model (pcom) for aircraft
accident severity assessment. Master’s thesis, Rutgers State University of New Jersey,
2005.
[8] Bilal M. Ayyub. Risk Analysis in Engineering and Economics. Chapman & Hall, 2003.
[9] William Balderson. NAVAIR RISK ASSESSMENT HANDBOOK. Naval Air Systems
Command, 2002.
[10] Kong Luxhoj McKnight Miller Stevens Tonello Rhoades Brockway, Johnston. Exper-
imental unmanned aerial systems interim flight clearances challenge. Master’s thesis,
Naval Post-Graduate School, 2014.
[11] Avi Pfeffer Daphne Koller. Object-oriented bayesian networks. In Proceedings of the
Thirteenth Annual Conference on Uncertainty in Artificial Intelligence, 1997.
[12] Chad Moses David King, Allen Bertapelle. Uav failure rate criteria for equivalent level
of safety. In International Helicopter Safety Symposium, 2005.
[13] Richard Denning. Applied R&M Manual for Defence Systems Part C- Techniques.
Ministry of Defence, May 2012.
[14] K. Pyne A. Bialy M. Burns G. Mohan N. Beaty C. MacNeal C. Weit C. Kevorkian C.
A. Woolsey E. B. Doepke, M. Heim and M. Philen. Design and demonstration of a
flexible matrix composite actuated flap in a uav. In ASME 2015 Conference on Smart
Materials, Adaptive Structures and Intelligent Systems (SMASIS), 2015.
[15] HUGIN Expert A/S. HUGIN API Reference Manual Version 8.1, October 2014.
[16] Ales Janota. Overcoming limitations of fault tree analysis using bayesian belief networks.
2014.
[17] Finn V Jensen. Introduction to Bayesian Networks. 1995.
[18] Kececioglu. Reliability Engineering Handbook, Volume 2. Prentice Hall Inc, 1991.
[19] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on
graphical structures and their application to expert systems. Journal of the Royal Statistical
Society, 1988.
[20] Silja Renooij Linda van der Gaag. Analysing sensitivity data from probabilistic
networks. 2001.
[21] Dr. James Luxhoj. AOE 5984: Special Topics course on Risk Analysis for Aerospace
Systems, 2015.
[22] James T. Luxhoj. Probabilistic causal analysis for system safety risk assessments in
commercial air transports. Department of Industrial and Systems Engineering, Rutgers
University.
[23] James T. Luxhoj. Predictive analytics for modeling uas safety risk. SAE International,
2013.
[24] James T. Luxhoj. Special topics in risk analysis: Overview of the Lauritzen-Spiegelhalter
(L-S) algorithm, January 2015.
[25] Kjaerulff U. Jensen F. Madsen A., Lang M. The hugin tool for learning bayesian
networks.
[26] Kevin Murphy. A brief introduction to graphical models and bayesian networks.
http://www.cs.ubc.ca/~murphyk/bayes/bnintro.html, 1998.
[27] Justin F. Murtha. An Evidence Theoretic Approach to Design of Reliable Low-Cost
UAVs. PhD thesis, Virginia Polytechnic Institute and State University, 2009.
[28] American Institute of Chemical Engineers, editor. Guidelines for Chemical Process
Quantitative Risk Analysis. Center for Chemical Process Safety, 2000.
[29] Department of Defense. MIL-STD-1629A: Procedures for Performing a Failure
Mode, Effects, and Criticality Analysis, 1980.
[30] Department of Defense. Standard Practice for System Safety, 2012.
[31] NASA Office of Safety and Mission Assurance. Fault Tree Handbook with Aerospace
Applications. NASA, 2002.
[32] Judea Pearl. Probabilistic Reasoning In Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann Publishers Inc, 1998.
[33] James Reason. Human error: models and management. Education and Debate, 2000.
[34] Systems and Reliability Research Office. Fault Tree Handbook NUREG-0492. U.S.
Nuclear Regulatory Commission, 1981.
[35] Kevin Williams. A summary of unmanned aircraft accident/incident data: Human
factors implications. Civil Aerospace Medical Institute, Federal Aviation Administration,
December 2004.