
UAS Risk Analysis using Bayesian Belief Networks: An Application to the Virginia Tech ESPAARO

Christopher G. Kevorkian

Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University

in partial fulfillment of the requirements for the degree of

Master of Science in

Aerospace Engineering

Craig Woosley, Chair

James Luxhoj, Co-Chair

Pradeep Raj

August 5, 2016

Blacksburg, Virginia

Keywords: ESPAARO, UAS, Bayesian Networks, Risk Analysis

UAS Risk Analysis using Bayesian Belief Networks

Christopher G. Kevorkian

Abstract

Small Unmanned Aerial Vehicles (SUAVs) are rapidly being adopted in the National Airspace System (NAS) but experience a much higher failure rate than traditional aircraft. These SUAVs are quickly becoming complex enough to warrant alternative methods of failure analysis. This thesis proposes a method that extends the Fault Tree Analysis (FTA) method to a Bayesian Belief Network (BBN) model. FTA is demonstrated to be a special case of BBN, and a BBN allows more complex interactions between nodes than FTA does. The resulting model can be investigated to determine the components to which failure is most sensitive, so that redundancies or mitigations against those failures can be introduced. The method is then applied to the Virginia Tech ESPAARO SUAV.

Contents

1 Introduction

1.1 Introduction of the Problem

1.2 Research Objectives

1.3 Research Tasks

1.3.1 Method Comparison

1.3.2 OOBN Representation

1.3.3 FMECA Augmentation

1.4 Overview of the Thesis

2 Literature Review

2.1 Fault Tree Analysis

2.1.1 Basic Probability of Failure Calculations

2.1.2 Branching out on the Fault Tree

2.1.3 Minimum Cut Set

2.2 Event Tree Analysis

2.3 Failure Mode and Effects Analysis

2.4 Failure Mode, Effects, and Criticality Analysis

2.5 Bayesian Belief Networks

2.5.1 Basic BBN Methods

2.5.2 Software and Algorithm

2.5.3 Object-Oriented Bayesian Networks

2.5.4 Sensitivity Analysis

2.6 Summary and Comparison of Methods

3 Bayesian Approach to Risk Analysis

3.1 Current Research using Bayesian Methods

3.2 Fault Tree as a Special Case of Bayesian Methods

3.3 Fault Tree Comparison to Bayesian Network

3.4 Identifying Mitigations

3.5 Criticality Analysis

3.6 FMECA representation

3.7 Summary

4 ESPAARO Case Example

4.1 Introduction to the ESPAARO

4.2 Model Construction and Probability Elicitation

4.3 Conversion to an OOBN

4.4 Identifying Mitigations

4.5 Application and Effects of Mitigations

5 Conclusions and Future Work

5.1 BBN for ESPAARO

5.2 OOBN for ESPAARO

5.3 Sensitivity Results

List of Figures

1.2.1 Bayesian Augmentation Approach

2.1.1 (a) AND gate (b) OR gate

2.1.2 Fault Tree Outline (Source: adapted from [16])

2.1.3 Fault Tree example (Source: adapted from [34])

2.1.4 Minimum Cut Set Example (Source: adapted from [34])

2.1.5 Communication network (Source: adapted from [18])

2.1.6 Communication Network Fault Tree, Denote Mirror Blocks (Source: adapted from [18])

2.2.1 Event Tree Example (Source: adapted from [8])

2.4.1 Risk Matrix

2.5.1 Simple Bayesian Net

2.5.2 Serial Bayesian Net

2.5.3 Influence Diagram Structures

2.5.4 Two Node Example

2.5.5 Probability Tree

2.5.6 Conditional Probability of Influence Structures (Source: Adapted from [8])

2.5.7 Wet Grass Example (Source: adapted from [26])

2.5.8 OOBN engine example as constructed in Hugin

3.2.1 Bayesian Expansion of Fault Tree

3.3.1 Fault Tree Example (Source: [27])

3.3.2 Bayesian Net Example

3.3.3 Two node system

3.3.4 Probability Tree of Example

3.3.5 CPTs of BBN example

3.3.6 System Remodeled

3.4.1 Non-Boolean Conditional Probabilities

3.4.2 Graph of Likelihood Multipliers (LMs)

3.5.1 Bayesian Example with Mitigation

3.6.1 Modified NAVAIR Risk Matrix

3.6.2 Example FMECA

4.0.1 KEAS Lab Runway

4.1.1 ESPAARO Drawing

4.1.2 ESPAARO Strut Assembly

4.1.3 ESPAARO on the runway

4.3.1 Control Failure Model

4.3.2 ESPAARO Aircraft OOBN

4.4.1 Likelihood Multiplier Graph

5.1.1 ESPAARO BBN

5.2.1 Autopilot Subnet

5.2.2 Control Surface Subnet

5.2.3 Propulsion System Subnet

5.2.4 Control System Subnet

5.2.5 Strut Subnet

5.2.6 Tail Subnet

5.2.7 Wing Subnet

5.2.8 Aircraft Subnet

5.2.9 Camera Subnet

List of Tables

2.3.1 Example FMEA

2.5.1 Conditional Probability Table for Motor

2.6.1 Risk Analysis Methods

3.3.1 Component Probabilities (Source: [27])

3.4.1 Sensitivity to Parameter Variation

3.4.2 Likelihood Multipliers

4.4.1 Abbreviated Sensitivity Analysis

4.4.2 Likelihood Multiplier

4.5.1 Risk Mitigators

5.3.1 Sensitivity Analysis Results for ESPAARO

Chapter 1

Introduction

Unmanned Aircraft Systems (UAS) have become a vital part of the U.S. military arsenal and are quickly growing in commercial popularity. Unfortunately, all but the most advanced UAS suffer from remarkably high failure rates. Decade-old data show that UAS experience a failure rate nearly 100 times greater than that of manned military aircraft [35]. Even so, UAS have flown more sorties than manned aircraft in recent conflicts [1]. Small UAS designs tend to focus heavily on maintaining low cost, which makes them particularly failure prone. Currently, many airframes are deemed ‘expendable’ or ‘frangible’ when used on closed ranges, and a detailed system safety review may be unnecessary if the system poses little risk to personnel or equipment on the ground. With the recent proliferation of small UAS and widespread interest from the commercial sector, system safety must be addressed, because these systems may now operate in civil airspace with people and property beneath their flight paths and with civil and commercial aircraft sharing the airspace.

Large aircraft manufacturers invest substantial time and resources in system reliability; however, small UAS designers cannot afford a similar commitment. New techniques need to be developed to help designers of small UAS assess airworthiness and address possible problems before they occur.

One approach to evaluating an aircraft’s reliability is to assess its probability of failure through fault tree analysis (FTA). A fault tree is a directed acyclic graph consisting of Boolean blocks that represent components interacting in a system. In failure analysis, the “top event” of the fault tree is an overall system failure (e.g., loss of the aircraft or mission failure). Failures of underlying subsystems cascade up the tree and may or may not result in the top event, depending on the fault tree structure (which reflects component redundancy, etc.). Fault tree analysis is an effective tool for predicting system unreliability; however, it requires knowledge of component failure rates. This information is rarely known for small UAS, which makes FTA less effective for assessing the safety of this class of aircraft. The tree structure is also restrictive; for example, it can be difficult to represent interdependencies between “parallel” events.

1.1 Introduction of the Problem

The parts and processes used to manufacture many small-scale UAS platforms have been spun off from the hobby community. Although many of these processes are widespread and have been refined in a cost-competitive market to find effective solutions to problems, little reliability data exist for such products. The emphasis on low prices and quick time to market has raised questions about product reliability and quality, and the sparse available data make any consistent reliability analysis difficult [5]. US military UAS are an exception; the military requires that components meet MIL-STD-100C. In practice, it is often desirable to keep the cost of UAS low, and this focus on cost reduction has predictable consequences for system safety and reliability. The need to keep costs low, combined with the diversity and short product life cycles of UAS designs, has created a unique problem for system safety and reliability. The problem is highlighted by the fact that many UAS are seen as disposable, which runs counter to conventional reliability practice. Unlike those of manned aircraft, UAS life cycles may last only a few months, whereas aircraft certification requires a thoroughly tested and well-understood platform. A certification process for commercial UAS is still developing, but it has begun to be outlined in FAA Part 107 [4]. As these systems begin to be used in the National Airspace System (NAS), such standards will begin to solidify. A standardized and adaptable method must be found to ascertain, and then improve, the airworthiness of small UAS at reasonable cost.

UAS suffer from a relatively high mishap rate compared to manned aircraft, which is one reason the FAA has been reluctant to allow their use, or at least their commercial use, in civil airspace. At this time, it is unreasonable to expect small UAS to reach a level of safety equal to that of large and expensive aircraft, but an equivalent level of safety must be determined [12]. Once those new thresholds are determined, techniques appropriate to small, lower-cost aircraft must be developed and applied to ensure safety.

The current approaches to system safety commonly used by air safety authorities, such as the Federal Aviation Administration (FAA) and NAVAIR, are very comprehensive for manned aircraft. For instance, the FAA has adopted the Safety Management System (SMS), a formal, top-down, systematic approach to identifying risk that includes the necessary organizational structures, policies, and procedures [6]. The SMS has four main components: Safety Policy, Safety Assurance, Safety Risk Management, and Safety Promotion. Of particular interest in this work is the Safety Risk Management component, which briefly describes various techniques for identifying and evaluating hazards. The most widely used techniques include Failure Mode and Effects Analysis (FMEA), Failure Mode, Effects, and Criticality Analysis (FMECA), and Fault Tree Analysis (FTA). These techniques are discussed in more detail in Chapter 2.

The Navy introduced risk management guidelines similar to the FAA’s in its NAVAIR Risk Assessment Handbook [9]. These guidelines focus not only on assessing and managing the risk to the system but also provide a thorough framework for risk reporting and mitigation. A supplement to the Risk Assessment Handbook, NAVAIRINST 5000.21A, provides more specifics on implementing risk analysis as it pertains to the acquisition process. These guidelines still base their system risk assessment on techniques such as FTA and FMECA. Similarly, the FAA takes an FTA- and FMECA-based approach to risk assessment.

The most commonly used method for system-level risk analysis is Fault Tree Analysis. This method is very well understood and has historically served well. However, applying it imposes a series of restrictions on how systems are modeled. Many systems have complex interactions between hazards and their effects, yet a fault tree approach to modeling these hazards requires that all hazards have unique effects, be statistically independent of one another, and have only two states [28].

1.2 Research Objectives

To overcome the limitations of the fault tree, a Bayesian Belief Network (BBN) is used. A BBN is a graphical representation of a probabilistic dependency model and can be seen as a generalized form of a fault tree [16]. A direct comparison is performed between a fault tree and a BBN, and a further comparison with an expanded BBN investigates the features and benefits of a Bayesian approach to system risk assessment.
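The sense in which a fault tree is a special case of a BBN can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the thesis’s own model: the function names are hypothetical, the root nodes A and B are assumed independent, and the probabilities are invented. Each fault tree gate becomes a conditional probability table (CPT) that contains only 0s and 1s, and the top-event probability is obtained by summing over parent states; replacing the deterministic CPT with intermediate values (the hypothetical `graded_cpt`) expresses an interaction that a Boolean gate cannot.

```python
from itertools import product


def or_gate_cpt(a_failed: bool, b_failed: bool) -> float:
    """Deterministic CPT entry P(T failed | A, B) reproducing a fault tree OR gate."""
    return 1.0 if (a_failed or b_failed) else 0.0


def and_gate_cpt(a_failed: bool, b_failed: bool) -> float:
    """Deterministic CPT entry P(T failed | A, B) reproducing a fault tree AND gate."""
    return 1.0 if (a_failed and b_failed) else 0.0


def graded_cpt(a_failed: bool, b_failed: bool) -> float:
    """Hypothetical non-Boolean CPT: parent failures raise, but do not fix, P(T)."""
    if a_failed and b_failed:
        return 0.95
    if a_failed or b_failed:
        return 0.30
    return 0.01


def top_event_probability(p_a: float, p_b: float, cpt) -> float:
    """Marginalize P(T) = sum over parent states of P(T | A, B) * P(A) * P(B),
    assuming the root nodes A and B are independent."""
    total = 0.0
    for a_failed, b_failed in product([True, False], repeat=2):
        p_parents = (p_a if a_failed else 1 - p_a) * (p_b if b_failed else 1 - p_b)
        total += cpt(a_failed, b_failed) * p_parents
    return total


p_a, p_b = 0.1, 0.2  # hypothetical component failure probabilities
print(f"OR gate as CPT:  {top_event_probability(p_a, p_b, or_gate_cpt):.4f}")
print(f"AND gate as CPT: {top_event_probability(p_a, p_b, and_gate_cpt):.4f}")
print(f"graded CPT:      {top_event_probability(p_a, p_b, graded_cpt):.4f}")
```

With the 0/1 tables, the marginalization returns exactly the AND and OR gate formulas of Section 2.1.1 (here 0.02 and 0.28), which is the sense in which the fault tree is recovered as a special case.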

Bayesian networks can consist of subnetworks represented as objects, creating an Object-Oriented Bayesian Network (OOBN). An object-oriented approach yields system representations that are more accessible to a user or decision-maker. Object-oriented representations also encourage modularity, which supports reuse in future development efforts.

The process of building a Bayesian network is similar to an FMECA/FMEA in that both analyze failure modes and the higher-level effects of those failures. The criticality analysis of an FMECA can be augmented with a BBN to quantify the system-level risk reduction provided by mitigations and to support cost/benefit assessment of those decisions. The approach is illustrated in Figure 1.2.1.

Figure 1.2.1: Bayesian Augmentation Approach


1.3 Research Tasks

1.3.1 Method Comparison

After the methods of risk assessment are introduced, the fundamentals of BBNs are explained. A comparative assessment of BBN and FTA then demonstrates how an FTA result can be recovered by applying a BBN to the same problem. Both are directed acyclic graphs, which makes the comparison natural.

1.3.2 OOBN Representation

Building on the comparison between FTA and BBN, a broader Bayesian network structure is used to show how a large and complex system can be fully captured without a top-level diagram that displays all of this complexity. An OOBN is used to model a UAS developed at Virginia Tech. An OOBN allows subnets to be created that represent individual subsystems, which feed into the top-level system.

1.3.3 FMECA Augmentation

A prototype method is introduced to support decisions made to mitigate system risks. This method illustrates the use of an OOBN on an actual UAS and augments the current FMECA approach. The process of creating an FMECA and determining the severity of hazards lends itself naturally to a Bayesian approach, which can also show the effects of hazards and of any mitigations introduced to reduce them.

1.4 Overview of the Thesis

Chapter 2 provides a literature review of current methods of, and motivations for, small UAS risk analysis, along with their advantages and limitations. Chapter 3 gives a detailed introduction to the method outlined in Figure 1.2.1, using a simple fault tree analysis introduced by Murtha [27]. Chapter 4 presents a case study in which a Virginia Tech UAS is used to populate an OOBN and perform a risk analysis. Chapter 5 summarizes this work and suggests future areas of research.

Chapter 2

Literature Review

The purpose of risk analysis is to determine the likelihood and consequences of undesirable events in a system. Knowledge of a system may be obtained by analysis or experimentation, but it will always remain uncertain; in the vast majority of cases, it is not possible to acquire all the information needed to remove every element of uncertainty. Risk analysis must nevertheless be performed even when not all relevant information about failure probabilities is available. Vesely [34] makes the important observation that a decision requires creating a perception of reality, and that this perception must be as close to actual reality as possible for the best decision to be made. More can always be learned about a system through analysis or experimentation, but the method used to represent the system must be as accurate as possible.

There are a variety of methods, both quantitative and qualitative, for assessing the risk of an undesired event in a system. Qualitative risk assessment uses expert judgment to evaluate the probability and consequence of given events. If subject matter experts are available, this is often viewed as a sufficient, albeit subjective, method of analyzing risk. A quantitative approach to risk analysis relies on probabilistic and statistical methods, including databases that record failure rates and their effects. A quantitative approach is generally regarded as more detailed, provided the information is available [8].

The choice of method for building a system risk model is determined largely by the availability of probabilistic data and the level of analysis required. Qualitative methods offer analysis based on a subjective process, which may result in large errors. A quantitative, or data-driven, analysis generally provides an understanding that depends less on individual judgment but requires high-quality data for accurate results. Usually, a combination of quantitative and qualitative analyses is used [8].

When selecting a means of safety analysis, the SMS [6] recommends considering the information available, the timeliness of that information, the time required for analysis, and the tools that will provide the appropriate approach for fully identifying the hazards and their effects. Section 3.3.4.3 of the SMS [6] recommends that analysts “select the methodology that is most appropriate for the type of system being evaluated”. This guidance allows the method best able to capture hazards and their effects to be chosen.

2.1 Fault Tree Analysis

Fault Tree Analysis (FTA) is a method of performing quantitative risk analysis that was first introduced by Bell Labs in the 1960s as a safety assessment tool for the Minuteman Intercontinental Ballistic Missile system. Fault trees are graphical models created by a deductive, top-down process that combines events and occurrences, tracing from a top-level failure event back to its causal factors. Typically, only top-level events deemed catastrophic are included. Fault trees are often constructed by identifying a top-level failure and then deducing the combination of factors that causes it; root failures can combine to produce a higher-level failure. This cause-and-effect logic is represented graphically by a series of standardized symbols introduced by the Nuclear Regulatory Commission [34]. It is important to note that a fault tree is not an exhaustive list of the factors leading to failure; it includes only those failures determined to be pivotal in a particular failure case [31].

2.1.1 Basic Probability of Failure Calculations

Fault trees are built up using a standardized system of symbols. Boolean AND and OR gates aggregate lower-level events, connecting them to their higher-level effects. Figure 2.1.1 shows an example of an AND gate (a) and an OR gate (b). The top event is represented as T, while A and B are both basic components. In Figure 2.1.1a, T occurs only if both A and B occur. Applied to risk analysis, this figure represents a redundant system in which the failure of a single component (A or B) does not result in failure of the system (T). Assuming A and B fail independently, the failure probability of this simple system is

$$p_T = p_A \, p_B \tag{2.1.1}$$

where $p_A$ and $p_B$ are the failure probabilities of the corresponding components.

Figure 2.1.1: (a) AND gate (b) OR gate

Figure 2.1.1b shows an OR gate, for which the outcome T occurs if A or B occurs. An OR gate can be represented in two ways, depending on whether A and B are mutually exclusive. If two events are mutually exclusive, either A or B can occur, but not both; for example, a coin cannot land both heads and tails in a single toss. In this mutually exclusive case, the failure probability of the system is

$$p_T = p_A + p_B \tag{2.1.2}$$

When A and B are not mutually exclusive (and fail independently), the failure probability of the system is

$$p_T = p_A + p_B - p_A \, p_B \tag{2.1.3}$$

The subtracted term $p_A p_B$ in Equation (2.1.3) is the probability that A and B both fail; without it, this joint outcome would be counted twice. According to the NASA FTA Handbook [31], if the probabilities of A and B are each below 0.1, the product term in Equation (2.1.3) may be neglected. This reduces the OR gate to Equation (2.1.2) and is known as the rare event approximation [31]. The rare event approximation is certainly not needed for the simple example given above. For the analysis of a complicated system model, however, the approximation can save considerable computation time.
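The two-component gate formulas and the cost of the rare event approximation can be checked numerically. The following is a minimal Python sketch; the function names and the two probability pairs are hypothetical, chosen so that one pair satisfies the 0.1 guideline and one does not, and independent failures are assumed throughout.

```python
def and_gate(p_a: float, p_b: float) -> float:
    """Eq. (2.1.1): both independent components must fail."""
    return p_a * p_b


def or_gate_exact(p_a: float, p_b: float) -> float:
    """Eq. (2.1.3): independent events that are not mutually exclusive."""
    return p_a + p_b - p_a * p_b


def or_gate_rare_event(p_a: float, p_b: float) -> float:
    """Eq. (2.1.2) used as the rare event approximation (joint term dropped)."""
    return p_a + p_b


# Hypothetical probabilities: one "rare" pair and one that violates the 0.1 guideline.
for p_a, p_b in [(0.01, 0.02), (0.3, 0.4)]:
    exact = or_gate_exact(p_a, p_b)
    approx = or_gate_rare_event(p_a, p_b)
    rel_err = (approx - exact) / exact
    print(f"pA={p_a}, pB={p_b}: AND={and_gate(p_a, p_b):.4f}, "
          f"OR exact={exact:.4f}, OR approx={approx:.4f}, error={rel_err:.1%}")
```

For the small pair, the approximation overstates the OR-gate probability by well under 1%; for the large pair, it overstates it by roughly 21%, which is why the approximation is reserved for rare events.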

A system of n components connected through an AND gate is described by Equation (2.1.4), where $C_i$ denotes the i-th independent cause:

$$p_T = \prod_{i=1}^{n} p(C_i) \tag{2.1.4}$$

Equation (2.1.5) generalizes the OR gate to more than two components:

$$p_T = 1 - \prod_{i=1}^{n} \bigl(1 - p(C_i)\bigr) \tag{2.1.5}$$

The rare event approximation normally allows Equation (2.1.5) to be simplified to Equation (2.1.6) [13].

$$p_T \approx \sum_{i=1}^{n} p(C_i) \tag{2.1.6}$$
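Equations (2.1.4) to (2.1.6) translate directly into small helper functions. The Python sketch below assumes independent causes, uses hypothetical function names, and uses invented cause probabilities; `math.prod` requires Python 3.8 or later.

```python
import math
from typing import Sequence


def and_gate(probs: Sequence[float]) -> float:
    """Eq. (2.1.4): every independent cause C_i must occur."""
    return math.prod(probs)


def or_gate(probs: Sequence[float]) -> float:
    """Eq. (2.1.5): one minus the probability that no cause occurs."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def or_gate_rare_event(probs: Sequence[float]) -> float:
    """Eq. (2.1.6): rare event approximation, a simple sum."""
    return sum(probs)


causes = [0.01, 0.02, 0.005]  # hypothetical cause probabilities p(C_i)
print(f"AND gate:          {and_gate(causes):.3e}")
print(f"OR gate (exact):   {or_gate(causes):.6f}")
print(f"OR gate (approx.): {or_gate_rare_event(causes):.6f}")
```

The summed form is never smaller than the exact OR-gate value (a consequence of Boole's inequality), so the rare event approximation errs on the conservative side.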

2.1.2 Branching out on the Fault Tree

Fault trees are commonly built up using aggregates called “contributors”, which introduce terms like $p_T$ derived from Equations (2.1.4) or (2.1.6). Each contributor probability can then be fed into the next calculation iteratively, instead of forming one long algebraic equation for the entire system [16]. This substitution also yields the failure probability of each contributor, that is, the subsystem-level failure probabilities. Figure 2.1.2 shows an outline of a generic fault tree, from which $p_A = p_1 p_2$ and, provided $p_4$ and $p_5$ are each less than 0.1, $p_B = p_4 + p_5$. The fault tree of Figure 2.1.2 can therefore be expressed as

$$p_F = p_A + p_3 + p_B$$

instead of the longer, fully expanded (but equivalent) form

$$p_F = p_1 \, p_2 + p_3 + p_4 + p_5$$

Thus, the computation of the top-level failure probability can be decomposed into summable elements, with Equations (2.1.6) and (2.1.4) applied in a series of steps to compute the overall failure probability.
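The contributor-by-contributor evaluation can be sketched in Python. This sketch assumes, as the expression for $p_F$ implies, that the top gate of Figure 2.1.2 is an OR gate evaluated with the rare event approximation; the basic-event probabilities are hypothetical.

```python
import math


def and_gate(probs):
    """Eq. (2.1.4) for independent inputs."""
    return math.prod(probs)


def or_gate_rare_event(probs):
    """Eq. (2.1.6), valid when each input probability is below about 0.1."""
    return sum(probs)


# Hypothetical basic-event probabilities for the generic tree of Figure 2.1.2.
p = {1: 0.05, 2: 0.04, 3: 0.002, 4: 0.001, 5: 0.003}

# Step 1: evaluate the contributors (subsystem failure probabilities).
p_A = and_gate([p[1], p[2]])
p_B = or_gate_rare_event([p[4], p[5]])

# Step 2: feed the contributor values into the (assumed OR) top gate.
p_F = or_gate_rare_event([p_A, p[3], p_B])

# The fully expanded expression gives the same value.
p_F_expanded = p[1] * p[2] + p[3] + p[4] + p[5]

print(f"p_A = {p_A:.4f}, p_B = {p_B:.4f}, p_F = {p_F:.4f}")
assert math.isclose(p_F, p_F_expanded)
```

The stepwise form keeps the intermediate values $p_A$ and $p_B$ available as subsystem failure probabilities, which the single expanded expression does not.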


Figure 2.1.2: Fault Tree Outline (Source: adapted from [16])


2.1.3 Minimum Cut Set

Another common technique used to reduce the size and complexity of a fault tree is the minimal cut set, illustrated in Figure 2.1.4. A cut set is a set of basic events whose joint occurrence causes the top event. A minimal cut set is a smallest such combination of component failures: if all of its failures occur, the top event occurs [34], but removing any one of them means it no longer does. A minimal cut set can therefore consist of a single component failure or, for example, a double component failure. It is often desirable to determine the minimal cut sets of a fault tree to best ascertain the effects and interactions of basic failures. The Nuclear Regulatory Commission Fault Tree Handbook [34] gives an example of this process using Figure 2.1.3, where T is the top-level event, A, B, and C are basic events, and the $E_i$ are contributors. Such a tree is unnecessarily complex and contains multiple instances of the same basic events, which adds unnecessary algebra. One way to reduce this complexity is the minimal cut set method.

This fault tree represents the operation of a dual pumping system for a nuclear reactor. Reading the fault tree algebraically gives the following Boolean expression, where · denotes AND and + denotes OR:

$$T = A \cdot C + B \cdot C + C \cdot C + (A \cdot B) \cdot A + (A \cdot B) \cdot B + (A \cdot B) \cdot C$$

Applying the rules of Boolean algebra, in particular idempotence ($X \cdot X = X$) and absorption ($X + X \cdot Y = X$), and without the rare event approximation, reduces this to the simple and equivalent form

$$T = C + A \cdot B$$
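The reduction can be verified mechanically. The Python sketch below is one possible check, not the handbook's procedure: it confirms by exhaustive truth table that the unreduced and reduced expressions agree for all eight combinations of A, B, and C, and then applies idempotence and absorption to the product terms to obtain the minimal cut sets. The function names are hypothetical.

```python
from itertools import product


def t_original(a: bool, b: bool, c: bool) -> bool:
    """The unreduced expression read from the fault tree of Figure 2.1.3."""
    return ((a and c) or (b and c) or (c and c)
            or (a and b and a) or (a and b and b) or (a and b and c))


def t_reduced(a: bool, b: bool, c: bool) -> bool:
    """The reduced form T = C + A*B."""
    return c or (a and b)


# Exhaustive truth-table check over all 2**3 combinations of basic events.
equivalent = all(t_original(*s) == t_reduced(*s) for s in product([False, True], repeat=3))


def minimal_cut_sets(cut_sets):
    """Reduce a list of cut sets to the minimal ones.

    Storing each term as a set applies idempotence (X*X = X); discarding any set
    that strictly contains another applies absorption (X + X*Y = X)."""
    sets = {frozenset(cs) for cs in cut_sets}
    return {s for s in sets if not any(other < s for other in sets)}


# The six product terms of the unreduced expression, one set per term.
terms = [{"A", "C"}, {"B", "C"}, {"C", "C"}, {"A", "B", "A"}, {"A", "B", "B"}, {"A", "B", "C"}]
print(equivalent)                                          # True
print(sorted(sorted(s) for s in minimal_cut_sets(terms)))  # [['A', 'B'], ['C']]
```

The truth-table check is practical only for small trees (it grows as $2^n$), while the absorption step scales to larger lists of product terms.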


Figure 2.1.3: Fault Tree example (Source: adapted from [34])


These two terms are the minimal cut sets of the system: the single component failure {C} and the double failure {A, B} [16]. An equivalent fault tree can therefore be constructed, as seen in Figure 2.1.4. Both trees have the same minimal cut sets and are numerically equivalent, giving the same system-level probability of failure.

Figure 2.1.4: Minimum Cut Set Example (Source: adapted from [34])

The minimal cut set is a useful tool for determining whether a combination of events will cause the top-level event and for reducing the complexity of a fault tree. It especially highlights the most undesirable combinations; however, in the presence of a large number of AND and OR gates, the algebra can become complicated [31]. A common problem with this approach arises when certain events have multiple effects: these events appear repeatedly, not only in the original fault tree but also in its cut sets. For instance, consider Figure 2.1.5, which represents a communication network presented by Kececioglu [18]. Event C has two effects, and event A has three. If this network were represented as a fault tree, these events would have to appear as multiple nodes.

Figure 2.1.5: Communication network (Source: adapted from [18])

The minimal cut set approach could be used to simplify any fault tree built for this network; its cut sets are {A}; {G}; {E, F}; {B, C, F}; {B, C, D}; {C, D, E}. A fault tree built from these cut sets clearly contains multiple instances of the same nodes and overcomplicates a relatively simple network, as seen in Figure 2.1.6. The repeated nodes are identical causal nodes located at different places in the fault tree; each represents a single event with a common cause of failure and multiple effects. This problem occurs because the AND/OR gates of a fault tree only allow series or parallel interactions between components. The multiple instances of these nodes are identical and dependent on one another, not separate events. Kececioglu introduces Mirror Blocks to handle this situation [18].

In Figure 2.1.6, Mirror Blocks are designated by a small black box next to each repeated node. This cosmetic fix does little to simplify the design of the fault tree and, by adding more nomenclature, arguably complicates it further.
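Why repeated events matter numerically can be sketched in Python using the cut sets listed for Figure 2.1.5. The equal failure probability of 0.1 for every event is a hypothetical assumption, and the function names are illustrative. Exact enumeration keeps one variable per event, so a shared event such as C is counted once; treating the cut sets as independent (as if each repeated node were a separate copy, without Mirror Blocks) and the rare event sum both overestimate the result.

```python
import math
from itertools import product

EVENTS = "ABCDEFG"
# Cut sets of the communication network in Figure 2.1.5, as listed in the text.
CUT_SETS = [{"A"}, {"G"}, {"E", "F"}, {"B", "C", "F"}, {"B", "C", "D"}, {"C", "D", "E"}]


def top_event_occurs(failed: set) -> bool:
    """The network fails if every event in at least one cut set has failed."""
    return any(cs <= failed for cs in CUT_SETS)


def exact_probability(q: dict) -> float:
    """Enumerate all 2**7 joint states, treating each event as ONE shared variable."""
    total = 0.0
    for states in product([False, True], repeat=len(EVENTS)):
        failed = {e for e, s in zip(EVENTS, states) if s}
        if top_event_occurs(failed):
            total += math.prod(q[e] if s else 1 - q[e] for e, s in zip(EVENTS, states))
    return total


def cut_set_probability(cs: set, q: dict) -> float:
    return math.prod(q[e] for e in cs)


def independent_cut_sets(q: dict) -> float:
    """OR gate over cut sets as if they were independent, i.e. as if repeated
    events such as C were separate copies (no Mirror Blocks)."""
    return 1 - math.prod(1 - cut_set_probability(cs, q) for cs in CUT_SETS)


def rare_event_sum(q: dict) -> float:
    """Sum of cut-set probabilities (rare event approximation)."""
    return sum(cut_set_probability(cs, q) for cs in CUT_SETS)


q = {e: 0.1 for e in EVENTS}  # hypothetical, equal failure probabilities
print(f"exact enumeration:        {exact_probability(q):.6f}")
print(f"cut sets as independent:  {independent_cut_sets(q):.6f}")
print(f"rare event approximation: {rare_event_sum(q):.6f}")
```

Keeping a single node per event, as the enumeration does, is the same idea a BBN uses to represent one cause with multiple effects, without any extra notation.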

Fault trees have some deficiencies but are easily applied to a wide range of systems. They have been widely adopted by industry as a way to identify and reduce causes of failure and to depict graphically the components of a system that may otherwise be hard to visualize.

Figure 2.1.6: Communication Network Fault Tree, Denote Mirror Blocks (Source: adapted from [18])

2.2 Event Tree Analysis

Another method used for system safety analysis is Event Tree Analysis. An event tree is often used to determine the successful operation of components that depend on a chronological set of events [8]. Event tree analysis is an inductive method that captures the effects of an initiating event and works well at showing the progression of sequential events and their possible outcomes. One event cascades its effects to the next event, and that event's effects cascade to subsequent events in turn. An initiating event typically has a success condition and a failure condition, whose probabilities sum to one. A success condition for an event is defined as the occurrence of that event. This structure allows probabilities to be cascaded from the initiating event down to the outcome events. The main differentiator between an Event Tree and a Fault Tree is that Event Trees investigate both success and failure scenarios, while a Fault Tree investigates only failures. The structure of an event tree is similar to that of a probability tree and can be seen in Figure 2.2.1.

Figure 2.2.1: Event Tree Example (Source: adapted from [8])

Each chance node is associated with a probability of operation given the previous event. Bilal [8] introduces an example of a pump operated to extinguish fires. Suppose that the probability of the pump operating correctly (PO) is 90% and that of the complement (PO*) is 10%. Given that the pump is functioning, the probability of the fire being extinguished (FE) is 40%. If a fire occurs, the probability $P_{PS}$ that it is successfully extinguished is the product of the probabilities along that path:

$$P_{PS} = P_{PO} \cdot P_{FE} = 0.90 \times 0.40 = 0.36$$

Therefore, the probability that the fire is extinguished is 36%. This provides a very quick method for investigating simple systems, analyzing their sequence of events, and seeing how possible hazards are affected by initiating events. Such trees can be created with different initiators to show different outcomes or consequences [8].
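The pump example can be reproduced by enumerating every branch of the event tree and multiplying the probabilities along each path. The sketch below is a minimal illustration; the branch structure, in particular the assumption that a failed pump ends the sequence without extinguishing the fire, is ours, since the text only quantifies the success path.

```python
# Two-stage event tree for the pump example (minimal sketch).
# Assumption: if the pump fails (PO*), the fire is not extinguished.
P_PUMP_OPERATES = 0.90          # P(PO)
P_EXTINGUISHED_GIVEN_PO = 0.40  # P(FE | PO)


def event_tree_outcomes() -> dict:
    """Return a dict mapping each end state to its path probability."""
    p_po = P_PUMP_OPERATES
    return {
        # Pump operates and the fire is extinguished (the success path).
        "PO, FE": p_po * P_EXTINGUISHED_GIVEN_PO,
        # Pump operates but the fire is not extinguished.
        "PO, FE*": p_po * (1.0 - P_EXTINGUISHED_GIVEN_PO),
        # Pump fails; assumed to end the tree without extinguishing the fire.
        "PO*": 1.0 - p_po,
    }


if __name__ == "__main__":
    outcomes = event_tree_outcomes()
    for path, p in outcomes.items():
        print(f"{path:8s} {p:.2f}")
    # End states are mutually exclusive and exhaustive, so they sum to one.
    print("total", round(sum(outcomes.values()), 10))
```

Listing every end state, not just the success path, is what distinguishes the event tree view: the three path probabilities must account for the whole sample space.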


Event trees are widely used in many fields, and their popularity is due in part to their simplicity and the holistic picture they give of risk and reward [18]. However, as a system grows in components, so does the number of events and branches, which can cause Event Trees to become confusing.

2.3 Failure Mode and Effects Analysis

Another common tool to evaluate a system's reliability and failure modes is Failure Mode and Effects Analysis (FMEA). This is an inductive, bottom-up method of assessing a system's reliability as a whole to determine what the effect of each failure mode is and how damaging it can be. The process for performing an FMEA was introduced by the Department of Defense in the 1970s in MIL-STD-1629A [29]. FMEA has become an integral approach to analyzing system reliability; however, it tends to focus on the more severe failure modes [8]. These potential failures are ranked in terms of importance and of the ability to implement a corrective action [18]. Failure modes in this context usually refer to a specific event or state that will have damaging effects on the system, and the worst possible effects are considered. The failure effects are the negative events that occur when the failure mode occurs. Once these failure modes are identified, failure detection methods and corrective actions can be put in place. The FMEA process documents all of these steps in the form of a worksheet, as exemplified in Table 2.3.1, which follows the MIL-STD-1629A [29] format.

| ID Number | Item | Function | Failure Mode and Cause | Mission Phase | Failure Effects | Failure Detection Method | Compensating Provision | Severity Class | Remarks |
|---|---|---|---|---|---|---|---|---|---|
| Example | Engine | Propulsion | Engine Fire | All | Fire condition, damage to components, loss of propulsion | Fire detection system | Fire suppression system | 1 | |

Table 2.3.1: Example FMEA

An assumption in the FMEA process is that a failure mode may or may not be the root cause of a higher-level failure, the root cause being the base failure event. This implies that the higher-level failure does not necessarily occur when the root cause occurs. This assumption is a departure from some other failure analysis methods, which imply that if a failure mode occurs, the higher-level failures it contributes to must also occur [8]. The converse can also be stated: a failure effect does not have to result from the identified root cause, but could have a separate root cause that still needs to be identified. Not all root causes of higher-level effects may be identified; however, the most likely causes are known and listed in the FMEA worksheet.

The FMEA process provides a methodology to identify failure modes and to apply corrective action while examining the larger system effects [18]. The process is very broad and has been adopted in many different ways to improve product quality and safety.

2.4 Failure Mode, Effects, and Criticality Analysis

A Failure Mode, Effects, and Criticality Analysis (FMECA) is an extension of FMEA in which some form of evaluation is performed on the risk associated with the potential problems that have been identified. Two common methods are the Risk Priority Number (RPN) and a Criticality Analysis, both of which add what can be seen as a risk level for potential failures. The RPN is the product of three ratings: severity, likelihood of occurrence, and likelihood of detection. It can then be used to compare failure events and prioritize them for mitigation.
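As a minimal sketch of the RPN calculation, the snippet below multiplies the three ratings and sorts failure modes by the result. The failure modes and their ratings are hypothetical, and the 1 to 10 scales are an assumption. In the widely used convention a higher detection rating means the failure is harder to detect, so a large RPN always signals higher priority.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (remote) .. 10 (very likely)
    detection: int   # assumed scale: 1 (almost certainly detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: product of the three ratings."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes, for illustration only.
modes = [
    FailureMode("Battery cell failure", severity=9, occurrence=3, detection=4),
    FailureMode("Servo gear strip", severity=6, occurrence=5, detection=3),
    FailureMode("GPS dropout", severity=5, occurrence=4, detection=2),
]

# Highest RPN first: these are the candidates for mitigation.
for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"{m.name:22s} RPN = {m.rpn}")
```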

The Criticality Analysis, outlined in MIL-STD-1629A, Task 102 [29], is very similar to an RPN analysis and describes both a quantitative and a qualitative method for performing the analysis. The quantitative approach requires that three quantities be determined for each failure mode: (1) the probability of failure of the component, $\lambda_p$; (2) the failure mode ratio $\alpha$, the fraction of the component's failures that occur in the failure mode being investigated; and (3) the conditional probability $\beta$ that this failure mode will result in mission failure, commonly associated with severity. The criticality for that specific failure mode is denoted $C_m$ and is given as

$$C_m = \beta \alpha \lambda_p$$

When $\lambda_p$ is expressed as a failure rate rather than a probability, MIL-STD-1629A also multiplies by the operating time $t$.
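Because the criticality number is a simple product, it is easy to script. The sketch below computes $C_m$ for each failure mode of one hypothetical component and sums them into an item criticality. All numbers are invented for illustration, and $\lambda_p$ is treated as a per-mission probability, as in the formula above.

```python
# Failure mode criticality C_m = beta * alpha * lambda_p.
# All values are hypothetical; lambda_p is a per-mission failure probability,
# so no operating-time factor is applied.
lambda_p = 0.02  # probability that the component fails during the mission

# (failure mode, alpha = failure mode ratio, beta = P(mission failure | mode))
failure_modes = [
    ("open circuit", 0.6, 1.0),
    ("short circuit", 0.3, 0.5),
    ("intermittent output", 0.1, 0.1),
]


def mode_criticality(alpha: float, beta: float, lam: float) -> float:
    return beta * alpha * lam


# The failure mode ratios of one component should account for all its failures.
assert abs(sum(a for _, a, _ in failure_modes) - 1.0) < 1e-9

c_m = {name: mode_criticality(a, b, lambda_p) for name, a, b in failure_modes}
for name, value in c_m.items():
    print(f"{name:20s} C_m = {value:.4f}")

# Item criticality: sum of the mode criticalities
# (assumes all three modes fall in the same severity class).
c_r = sum(c_m.values())
print(f"item criticality C_r = {c_r:.4f}")
```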

The severity classification and probability levels are typically defined as categories from either verbal or numerical charts. Many of the category classifications used by the U.S. Federal Aviation Administration, NASA, and the U.S. Military are derived from MIL-STD-882 [30, 29, 9].

The qualitative approach evaluates risk by rating both the severity of the potential effects of a failure and the likelihood of occurrence of each potential failure mode. These two factors are then combined in a risk matrix, with severity on the horizontal axis and likelihood on the vertical axis, as can be seen in Figure 2.4.1. This approach is much less involved, and it is common to see it with slight variations. It is the approach recommended for risk assessment in the NAVAIR Risk Assessment Handbook [9].
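A risk matrix is essentially a lookup table indexed by a severity category and a likelihood category. The following sketch uses a hypothetical 5 × 5 matrix with made-up category names and score thresholds. It is not the matrix of Figure 2.4.1 or of MIL-STD-882, whose published categories and cell assignments should be used in practice.

```python
# Hypothetical ordinal scales, ordered from lowest to highest.
SEVERITY = ["Negligible", "Minor", "Major", "Hazardous", "Catastrophic"]
LIKELIHOOD = ["Extremely Improbable", "Remote", "Occasional", "Probable", "Frequent"]


def risk_level(severity: str, likelihood: str) -> str:
    """Map a (severity, likelihood) pair to a risk level.

    The cells are filled by a simple score = severity_rank * likelihood_rank
    rule with assumed thresholds; a real program would use its own matrix.
    """
    sev = SEVERITY.index(severity) + 1
    lik = LIKELIHOOD.index(likelihood) + 1
    score = sev * lik
    if score >= 15:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    # Likelihood on the vertical axis, severity on the horizontal axis.
    for likelihood in reversed(LIKELIHOOD):
        row = [risk_level(sev, likelihood)[0] for sev in SEVERITY]
        print(f"{likelihood:21s} " + " ".join(row))
```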

Figure 2.4.1: Risk Matrix

Once the criticality analysis is performed, columns for the likelihood, severity, and criticality number are added to the FMEA table seen in Table 2.3.1 to generate an FMECA. The FMECA process can be easily tailored to specific applications, making it a popular approach for risk analysis. It is also relatively easy to implement, requiring no more than an Excel spreadsheet [18].

2.5 Bayesian Belief Networks

Bayesian Belief Networks (BBNs) are a form of probabilistic modeling that represent a system as a set of random variables and the dependencies among them [8]. BBNs are related to Fault Trees in that both are directed acyclic graphs. Many methods cannot handle complex dependencies or uncertainty; Jensen explains how BBNs excel when expert opinion is ambiguous, incomplete, or uncertain [17]. The prior methods discussed in this chapter were either deductive or inductive; a BBN is abductive. Abductive reasoning can be seen as "inference to the best explanation" [2]. The previous methods were also either qualitative or quantitative, while a BBN can serve as either [7]. A qualitative approach to BBNs constructs a Causal Network to graphically represent variables and the relationships between them. A BBN used as a quantitative method applies a form of probability calculus, Bayesian calculus, to represent the variables and their interconnections.

As mentioned briefly above, Bayesian Belief Networks are graphical structures that use probabilistic reasoning to infer information about the unknown. BBN variables are often called nodes, and the relationships that connect them are termed arcs. An arc between two nodes implies that the two are dependent on one another, while the absence of an arc implies conditional independence [8]. Thus, if an arc runs from node A to node B, A is seen to cause B, and the two are dependent; a similar interpretation holds if B influences C. However, if there is no arc connecting two nodes, as is the case for A and C in Figure 2.5.1, A and C are conditionally independent. If A and C are conditionally independent given B, as in Equation (2.5.1), then once B is known, no knowledge of C will alter the probability of A [17]. The term P(A|B) represents the probability of A given that B is known to have occurred. If evidence of B is given, B is said to be instantiated. The directed structure of the model determines how the probabilities are updated as information is elicited [26].

$$P(A \mid B) = P(A \mid B, C) \quad (2.5.1)$$
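Equation (2.5.1) can be checked numerically. The sketch below builds the joint distribution of a small network in which B is the parent of both A and C. The CPT values are hypothetical. It confirms that once B is known, also learning C leaves the probability of A unchanged, whereas without B, observing C does change the belief in A.

```python
from itertools import product

# Hypothetical CPTs for the network A <- B -> C (all nodes Boolean).
p_b = {True: 0.3, False: 0.7}
p_a_given_b = {True: 0.9, False: 0.2}   # P(A = True | B)
p_c_given_b = {True: 0.6, False: 0.1}   # P(C = True | B)


def bern(p_true: float, value: bool) -> float:
    """Probability of a Boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true


# Joint distribution P(A, B, C) = P(B) P(A | B) P(C | B).
joint = {
    (a, b, c): p_b[b] * bern(p_a_given_b[b], a) * bern(p_c_given_b[b], c)
    for a, b, c in product([True, False], repeat=3)
}


def prob(pred) -> float:
    """Sum the joint over all assignments satisfying pred(a, b, c)."""
    return sum(p for (a, b, c), p in joint.items() if pred(a, b, c))


# P(A | B) versus P(A | B, C) for B = True and C = True.
p_a_b = prob(lambda a, b, c: a and b) / prob(lambda a, b, c: b)
p_a_bc = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: b and c)
print(f"P(A|B)   = {p_a_b:.4f}")
print(f"P(A|B,C) = {p_a_bc:.4f}")  # equal: A and C are independent given B

# Without knowing B, observing C does change the belief in A.
p_a = prob(lambda a, b, c: a)
p_a_c = prob(lambda a, b, c: a and c) / prob(lambda a, b, c: c)
print(f"P(A) = {p_a:.4f}, P(A|C) = {p_a_c:.4f}")
```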


Figure 2.5.1: Simple Bayesian Net

The same interpretation of conditional independence still applies between the variables A and C given B in a serial connection, as seen in Figure 2.5.2. These links signify direct or causal dependence and influence between variables, for instance $P_A \rightarrow P_B$, where $P_A$ is the probability of A occurring. When there is a direct link from one variable to the next, $P_i \rightarrow P_j$, $P_i$ is called the parent and $P_j$ the child of $P_i$. The connections seen in Figures 2.5.1 and 2.5.2 create influence diagrams.

Figure 2.5.2: Serial Bayesian Net

2.5.1 Basic BBN Methods

The causal network is represented graphically as an influence diagram with a series of connected variables. How those variables are connected can vary depending on the needs of the model. Influence diagrams can have four basic structures: diverging, converging, serial, or a hybrid of these. An example of each can be seen in Figure 2.5.3. A diverging connection, also seen in Figure 2.5.1, shows the parent directly influencing both child nodes. If the parent A is known, or instantiated, B and C cannot affect one another, because they are conditionally independent given A. The same can be said of a serial connection: if B is instantiated, A and C cannot influence one another. In a converging connection, as seen in Figure 2.5.3, the parents are initially independent of each other. If the child C is instantiated, the parents become dependent on one another, so evidence about one parent changes the belief in the other.
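The converging case can be illustrated with a short sketch using hypothetical numbers: two independent Boolean parents A and B share a child C, here assumed to be a deterministic OR of its parents for simplicity. Before C is observed, learning B tells us nothing about A. Once C is observed, learning B changes the belief in A.

```python
from itertools import product

# Hypothetical converging network A -> C <- B with independent parents.
P_A, P_B = 0.2, 0.4  # P(A = True), P(B = True)


def p_c_given(a: bool, b: bool) -> float:
    """P(C = True | A, B): assumed deterministic OR of the two parents."""
    return 1.0 if (a or b) else 0.0


def joint(a: bool, b: bool, c: bool) -> float:
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = p_c_given(a, b) if c else 1 - p_c_given(a, b)
    return pa * pb * pc


def prob(pred) -> float:
    return sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3)
               if pred(a, b, c))


# Before C is observed: P(A | B) equals P(A), so the parents are independent.
p_a_b = prob(lambda a, b, c: a and b) / prob(lambda a, b, c: b)
print(f"P(A|B) = {p_a_b:.4f}, P(A) = {P_A:.4f}")

# After C = True is observed, also learning B = True lowers the belief in A.
p_a_c = prob(lambda a, b, c: a and c) / prob(lambda a, b, c: c)
p_a_bc = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: b and c)
print(f"P(A|C) = {p_a_c:.4f}, P(A|B,C) = {p_a_bc:.4f}")
```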

Figure 2.5.3: Influence Diagram Structures

A BBN allows for non-Boolean variables, which provides a simple way to account for unknowns and deal with numerical uncertainty. Bayesian methods provide a way of reasoning about partial beliefs under conditions of uncertainty [32]. A random variable, designated in a BBN as a node, can be discrete, continuous, or mixed. The states of these nodes must be mutually exclusive so that a well-defined probability can be assigned to each state [7]. When the nodes of a BBN are discrete random variables with a finite set of possible values, the BBN is called discrete.

In probability calculus, a conditional probability is used to obtain updated probabilities when events are instantiated. For instance, "the probability of event A given B is x" can be written mathematically as P(A|B) = x [7]. If knowing B does not change the probability of A, so that P(A|B) = P(A), then A and B are independent. If P(A|B) = P(A|B,C), then A and C are said to be conditionally independent given B. This is introduced graphically in Figure 2.5.1 [32]. The conditional probability itself is defined in Equation (2.5.2), where P(A,B) denotes the joint probability of A and B and P(B) ≠ 0:

$$P(A \mid B) = \frac{P(A, B)}{P(B)} \quad (2.5.2)$$

Bayes' theorem is derived from this definition of conditional probability and is seen in Equation (2.5.3). P(A) and P(B) are the marginal (unconditional) probabilities of A and B, P(A|B) is the probability of A when B is instantiated, and P(B|A) is the probability of B given that A is instantiated [32]. P(A) is known as the prior probability and P(A|B) as the posterior probability.

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \quad (2.5.3)$$

Unlike in a fault tree, a variable can have more than two states. If variable A has n mutually exclusive states $a_1, \dots, a_n$ and variable B has m mutually exclusive states $b_1, \dots, b_m$, then P(A|B) can be represented by an $n \times m$ matrix of entries $P(a_i \mid b_j)$ whose columns each sum to one, $\sum_{i=1}^{n} P(a_i \mid b_j) = 1$ for $j = 1, \dots, m$ [7]. This matrix is termed the conditional probability distribution (CPD) of variable A. Multiplying each entry by the probability of its column gives the joint probability, $P(a_i \mid b_j)\,P(b_j) = P(a_i, b_j)$, where $P(b_j)$ is the probability of B being in state $b_j$ [7].
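The matrix view of a CPD can be made concrete in a few lines. In this hypothetical example, A has n = 3 states and B has m = 2 states. The sketch checks that each column sums to one, multiplies each column by $P(b_j)$ to obtain the joint $P(a_i, b_j)$, and sums the rows of the joint to get the marginal $P(a_i)$.

```python
# Hypothetical CPD of A given B: rows are states a_1..a_3, columns b_1..b_2.
cpd = [
    [0.7, 0.1],  # P(a_1 | b_1), P(a_1 | b_2)
    [0.2, 0.3],  # P(a_2 | b_1), P(a_2 | b_2)
    [0.1, 0.6],  # P(a_3 | b_1), P(a_3 | b_2)
]
p_b = [0.4, 0.6]  # P(b_1), P(b_2)

n, m = len(cpd), len(p_b)

# Each column is a distribution over the states of A, so it must sum to one.
for j in range(m):
    col_sum = sum(cpd[i][j] for i in range(n))
    assert abs(col_sum - 1.0) < 1e-9, f"column {j} sums to {col_sum}"

# Joint distribution: P(a_i, b_j) = P(a_i | b_j) * P(b_j).
joint = [[cpd[i][j] * p_b[j] for j in range(m)] for i in range(n)]

# Marginal of A: sum each row of the joint over the states of B.
p_a = [sum(row) for row in joint]
print("P(A) =", [round(p, 4) for p in p_a])
```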

It is helpful to examine this graphically with a simple two-node system, which exemplifies nodal interactions that also occur on a larger scale. A two-node example is introduced in Figure 2.5.4.


Figure 2.5.4: Two Node Example

Node B represents a motor's operational state and node A its thermal state [21]. There is a 75% likelihood that the motor is running at peak efficiency, and therefore a 25% chance that it is not. If the motor is running efficiently, there is a 30% likelihood that it is overheating; if it is not running efficiently, there is an 80% likelihood that it has overheated. In this example, the prior probability that the motor is running at peak efficiency is 75%. However, if it is observed that the motor has overheated, this belief can be updated via the posterior probability. A Conditional Probability Table (CPT), as seen in Table 2.5.1, shows the likelihood of each scenario. The joint probability of each condition is simply the product of the corresponding CPT entry with the probability of the parent node's state. A CPT is a representation of the conditional probability distribution when the variables represented are discrete [26].

| (Peak Efficiency) | Efficient | Not Efficient |
|---|---|---|
| Overheat | 0.3 | 0.8 |
| Not Overheat | 0.7 | 0.2 |

Table 2.5.1: Conditional Probability Table for Motor

A graphical way of visualizing these scenarios is a probability tree, as seen in Figure 2.5.5. The posterior probability is the probability of B given that A is known to have occurred, which allows the model probability to be updated. In this example, if it is known that the motor has overheated, the likelihood that it is running efficiently drops from the prior of 75% to about 53%:

$$P(\text{Efficient} \mid \text{Overheated}) = \frac{0.75 \times 0.3}{(0.75 \times 0.3) + (0.25 \times 0.8)} = \frac{0.225}{0.425} \approx 0.53$$
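The same update can be written as a small function that applies Bayes' theorem, Equation (2.5.3), to the motor CPT in Table 2.5.1. This is a minimal sketch, and the function name is our own.

```python
def posterior_efficient_given_overheat(p_efficient: float,
                                       p_overheat_if_efficient: float,
                                       p_overheat_if_not: float) -> float:
    """P(Efficient | Overheat) by Bayes' theorem for the two-node motor model."""
    # Joint probability of each parent state with the observed evidence.
    joint_eff = p_efficient * p_overheat_if_efficient
    joint_not = (1.0 - p_efficient) * p_overheat_if_not
    # P(Overheat) is the total probability of the evidence.
    p_overheat = joint_eff + joint_not
    return joint_eff / p_overheat


p = posterior_efficient_given_overheat(0.75, 0.3, 0.8)
print(f"P(Efficient | Overheat) = {p:.4f}")  # the prior of 0.75 drops to about 0.53
```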


Figure 2.5.5: Probability Tree

The above example covers only a simple two-node serial interaction. The four basic nodal interactions introduced earlier can be expanded upon to show the dependencies between parents and children [8]. Figure 2.5.6 represents this extension and shows the equations needed to build the conditional probabilities for each type of structure.

Figure 2.5.6: Conditional Probability of Influence Structures (Source: Adapted from [8])

To expand from a simple system, Murphy introduces a four-node system [26]. Each of the four nodes in this system is either True or False, denoted by T or F. The Conditional Probability Distribution of each node is shown as a CPT, listing the probability that the child node takes each of its values for every combination of its parents' values. Figure 2.5.7 shows an example of why the grass is wet. The event of the grass being wet can be either True or False, as the grass will either be wet or it will not. Two events could cause the grass to be wet: the sprinkler can be on (S=True), it can be raining (R=True), or both. It is clear from the CPT for wet grass that if either the sprinkler is on or it is raining, the grass will likely be wet.

If nothing is known about this system, it is straightforward to go through the calculations introduced earlier in this section to obtain the likelihood that the grass will be wet. With no evidence given, the likelihood of the grass being wet is 72.9%. If it is observed that the grass is wet, this observation updates the posterior probabilities of the other nodes. With the grass known to be wet, the likelihood of it being cloudy increases to 53.3%, there is a 63.4% chance that it has rained, and there is only a 51.6% likelihood that the sprinkler was on. If it is also observed that the sky is not cloudy, the posterior probabilities can be updated further, giving an 86.7% likelihood that the sprinkler was on. Reasoning of this kind, in which evidence that makes one explanation of an observation (rain) less likely makes a competing explanation (the sprinkler) more likely, is known as Explaining Away [26].
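For a network this small, every posterior can be obtained by brute-force enumeration of the joint distribution $P(C, S, R, W) = P(C)\,P(S \mid C)\,P(R \mid C)\,P(W \mid S, R)$. The sketch below does exactly that. The CPT values are assumed, illustrative placeholders rather than the values in Figure 2.5.7, so the printed probabilities will differ from the percentages quoted above. Substituting the figure's CPTs is the intended use.

```python
from itertools import product

# Assumed CPT values (illustrative placeholders; replace with Figure 2.5.7's).
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}  # P(S = T | C)
P_RAIN = {True: 0.8, False: 0.2}       # P(R = T | C)
P_WET = {                              # P(W = T | S, R)
    (False, False): 0.0, (True, False): 0.9,
    (False, True): 0.9, (True, True): 0.99,
}


def bern(p_true: float, value: bool) -> float:
    return p_true if value else 1.0 - p_true


def joint(c: bool, s: bool, r: bool, w: bool) -> float:
    """P(C, S, R, W) factorized along the arcs of the network."""
    return (bern(P_CLOUDY, c) * bern(P_SPRINKLER[c], s)
            * bern(P_RAIN[c], r) * bern(P_WET[(s, r)], w))


def query(target, evidence=lambda c, s, r, w: True) -> float:
    """P(target | evidence) by summing the joint over all 16 assignments."""
    num = den = 0.0
    for c, s, r, w in product([True, False], repeat=4):
        if evidence(c, s, r, w):
            p = joint(c, s, r, w)
            den += p
            if target(c, s, r, w):
                num += p
    return num / den


def wet(c, s, r, w):
    return w


print(f"P(W)         = {query(wet):.4f}")
print(f"P(C | W)     = {query(lambda c, s, r, w: c, wet):.4f}")
print(f"P(R | W)     = {query(lambda c, s, r, w: r, wet):.4f}")
print(f"P(S | W)     = {query(lambda c, s, r, w: s, wet):.4f}")
# Further evidence that the sky is not cloudy shifts belief toward the sprinkler.
print(f"P(S | W, ~C) = {query(lambda c, s, r, w: s, lambda c, s, r, w: w and not c):.4f}")
```

Enumeration grows exponentially with the number of nodes, which is why larger networks rely on the propagation algorithms discussed next.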


Figure 2.5.7: Wet Grass Example (Source: adapted from [26])


2.5.2 Software and Algorithm

For simple systems, it is convenient to analytically expand the mathematical equations and nodal interactions. As a system becomes larger and more complex, it is advantageous to leverage computing power to populate the probabilities of the BBN. The Hugin Expert BBN software, which is used later in this thesis, provides a graphical user interface for displaying nodal interactions. Hugin Expert allows these BBNs to be constructed by combining data and subject matter expert knowledge [25]. The tool also allows for parameter estimation and analysis of the generated BBNs. The networks are constructed as the probabilistic models and influence diagrams discussed in the previous section.

The Hugin software uses an embedded algorithm developed by Lauritzen and Spiegelhalter [19]. This algorithm performs local computations with probabilities on graphical structures and allows their application to custom systems [19]. An expert system is a computer program intended to make reasoned judgments about a complex system with minimal outside judgment [19]. The algorithm provides an efficient method of exact probability inference in an arbitrary BBN [24]. The Lauritzen-Spiegelhalter algorithm works in two steps. First, it creates a tree of cliques from the original Bayesian network, a clique being a subset of nodes of an undirected graph in which every two nodes are connected by an edge. Then, the probabilities of the cliques are computed by passing messages between neighboring cliques, and the individual node probabilities are calculated from these clique probabilities [24].
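The two-step structure can be seen on the smallest non-trivial case, a serial network A → B → C, whose clique tree has the two cliques {A, B} and {B, C} joined by the separator {B}. The sketch below is a simplified HUGIN-style propagation written for illustration. It is not Hugin's implementation, and the CPT values are hypothetical. Evidence on C is entered into the {B, C} clique, a message is passed to {A, B} (collect) and back (distribute), and the resulting posteriors are checked against brute-force enumeration.

```python
from itertools import product

# Hypothetical CPTs for the serial network A -> B -> C (Boolean nodes).
P_A = {True: 0.3, False: 0.7}
P_B_GIVEN_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_C_GIVEN_B = {True: {True: 0.7, False: 0.3}, False: {True: 0.05, False: 0.95}}
TF = (True, False)

# Clique potentials: each CPT is assigned to one clique containing its family.
phi_ab = {(a, b): P_A[a] * P_B_GIVEN_A[a][b] for a, b in product(TF, TF)}
phi_bc = {(b, c): P_C_GIVEN_B[b][c] for b, c in product(TF, TF)}
phi_sep = {b: 1.0 for b in TF}  # separator potential over B

# Enter evidence C = True by zeroing incompatible entries.
for b in TF:
    phi_bc[(b, False)] = 0.0

# Collect: message from {B, C} to {A, B} through the separator.
new_sep = {b: sum(phi_bc[(b, c)] for c in TF) for b in TF}
for a, b in product(TF, TF):
    phi_ab[(a, b)] *= new_sep[b] / phi_sep[b]
phi_sep = new_sep

# Distribute: message from {A, B} back to {B, C}.
new_sep = {b: sum(phi_ab[(a, b)] for a in TF) for b in TF}
for b, c in product(TF, TF):
    phi_bc[(b, c)] *= new_sep[b] / phi_sep[b]
phi_sep = new_sep

# Both cliques now hold P(clique, evidence); normalize to get posteriors.
z = sum(phi_ab.values())
post_a = sum(phi_ab[(True, b)] for b in TF) / z
post_b = sum(phi_bc[(True, c)] for c in TF) / z


# Brute-force check by enumerating the full joint distribution.
def joint(a, b, c):
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]


p_e = sum(joint(a, b, True) for a, b in product(TF, TF))
brute_a = sum(joint(True, b, True) for b in TF) / p_e
brute_b = sum(joint(a, True, True) for a in TF) / p_e
print(f"P(A=T | C=T): clique tree {post_a:.4f}, enumeration {brute_a:.4f}")
print(f"P(B=T | C=T): clique tree {post_b:.4f}, enumeration {brute_b:.4f}")
```

The point of the clique tree is that each message touches only one clique and one separator, so the cost grows with clique size rather than with the size of the full joint distribution.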

2.5.3 Object-Oriented Bayesian Networks

Bayesian networks do well at depicting complex relationships between events by simplifying their representation. However, as systems become larger, with an increasing number of arcs and nodes, it is often desirable to reduce the apparent complexity of the system without losing depth of information. A common technique in programming is to create classes or objects that encapsulate a set of attributes. BBNs can capitalize on a similar technique, using subnets to capture a set of logic gates and variables in a simple representation of an object that outputs to a higher-level network. This approach, introduced by Koller [11], is called Object-Oriented Bayesian Networks (OOBNs). OOBNs provide a framework for representing large and complex systems clearly and effectively.

The basic element of an OOBN is the object, the most basic object being a standard random variable [11]. An OOBN requires certain nodes to be represented as Inputs, which come from the outputs of other models, or subnets. An object in an OOBN can represent many variables and can be seen as its own Bayesian network. For example, a fuel injection system could be represented as a Bayesian network and later pulled into a larger OOBN that represents the engine. The engine network and the fuel injection network would each be their own network, linked together by the input and output. A graphic of this OOBN can be seen in Figure 2.5.8. Expanding on the fuel injection system, an engine may have more than one fuel injector, and the output of that system may feed multiple inputs on the engine OOBN. This approach organizes the information into layers that can be visualized more easily.

Figure 2.5.8: OOBN engine example as constructed in Hugin

2.5.4 Sensitivity Analysis

An advantage of BBNs is the propagation of evidence outlined in the Lauritzen-Spiegelhalter algorithm [19]. This allows the following question to be answered about a given BBN: how sensitive is the outcome to variations in a node's parameters? Answering it helps determine the effects of a given parameter on other events of the network, or even on a top-level event. Many of these parameters are imprecisely specified in a model and can be a source of error. The most influential parameters should be identified, and effort should be directed towards reducing the uncertainty they introduce. The process of identifying these parameters and analyzing their negative effects on other probabilities in a model is known as sensitivity analysis [15].

For example, consider a two-node section of a BBN with a parent node B and a discrete child node A. To investigate it using sensitivity analysis, let a be a state of A. Given evidence $\varepsilon$, the probability $P(A = a \mid \varepsilon)$ can be seen as a function of the conditional probabilities in the CPTs of the two nodes. Let x be a conditional probability parameter of the discrete node B, for example the probability that B is in state b given its parents, which is treated as the input that is varied. From van der Gaag [20], the probability of the evidence $\varepsilon$ is a linear function of x, $P(\varepsilon)(x) = \gamma x + \delta$. Likewise, the joint probability of the evidence and the event "A = a" is a linear function of x, $P(A = a, \varepsilon)(x) = \alpha x + \beta$. The sensitivity function is therefore defined as follows:

$$P(A = a \mid \varepsilon)(x) = \frac{\alpha x + \beta}{\gamma x + \delta} \quad (2.5.4)$$

The constants of this function ($\alpha$, $\beta$, $\gamma$, $\delta$) are determined by the parameters that are not varied. They can be determined from the network by computing the probability of interest for a small number of values of the parameter under study and solving the resulting system of equations [20].
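The procedure can be illustrated on a two-node fragment B → A, with roles chosen so the example stays within two nodes. In this hypothetical sketch the evidence $\varepsilon$ is A = a, the quantity of interest is the posterior P(B = b | ε), and the varied parameter is x = P(A = a | B = b). All other values are fixed assumptions. Because P(ε)(x) and P(B = b, ε)(x) are both linear in x, evaluating them at x = 0 and x = 1 is enough to recover the four constants. The sensitivity function can then be evaluated for any x without re-running the network.

```python
# Two-node fragment B -> A with Boolean nodes; values other than x are hypothetical.
P_B = 0.2              # P(B = b)
P_A_GIVEN_NOT_B = 0.1  # P(A = a | B = not b)


def network_probs(x: float):
    """Return (P(eps), P(B = b, eps)) with eps = {A = a} and x = P(A = a | B = b)."""
    p_joint = P_B * x                                # P(B = b, A = a)
    p_evidence = p_joint + (1 - P_B) * P_A_GIVEN_NOT_B
    return p_evidence, p_joint


def fit_sensitivity_function():
    """Recover (alpha, beta, gamma, delta) from two evaluations of the network."""
    e0, j0 = network_probs(0.0)
    e1, j1 = network_probs(1.0)
    alpha, beta = j1 - j0, j0   # P(B = b, eps)(x) = alpha * x + beta
    gamma, delta = e1 - e0, e0  # P(eps)(x)        = gamma * x + delta
    return alpha, beta, gamma, delta


def sensitivity(x: float, coeffs) -> float:
    """Equation (2.5.4): posterior as a ratio of two linear functions of x."""
    alpha, beta, gamma, delta = coeffs
    return (alpha * x + beta) / (gamma * x + delta)


coeffs = fit_sensitivity_function()
print("alpha, beta, gamma, delta =", [round(c, 4) for c in coeffs])
for x in (0.1, 0.5, 0.9):
    e, j = network_probs(x)
    # The fitted function matches a direct computation of P(B = b | eps).
    print(f"x = {x:.1f}: sensitivity fn {sensitivity(x, coeffs):.4f}, direct {j / e:.4f}")
```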

2.6 Summary and Comparison of Methods

There are many different methods to determine the risk of a system and perform risk analysis. Only a few of the more popular methods were discussed in this chapter. Table 2.6.1 provides a brief comparison of these methods.

| Analysis Tool | Advantages | Limitations |
|---|---|---|
| Fault Tree Analysis (FTA) | Easily determined probabilities; easily determines undesired events | Limited event interactions; events statistically independent; all failure events must be known; Boolean failure events |
| Event Tree Analysis (ETA) | Easily determined probabilities; multiple results analyzed | Implies serial event interaction; events statistically independent; all failure events must be known; Boolean failure events |
| Bayesian Belief Networks (BBN) | Allows for numerous causal interactions; abductive model; numerous event states; propagation of evidence | Requires CPTs; complex nodal interactions; math can be complex without computers |
| Failure Mode and Effects Analysis (FMEA) | Useful documentation of system effects; examines every component | Rigorous detail; does not include multiple failures; expansive for large systems |
| Failure Mode, Effects, and Criticality Analysis (FMECA) | Expands FMEA; useful for mitigation effects | More analysis required; labor intensive |

Table 2.6.1: Risk Analysis Methods

Clearly identifying failures and their effects involves analyzing the system as a whole. As systems become larger and more complex, it is easier to overlook hazards or how they can affect a system. It is crucial to identify not only all of the potential hazards of a system but also their effects. The goal of a risk analysis study is to provide the best possible picture or understanding of the potential hazards. It is sometimes ideal to combine methods of risk analysis to provide this picture, as each method has its strengths and weaknesses.

All of the methodologies discussed provide significant contributions to risk management; however, BBNs are underutilized. They provide a means of dealing with two commonly occurring problems: uncertainty and complexity [7]. BBN influence diagrams provide an objective, compact, and visual way to show the interactions of effects for decision making in risk management. They also provide a way of updating models when new evidence is introduced, thus addressing epistemic uncertainty by propagating that evidence.

Chapter 3

Bayesian Approach to Risk Analysis

3.1 Current Research using Bayesian Methods

While FTA is an effective tool for predicting system unreliability, it requires knowledge of component failure rates. Such information is not always available to small UAV designers and may be prohibitively expensive to acquire. For example, component failures may occur so rarely as to defy easy experimental quantification. These disadvantages of FTA are being explored, and FTA is being supplemented with a BBN approach to better capture the interconnectivity of failures and to account for uncertain or incorrect probabilities.

There has been a persistent need to develop advanced risk analysis tools to move beyond

simply identifying risk factors. As systems grow there is a need for analyzing and interpreting

the complex interactions of various system risk factors. Luxhoj reports on the development

of an Aviation System Risk Model (ASRM) to use the underlying probabilistic methodology

of BBNs and their influence diagrams to graphically portray these complex interactions [22].


A strength of the BBN approach is the ability to simplify large systems by illustrating them as OOBNs. This approach allows a small subsystem to be easily integrated as an object into the larger system. Such an object can represent a test piece or a mission-specific scenario without the full system having to be recreated, allowing a building-block style of modeling. An example is an aircraft system in which only a specific part, such as the communications system, is to be investigated without examining much of the larger network. It could then be desirable to investigate the communications system for a lost link scenario. Evidence of the link being lost can be propagated, allowing likely causes to be identified, as well as how those causes will later affect the larger system [23].

The conventional FTA approach only accounts for known quantities and does not allow for interconnectivity of failures, as demonstrated by Murtha [27]. Reason [33] has researched the possibility that failure events originally believed not to cause a failure may combine further along with other minor failure events to cause a top-level failure. This eventual combination of failure events was coined the "Swiss Cheese Model": each layer of defense has small holes, or small probabilities of letting a failure pass, so a failure that should have been stopped at one layer can eventually pass through aligned holes in successive layers, as if through multiple slices of Swiss cheese. This type of model would be difficult to represent with the conventional FTA approach.

These limitations can be overcome using Bayesian Belief Networks. Others have recognized the limitations of FTA and have suggested the use of BBNs to address these shortcomings. Janota [16] shows how the potential of FTA can be capitalized on and expanded using BBNs without the typical limitations of FTA. A traditional FTA can be transformed into a BBN, enabling a risk analyst to preserve and reuse earlier work based on FTA.


3.2 Fault Tree as a Special Case of Bayesian Methods

Fault Tree Analysis is primarily used in the fields of safety and reliability engineering to understand how systems can fail at a functional level [16]. It can be applied to identify the best ways to reduce risk or to determine the event rates for an accident. It is commonly used to analyze severe failure conditions where information about catastrophic failure is desired.

As discussed in Section 2.5.1, a BBN is a type of directed acyclic graph showing influence between variables or events. An FT is represented in much the same way, with a few nuances that limit its implementation. For instance, an FT can only be represented with converging or serial nodes. This limitation on the type of nodal interaction does not exist for a BBN, which expands the interactions of an FT with the addition of diverging and hybrid nodes. These interactions are shown in Figure 3.2.1, where the BBN types of nodal interaction are represented by all four cases. Conversely, a BBN restricted to converging and serial nodes, whose CPT entries are only 0 or 1 (Boolean), emulates an FT.
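To make the emulation concrete, the sketch below generates the deterministic CPT that reproduces an AND or OR gate for any number of Boolean parents: P(output fails | parent states) is 1 exactly when the gate logic says the output fails, and 0 otherwise. The function name and table layout are our own choices for illustration.

```python
from itertools import product


def gate_cpt(gate: str, n_parents: int) -> dict:
    """Deterministic CPT P(Out = fail | parents) for an AND or OR gate.

    Keys are tuples of parent states (True = failed); each value is the
    probability that the gate output fails, which is either 0.0 or 1.0.
    """
    logic = {"AND": all, "OR": any}[gate]
    return {
        states: 1.0 if logic(states) else 0.0
        for states in product((True, False), repeat=n_parents)
    }


if __name__ == "__main__":
    for gate in ("AND", "OR"):
        print(gate)
        for states, p_fail in gate_cpt(gate, 2).items():
            # The "no failure" entry of each column is 1 - p_fail, so every
            # column of the CPT sums to one as required.
            print(f"  parents failed = {states}: P(fail) = {p_fail:.0f}, "
                  f"P(no fail) = {1 - p_fail:.0f}")
```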

Figure 3.2.1: Bayesian Expansion of Fault Tree

3.3 Fault Tree Comparison to Bayesian Network

To show a BBN representation of a classic FT, an FT is first introduced. Research was performed previously at Virginia Tech by Murtha [27], whose work introduced an FT focused on a small UAS platform, and this example is expanded upon for the BBN research. Murtha sought to utilize FTA to identify the most likely cause of catastrophic failure and mitigate against it, in order to drive down cost and improve the operational reliability of the platform. Figure 3.3.1 represents a simple aircraft system with system failure as the top-level event. This system is built from only two types of gates, AND and OR gates. The top-level OR gate combines the Main Battery, the Autopilot, and the output of an AND gate composed of two Servos. The probabilities for each of these events were drawn from a subject matter expert and are shown in Table 3.3.1 [27].

| Component | Probability of Failure |
|---|---|
| Main Battery | 0.05 |
| Autopilot | 0.001 |
| Servo | 0.1 |

Table 3.3.1: Component Probabilities (Source: [27])

Figure 3.3.1: Fault Tree Example (Source: [27])

The total probability of failure may be calculated using Equations (2.1.4) and (2.1.5); the rare event approximation is not used in this case. The probabilities of failure are represented by $P_{SF}$ for the Probability of System Failure, $P_{MB}$ for the Main Battery, $P_A$ for the Autopilot, and $P_{S1}$ and $P_{S2}$ for the Servos:

$$P_{SF} = 1 - (1 - P_{MB})(1 - P_A)(1 - P_{S1} P_{S2}) = 0.06044 \quad (3.3.1)$$

Equation (3.3.1) gives the probability of failure for the given system as 6.04% from the FT modeling approach.
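Equation (3.3.1) is easy to check in code. The sketch below evaluates the fault tree of Figure 3.3.1 with the values of Table 3.3.1. As the equation does, it assumes that all basic events are independent and that both servos share the same failure probability. For comparison, it also prints the rare event approximation, which simply sums the OR-gate inputs.

```python
P_MAIN_BATTERY = 0.05
P_AUTOPILOT = 0.001
P_SERVO = 0.1  # same value assumed for Servo 1 and Servo 2


def and_gate(*probs: float) -> float:
    """Output fails only if every independent input fails."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def or_gate(*probs: float) -> float:
    """Output fails if any independent input fails (exact form)."""
    survive = 1.0
    for p in probs:
        survive *= 1.0 - p
    return 1.0 - survive


p_servo_system = and_gate(P_SERVO, P_SERVO)
p_system_failure = or_gate(P_MAIN_BATTERY, P_AUTOPILOT, p_servo_system)
print(f"P(system failure)        = {p_system_failure:.5f}")
print(f"rare event approximation = "
      f"{P_MAIN_BATTERY + P_AUTOPILOT + p_servo_system:.5f}")
```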

The same system shown in Figure 3.3.1 can be represented using a BBN with a few small differences. A collector node is added to the two servos to form a Servo System, which acts as the AND gate. This follows the divorcing concept. Instead of the System Failure node having four parents and one large CPT, it has three parents, and the Servo System node has two, so the single large CPT is replaced by two smaller ones. This slightly modified system can be seen in Figure 3.3.2.

Figure 3.3.2: Bayesian Net Example

The most important difference between the approaches is how the nodal interactions are handled. While an FT requires Boolean logic, a BBN does not; however, a BBN can accommodate Boolean logic. To achieve the same form of logic, the conditional probabilities in the Bayes formula in Equation (2.5.3) are set to 1 or 0. To illustrate this, a simple two-node system is introduced, with node B influencing node A, as given in Figure 3.3.3.

Figure 3.3.3: Two node system

In this example the initiating event is node B, and the conditional probabilities are treated as Boolean. If B is known to occur, then A also has to occur, which is represented as P(A|B) = 1. If B does not occur, then A cannot occur. Figure 3.3.4 shows the effects of these interactions, as well as the lack of sensitivity to possible variations. Also note that if B occurs there is no possibility that A does not occur, so P(A*|B) = 0, where the complement A* indicates that A does not occur; equivalently, observing A* rules out B, so P(B|A*) = 0. These conditional probabilities are used to build the CPTs that handle more complex nodal interactions. The posterior probabilities indicate the two-way relationship as well: if A is known to occur, B is also known to occur.

Figure 3.3.4: Probability Tree of Example

These nodal interactions can then be expanded to handle the larger system shown in Figure 3.3.2 and to build the corresponding CPTs. In these CPTs the conditional probabilities appear as Boolean (0 or 1) columns, one for each combination of parent failure states.

Figure 3.3.5: CPTs of BBN example

The probabilities can then be propagated to give the overall probability of system failure as 0.06044, or 6.044%. This answer is identical to the answer from the FT approach in Equation (3.3.1), showing the equivalence of the approaches. The FT approach uses gates to combine the probabilities, and the CPTs perform a very similar role in the BBN method: the gates are encoded in the CPTs. For example, the Servo System node acts as an AND gate in a FT, so a Servo System Failure occurs only if both Servo1 and Servo2 fail. Otherwise, the system is in the No Servo System Failure state.
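The equivalence can be made concrete with a small Python sketch. It evaluates one system two ways:

- with fault-tree gate formulas;
- by exact enumeration over a BBN whose CPTs contain only 0s and 1s.

The structure OR(Battery, Autopilot, AND(Servo1, Servo2)) and all probabilities are assumptions chosen for illustration, because the component values behind Equation (3.3.1) are not reproduced here. The root events are also assumed to be independent.

```python
from itertools import product
from math import prod

# Hypothetical per-flight failure probabilities for independent root events
# (illustrative only; not the values behind Equation (3.3.1)).
P = {"Battery": 0.02, "Autopilot": 0.03, "Servo1": 0.05, "Servo2": 0.05}
ROOTS = list(P)


def ft_or(*probs):
    """OR gate over independent inputs: 1 minus the product of survival probabilities."""
    return 1 - prod(1 - p for p in probs)


def ft_and(*probs):
    """AND gate over independent inputs: product of failure probabilities."""
    return prod(probs)


def fault_tree_top_event():
    """Assumed tree: System = OR(Battery, Autopilot, AND(Servo1, Servo2))."""
    return ft_or(P["Battery"], P["Autopilot"], ft_and(P["Servo1"], P["Servo2"]))


def servo_system_cpt(servo1_failed, servo2_failed):
    """Boolean CPT for the Servo System collector node (behaves as an AND gate)."""
    return 1.0 if (servo1_failed and servo2_failed) else 0.0


def system_cpt(battery_failed, autopilot_failed, servo_system_failed):
    """Boolean CPT for the System Failure node (behaves as an OR gate)."""
    return 1.0 if (battery_failed or autopilot_failed or servo_system_failed) else 0.0


def bbn_system_failure():
    """Exact inference: sum the joint probability of every state in which the system fails."""
    total = 0.0
    for states in product((False, True), repeat=len(ROOTS)):
        s = dict(zip(ROOTS, states))
        p_roots = prod(P[n] if s[n] else 1 - P[n] for n in ROOTS)
        p_servo_fail = servo_system_cpt(s["Servo1"], s["Servo2"])
        for servo_failed in (False, True):
            p_servo = p_servo_fail if servo_failed else 1 - p_servo_fail
            total += p_roots * p_servo * system_cpt(s["Battery"], s["Autopilot"], servo_failed)
    return total


print(f"Fault tree : {fault_tree_top_event():.6f}")
print(f"BBN        : {bbn_system_failure():.6f}")
```

With Boolean CPTs the two results agree to floating-point precision. Replacing the 1.0/0.0 entries in `system_cpt` with intermediate values breaks this equivalence, and that relaxation is what Section 3.4 explores.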

Converting a FT to a BBN is a useful application of Bayesian modeling. However, a BBN obtained this way is only as accurate as the constraints of the FT allow. Removing those constraints may enable the model to better represent the system. For instance, at first glance the system modeled in Figure 3.3.1 seems to represent the system well, but close examination reveals deficiencies. For example, failure of the main battery does not directly cause the system to fail. It is in fact a primary failure event of both the Servo System failure and the Autopilot failure. The system can be remodeled as shown in Figure 3.3.6 to better capture these interactions, which would not be possible in a FT approach without mirror blocks.

Figure 3.3.6: System Remodeled .

This type of systematic modeling error can be seen as a form of epistemic uncertainty in the model. Epistemic uncertainty is systematic uncertainty due to a discrepancy between known theory and practical application [3]. In this case, modeling the system in a way that best captures the interactions and dependencies of events reduces the epistemic, or modeling, uncertainty.

3.4 Identifying Mitigations

Unlike FTs, BBNs do not require the conditional probabilities, first introduced in Chapter 2, to represent Boolean events. This relaxes the Boolean condition imposed in Section 3.3. Suppose the two-node system in Figure 3.3.3 is re-imagined so that the conditional probability P(A|B) is less than 1. Then, if B occurs, A is only likely to occur, with some probability less than one. It is instructive to build a probability tree to show this, as in Figure 3.4.1. The conditional probabilities are relaxed by 5% to show a near-certain correlation between the failure events of A given B, P(A|B) = 0.95. This allows for the leak probabilities mentioned earlier: failure of A could be caused by an unknown outside influence, or B could occur without triggering event A.
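The probability tree can be reproduced with the law of total probability and Bayes' rule. In this minimal Python sketch, only P(A|B) = 0.95 comes from the text. The prior P(B) = 0.1 and the leak probability P(A|B*) = 0.01 are hypothetical values.

```python
def two_node(p_b, p_a_given_b, p_a_given_not_b):
    """Return P(A) and P(B | A) for the two-node network B -> A."""
    # Law of total probability over the two states of B.
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    # Bayes' rule: posterior probability that B occurred given that A occurred.
    p_b_given_a = p_a_given_b * p_b / p_a
    return p_a, p_b_given_a


# Boolean case (Figure 3.3.3): P(A|B) = 1 and P(A|B*) = 0, so P(B|A) = 1.
print(two_node(0.1, 1.0, 0.0))
# Relaxed case (Figure 3.4.1): P(A|B) = 0.95 with a hypothetical leak P(A|B*) = 0.01.
print(two_node(0.1, 0.95, 0.01))
```

In the relaxed case, observing A no longer guarantees that B occurred, because part of P(A) now comes from the leak term.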

Figure 3.4.1: Non-Boolean Conditional Probabilities

Section 2.5.4 introduced the sensitivity analysis method that can be performed with non-Boolean conditional probabilities. Returning to the example in Figure 3.3.2, the CPTs can be varied to provide information about the interdependence of the nodes. First, the CPTs must be relaxed from Boolean values; the relaxed values are chosen based on subject matter expert input. To further exemplify the utility of BBNs, a third state is added to the battery node to represent a degraded battery condition. The overall probability of failure for the modified system is 44.76%. The sensitivity function from Equation (2.5.4) is used to determine the sensitivity of system failure to the varied conditional probability values. For this example the values were varied by ±10% and ±20%, and the results are shown in Table 3.4.1. The results for Battery Failure and Autopilot Failure are relatively close. However, system failure is more sensitive to variation in Battery Failure (sensitivity 0.41 versus 0.11).

| Hypothesis Variable | Parameter | Sensitivity | 0% | 10% | -10% | 20% | -20% |
|---|---|---|---|---|---|---|---|
| System Fail | Servo 1 | 0.04 | 0.44 | 0.44 | 0.436 | 0.448 | 0.432 |
| System Fail | Servo 2 | 0.02 | 0.44 | 0.442 | 0.438 | 0.444 | 0.436 |
| System Fail | Battery Failure | 0.41 | 0.43 | 0.471 | 0.459 | 0.512 | 0.448 |
| System Fail | Battery Degraded | 0.2 | 0.42 | 0.441 | 0.4 | 0.46 | 0.38 |
| System Fail | Autopilot Failure | 0.11 | 0.45 | 0.461 | 0.439 | 0.472 | 0.428 |

Table 3.4.1: Sensitivity to Parameter Variation
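This sketch assumes that Equation (2.5.4) is the standard one-way sensitivity function, f(x) = (αx + β)/(γx + δ). With no evidence entered, a network probability is linear in any single parameter, so the function reduces to γ = 0 and δ = 1. Two evaluations therefore fix α and β, and the sensitivity value is the slope α.

The Python sketch uses a hypothetical structure, OR(Battery, Autopilot, AND(Servo1, Servo2)), with made-up probabilities. It also interprets "varied by ±10%" as a relative change of the parameter; that reading is an assumption, not something the text states.

```python
def system_failure(p_battery, p_autopilot=0.03, p_servo1=0.05, p_servo2=0.05):
    """P(system failure) for the assumed OR(Battery, Autopilot, AND(Servo1, Servo2))
    structure with independent root events (hypothetical probabilities)."""
    p_servo_system = p_servo1 * p_servo2
    return 1 - (1 - p_battery) * (1 - p_autopilot) * (1 - p_servo_system)


def sensitivity_function(f, x0, h=1e-3):
    """Fit the linear sensitivity function f(x) = alpha * x + beta from two evaluations.

    Valid when no evidence is entered, because the output is then linear in a single
    parameter (its complement co-varying as 1 - x)."""
    alpha = (f(x0 + h) - f(x0)) / h
    beta = f(x0) - alpha * x0
    return alpha, beta


def variation_table(f, x0, steps=(0.0, 0.10, -0.10, 0.20, -0.20)):
    """Output probability when the parameter is scaled by (1 + step), as in Table 3.4.1."""
    return {step: f(x0 * (1 + step)) for step in steps}


alpha, beta = sensitivity_function(system_failure, 0.02)
print(f"alpha (sensitivity value) = {alpha:.4f}, beta = {beta:.4f}")
for step, p in variation_table(system_failure, 0.02).items():
    print(f"{step:+.0%}: P(system failure) = {p:.5f}")
```

If evidence were entered, γ would in general be non-zero and the fit would need a third evaluation.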

This sensitivity can be shown in an alternative way to further illustrate this important concept. Evidence of certain failure events can be introduced, and the resulting increase in the overall system failure probability relative to the baseline can be examined. The ratio of the two is known as a Likelihood Multiplier and is given by Equation (3.4.1) [24].

$$\text{Likelihood Multiplier (LM)} = \frac{P(\text{with causal factor evidence})}{P(\text{without causal factor evidence})} \qquad (3.4.1)$$

Multiple failure events can be examined and their results tabulated; the LMs are shown in Table 3.4.2. These results show which failure events have the greatest effect on overall system failure. System failure is much more sensitive to battery failure, and even to a degraded battery condition, than to the other events.

| Failure 1 | Failure 2 | Risk (%) | Likelihood Multiplier | % Increase | Rank |
|---|---|---|---|---|---|
| None | None | 44.59 | 1 | 0 | |
| Autopilot Failure | None | 55.58 | 1.246 | 19.773 | |
| Battery Failure | None | 83.58 | 1.874 | 46.649 | 3 |
| Servo1 Failure | None | 47.93 | 1.075 | 6.968 | |
| Servo1 Failure | Servo 2 Failure | 48.98 | 1.098 | 8.968 | |
| Battery Failure | Autopilot Failure | 94.36 | 2.116 | 52.745 | 1 |
| Battery Failure | Servo 2 Failure | 87.14 | 1.954 | 48.829 | 2 |
| Servo 2 Failure | Battery Failure | 59.19 | 1.327 | 24.666 | |
| Battery Degraded | None | 61.27 | 1.374 | 27.224 | |
| Battery Degraded | Servo 1 Failure | 64.76 | 1.452 | 31.146 | |

Table 3.4.2: Likelihood Multipliers

The values from Table 3.4.2 are shown graphically in Figure 3.4.2 to further illustrate this example. A battery failure condition is clearly an event that needs to be mitigated.
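Equation (3.4.1) is a simple ratio, so the LM column of Table 3.4.2 can be recomputed from the Risk (%) column. The Python sketch below uses a subset of rows transcribed from the table, and sorting by LM reproduces the three ranked rows.

Recomputation also suggests that the % Increase column equals (P_with - P_without)/P_with, that is, 1 - 1/LM, rather than the increase relative to the baseline (LM - 1). The helper name `table_percent_increase` is hypothetical.

```python
# Risk values (%) transcribed from Table 3.4.2; the baseline has no evidence entered.
BASELINE = 44.59
RISK_WITH_EVIDENCE = {
    ("Autopilot Failure", None): 55.58,
    ("Battery Failure", None): 83.58,
    ("Servo1 Failure", None): 47.93,
    ("Servo1 Failure", "Servo 2 Failure"): 48.98,
    ("Battery Failure", "Autopilot Failure"): 94.36,
    ("Battery Failure", "Servo 2 Failure"): 87.14,
    ("Battery Degraded", None): 61.27,
    ("Battery Degraded", "Servo 1 Failure"): 64.76,
}


def likelihood_multiplier(p_with, p_without):
    """Equation (3.4.1): system risk with causal-factor evidence over risk without it."""
    return p_with / p_without


def table_percent_increase(p_with, p_without):
    """Matches the '% Increase' column of Table 3.4.2: (P_with - P_without) / P_with."""
    return 100 * (p_with - p_without) / p_with


# Rank the evidence scenarios from the largest to the smallest likelihood multiplier.
ranked = sorted(RISK_WITH_EVIDENCE.items(),
                key=lambda item: likelihood_multiplier(item[1], BASELINE),
                reverse=True)
for (failure1, failure2), risk in ranked:
    lm = likelihood_multiplier(risk, BASELINE)
    pct = table_percent_increase(risk, BASELINE)
    print(f"{failure1} + {failure2 or 'None'}: LM = {lm:.3f}, % Increase = {pct:.3f}")
```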

Figure 3.4.2: Graph of Likelihood Multipliers (LMs)

3.5 Criticality Analysis

The methods introduced in Section 3.4 to identify areas of a system that may need mitigation can also be seen as a form of quantitative Criticality Analysis. Common criticality techniques require identifying both the severity of a failure event and its likelihood of occurrence. If the component to which the system is most susceptible can be identified, its criticality number can be reduced. To reduce the likelihood of occurrence of an undesirable event, a more reliable part could replace the current one, or a redundant component could be added. For the system in Figure 3.3.2, this will not change the severity of the event, only its likelihood, because the system still depends on the power provided by the battery to operate fully. A system with an additional battery and a battery system collector node is shown in Figure 3.5.1.

Figure 3.5.1: Bayesian Example with Mitigation

Adding the second battery reduces the system risk from 44.76% to 39.72%. The relative risk between the mitigated and unmitigated conditions can be computed as a risk ratio, Equation (3.5.1), as shown by Luxhoj [24].

$$\text{Logic Risk Ratio} = \frac{P(\text{mitigated})}{P(\text{unmitigated})} = \frac{39.72}{44.76} \approx 0.887 \qquad (3.5.1)$$

This Logic Risk Ratio means that adding a second battery reduces the likelihood of system failure by approximately 11% (1 - 0.887 ≈ 0.113).
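The Python sketch below shows the arithmetic behind Equation (3.5.1) and the effect of the redundant battery itself. The `redundant_pair_failure` helper is an illustrative assumption. It treats the battery system collector node as an AND gate over two identical, independent batteries and ignores common-cause failures. The 0.02 single-battery probability is hypothetical.

```python
def logic_risk_ratio(p_mitigated, p_unmitigated):
    """Equation (3.5.1): mitigated system risk over unmitigated system risk."""
    return p_mitigated / p_unmitigated


def percent_risk_reduction(p_mitigated, p_unmitigated):
    """Relative reduction in system risk implied by the Logic Risk Ratio."""
    return 100 * (1 - logic_risk_ratio(p_mitigated, p_unmitigated))


def redundant_pair_failure(p_unit):
    """Assumed AND collector over two identical, independent units: both must fail."""
    return p_unit ** 2


print(f"Logic Risk Ratio: {logic_risk_ratio(39.72, 44.76):.4f}")      # Section 3.5 example
print(f"Risk reduction:   {percent_risk_reduction(39.72, 44.76):.1f}%")
print(f"Battery system failure with a spare: {redundant_pair_failure(0.02):.6f}")
```

The same ratio applied to the all-mitigator result in Table 4.5.1 (19.8% versus 29.43%) gives about 0.67, which is a reduction in system risk of roughly one third.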

3.6 FMECA Representation

A FMECA is commonly used to represent failures and their higher-level effects, as well as the criticality analysis of such failures; Section 2.4 explains this method in more depth. The process used to develop a FMECA is also useful when developing a BBN. Both require identifying the failure events, the failure modes that result from them, and how those failures affect the higher-level systems. Given this commonality, and the FMECA's need for a criticality analysis, it is advantageous to use both approaches in parallel. An example FMECA, built from the working example in this chapter to show the effects of mitigations, is given in Figure 3.6.2. To determine the criticality number, a risk matrix adapted and slightly modified from the NAVAIR risk matrix [9] was used; it is shown in Figure 3.6.1.

Figure 3.6.1: Modified NAVAIR Risk Matrix

Risks in the red region of the risk matrix are highly undesirable and need to be mitigated if undesirable events are to be avoided. This is exemplified in the FMECA, where the unmitigated failures of a single servo or a single battery are ranked below a value of 5. These failures are then mitigated with a redundant component to bring the probability of occurrence down, raising the risk number to a more desirable level. Using the BBN to remodel the system and recalculate the probability of occurrence reduces the effort of recalculating these probabilities for each component.

Figure 3.6.2: Example FMECA

3.7 Summary

BBNs can address some of the deficiencies of FTA. A BBN is shown to represent a FT as a special case, with a numerically equivalent system risk. By applying a sensitivity analysis to the BBN, areas that may require mitigation can be identified and a mitigation then applied. This fits directly into the FMECA process and complements its advantages. The approach could be expanded further by attaching a utility and a cost to each mitigation. A cost-benefit analysis could then show how much the system risk is reduced for the amount spent.

Chapter 4

ESPAARO Case Example

Risk analysis methods work well in theory for the applications for which they are presented. However, the end goal of a method is to be applied to actual systems and to reduce the risk of those systems in practice. The method outlined in Chapter 3 was therefore applied to an internally funded UAS design at Virginia Tech, called the Electric Small Platform for Autonomous Aerial Research and Observation (ESPAARO). This aircraft is an iteration of a previous platform and is built as a research test bed to accommodate a wide range of modifications. For instance, the fuselage support hatches can be modified or replaced, and the wing can be detached and replaced with a modified wing. Above all, this aircraft was designed with a low price point and gentle flight characteristics to facilitate flight at the Kentland Experimental Aerial Systems (KEAS) Lab, shown in Figure 4.0.1.

Figure 4.0.1: KEAS Lab Runway

4.1 Introduction to the ESPAARO

The ESPAARO is designed with modifiability as a key design objective, and for this reason each part of the airframe can be detached without greatly affecting the other parts. Everything is attached to a carbon fiber strut assembly that acts as the keystone holding the airframe together. This allows the tails, the wing, or the fuselage to be switched out. A dimensioned drawing of this aircraft is displayed in Figure 4.1.1. The aircraft has a wingspan of 12 feet and a maximum takeoff weight of 45 lb, which allows large experimental payloads of up to 10 lb to be flown. The value of this modularity is demonstrated by a morphing actuated wing on an unmodified ESPAARO [14]. A drawing of the strut assembly is shown in Figure 4.1.2.

Figure 4.1.1: ESPAARO Drawing

Figure 4.1.2: ESPAARO Strut Assembly

The aircraft was originally prototyped with a fiberglass-wrapped foam fuselage. As the design was refined, a mold was manufactured by Nextgen Aeronautics to produce a hollow composite sandwich fuselage. This greatly increased the usable payload volume and decreased the fuselage weight. The airframe design allows the fuselage to be removed from the wings and tails so that a different fuselage can replace it. The appendages that attach to the fuselage were designed to be simple, each consisting of only one control surface and one actuator per wing section or tail. To facilitate low-speed handling, the control surfaces were oversized and exponential gains were added to the controller to prevent pilot-induced oscillation.

Another design driver of the project is to allow custom control algorithms to be developed and tested on the aircraft. For this reason the 3D Robotics Pixhawk autopilot was chosen, providing off-the-shelf usability as well as an open-source real-time operating system for custom modification. This system allows numerous control modes, primarily a manual pass-through of control, an augmented stabilized mode, and a fully autonomous flight mode. To develop custom control algorithms, it is often desirable to identify a model of the system being controlled. The autopilot has numerous onboard sensors to facilitate this, and it also allows additional sensors to be integrated into the system. For flight testing, an air data boom was attached to log angle of attack and sideslip information. The Nonlinear Systems Lab maintains three ESPAAROs; the finished product can be seen on the runway at KEAS in Figure 4.1.3. The Mid Atlantic Aviation Partnership also maintains two.

4.2 Model Construction and Probability Elicitation

To construct a risk model of the ESPAARO, a panel of three Subject Matter Experts (SMEs) was polled to identify potential failures and their effects. Each SME was very familiar with the aircraft, its construction, and its operation. For the purpose of this risk model, it was agreed to identify failures that would result in a mission failure, that is, the inability of the aircraft to perform its given operational duties. Such failures could range anywhere from a lost data link to a loose wire. They will not necessarily result in the loss of the aircraft, although they could. The emphasis is on the operational reliability of the aircraft. It was also agreed to focus on the probability of failure for these events for any given flight, rather than per flight hour.

Figure 4.1.3: ESPAARO on the runway

Potential failure events that would prevent the operation of the ESPAARO platform were identified with the SMEs. Forty-three base failures were identified, with multiple effects. The failure probabilities elicited from the SMEs are shown in Figure 4.2. To account for differences between the SMEs' probabilities, the values were averaged. Another method is to use evidence theory, as discussed by Murtha [27]. If the values for an event ranged beyond 10%, a discussion was held to better analyze the failure event. On occasion this resulted in a rework of the network structure with finer detail of the failure events, as was the case with the control surfaces.

| Parameter | SME 1 | SME 2 | SME 3 | Average |
|---|---|---|---|---|
| Turbulence/Wind Induced | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| Wing Strut/Fuse Attachment | 0.001 | 0.001 | 0.001 | 0.001 |
| Wing Spar | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| Sheet Puncture | 0.1 | 0.01 | 0.01 | 0.04 |
| Attachment Failure | 0.001 | 0.001 | 0.001 | 0.001 |
| Control Flutter | 0.001 | 0.001 | 0.001 | 0.001 |
| Control Linkage | 0.15 | 0.1 | 0.1 | 0.12 |
| Control Surface Binding | 0.001 | 0.01 | 0.01 | 0.007 |
| Servo | 0.01 | 0.001 | 0.01 | 0.007 |
| Servo Wire | 0.1 | 0.1 | 0.1 | 0.1 |
| Radio Blanketing | 0.2 | 0.01 | 0.1 | 0.10 |
| Radio Interference | 0.1 | 0.2 | 0.2 | 0.17 |
| Radio Distance Induced Lack of Control | 0.01 | 0.01 | 0.01 | 0.01 |
| Improper Wiring | 0.1 | 0.05 | 0.05 | 0.07 |
| Autopilot Failure | 0.2 | 0.25 | 0.2 | 0.22 |
| CG Shift | 0.01 | 0.01 | 0.01 | 0.01 |
| Battery Failure | 0.01 | 0.01 | 0.01 | 0.01 |
| Pilot Error | 0.25 | 0.25 | 0.25 | 0.25 |
| Shaft Slippage | 0.1 | 0.1 | 0.1 | 0.1 |
| Wire Disconnect | 0.001 | 0.001 | 0.001 | 0.001 |
| Main Battery Failure | 0.02 | 0.01 | 0.01 | 0.013 |
| Voltage Ripple to Motor | 0.8 | 0.5 | 0.5 | 0.6 |
| ESC Unarm | 0.025 | 0.05 | 0.001 | 0.025 |
| FOD through Propeller | 0.05 | 0.05 | 0.05 | 0.05 |
| Crack Propagation | 0.01 | 0.001 | 0.001 | 0.004 |
| Prop Hub Bolt | 0.05 | 0.1 | 0.05 | 0.067 |
| Fuse Attachment | 0.0002 | 0.0001 | 0.0001 | 0.00013 |
| Propulsion Attachment | 0.005 | 0.001 | 0.01 | 0.0053 |
| Landing Gear | 0.1 | 0.1 | 0.1 | 0.1 |
| Hatch Attachment | 0.01 | 0.005 | 0.005 | 0.0067 |
| Fatigue | 0.001 | 0.001 | 0.001 | 0.001 |
| Acute Damage | 0.1 | 0.01 | 0.05 | 0.053 |
| Programming Error | 0.05 | 0.05 | 0.05 | 0.05 |
| Improper Set Up | 0.2 | 0.2 | 0.2 | 0.2 |
| Improper Gains | 0.1 | 0.05 | 0.1 | 0.083 |
| Pitot Static Error | 0.05 | 0.05 | 0.05 | 0.05 |
| AHRS Calibration Error | 0.05 | 0.05 | 0.05 | 0.05 |
| GPS Error | 0.01 | 0.01 | 0.01 | 0.01 |
| Radio Failure | 0.4 | 0.3 | 0.4 | 0.37 |
| Foreign Control Input | 0.01 | 0.01 | 0.01 | 0.01 |
| Battery Failure | 0.01 | 0.01 | 0.01 | 0.01 |
| Computer Lock Up | 0.001 | 0.0001 | 0.001 | 0.0007 |

Figure 4.2: Failure Probability Data (probability of failure per flight)
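The aggregation step can be sketched in Python. The sketch averages a subset of the rows in Figure 4.2 and flags events whose estimates spread widely enough to warrant discussion. The text does not say whether "ranged beyond 10%" means an absolute spread of 0.10 in probability or a relative spread, so the absolute reading used here is an assumption.

```python
from statistics import mean

# A subset of rows from Figure 4.2 (probability of failure per flight, SMEs 1-3).
ESTIMATES = {
    "Sheet Puncture": (0.1, 0.01, 0.01),
    "Control Linkage": (0.15, 0.1, 0.1),
    "Radio Blanketing": (0.2, 0.01, 0.1),
    "Radio Interference": (0.1, 0.2, 0.2),
    "Autopilot Failure": (0.2, 0.25, 0.2),
    "Voltage Ripple to Motor": (0.8, 0.5, 0.5),
    "Radio Failure": (0.4, 0.3, 0.4),
}


def aggregate(estimates, spread_threshold=0.10):
    """Average each event's SME estimates and flag events whose range exceeds
    the threshold (assumed here to be an absolute spread on the probability scale)."""
    averages, flagged = {}, []
    for event, values in estimates.items():
        averages[event] = mean(values)
        # Round the spread so floating-point noise (e.g. 0.4 - 0.3) does not trip the check.
        if round(max(values) - min(values), 9) > spread_threshold:
            flagged.append(event)
    return averages, flagged


averages, flagged = aggregate(ESTIMATES)
for event, avg in averages.items():
    print(f"{event}: {avg:.4f}")
print("Discuss with SMEs:", flagged)
```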

The failure data from the SMEs were in relatively close agreement and were then averaged to obtain an "aggregate" failure probability. To build the BBN model from these data, the effects of each event also had to be identified. The process of building a BBN is similar to the FMECA process, as each root event affects a higher-level failure mode. While the FMECA builds a table, the BBN builds a directed acyclic graph. In this process each failure is seen to have effects that propagate, in some way, to the top-level effect of a mission failure.
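Because a BBN must be a directed acyclic graph, it can help to check the failure-to-effect links before any CPTs are written. The Python sketch below applies Kahn's topological sort to a hypothetical subset of links and raises an error if a cycle exists. It then confirms that every event eventually propagates to Mission Failure. The event names follow this chapter, but the edges are illustrative and are not the thesis model.

```python
from collections import deque

# Hypothetical failure -> effect links, loosely following the control-failure
# chain described in Section 4.3 (illustrative subset, not the thesis model).
EFFECTS = {
    "Radio Failure": ["Manual Control Loss"],
    "Radio Interference": ["Manual Control Loss"],
    "Manual Control Loss": ["Controls Failure"],
    "Servo": ["Control Surface Failure"],
    "Servo Wire": ["Control Surface Failure"],
    "Control Surface Failure": ["Controls Failure"],
    "Main Battery Failure": ["Propulsion Failure"],
    "Propulsion Failure": ["Mission Failure"],
    "Controls Failure": ["Mission Failure"],
}


def topological_order(effects):
    """Kahn's algorithm: return a parents-before-children order, or raise on a cycle."""
    nodes = set(effects) | {t for targets in effects.values() for t in targets}
    indegree = {n: 0 for n in nodes}
    for targets in effects.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for t in effects.get(node, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    if len(order) != len(nodes):
        raise ValueError("failure/effect links contain a cycle; not a DAG")
    return order


def reaches(effects, start, target):
    """True if a chain of effects leads from start to target (depth-first search)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(effects.get(node, []))
    return False


order = topological_order(EFFECTS)
print(order[-1])  # the only sink, so it must come last
print(all(reaches(EFFECTS, n, "Mission Failure") for n in order))
```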

These links are built in the BBN model using the CPTs discussed in Section 2.5. For each CPT, the SMEs ranked the contributing failure events from highest to lowest severity. The SMEs then bracketed the highest and lowest conditional probabilities of failure by determining the probability of failure when all failures in a given CPT occur and when none occur. The remaining probabilities in the CPTs were populated using a tool developed by a team from the Naval Postgraduate School [10]. The CPTs for this model can be found in the Appendix.

4.3 Conversion to an OOBN

With each failure parameter identified, a model to represent them must be constructed. This modeling process was also iterated with the SMEs and proved a useful way to discuss the higher-level effects of each failure event. As the model was constructed, the graphical representation provided a useful way to deductively identify further failure events. Figure 4.3.1 shows a model of a controls failure subnet. The subnet indicates that if certain failure events occur, a control failure becomes more likely. For instance, if the radio link between the aircraft and the pilot is severed, it will lead to a manual control loss and then to a controls failure. However, other events may also cause a manual control loss, or multiple failures could occur in conjunction to increase the likelihood of a control loss.

This difference from a FT method is worth reiterating. In a FT, one event occurs to cause a failure, or multiple events combine to cause a failure. In a BBN, one failure event may cause a system failure, but it may also not. If a failure event occurs, the probability of system failure increases but is not a certainty. Unrelated events that were not previously thought to combine to cause a failure may end up doing so. A failure occurring in this way is an application of the Reason model [33].
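The noisy-OR model is one common way to encode "a failure makes another failure more likely, but not certain" and to let several causes combine. The thesis does not state that its CPTs follow this model. The Python sketch below only illustrates the behavior described above, and its link probabilities and leak value are hypothetical.

```python
from math import prod

# Hypothetical probability that each cause, acting alone, produces a manual
# control loss (illustrative values, not elicited thesis data).
LINK_PROBABILITY = {"Radio Failure": 0.9, "Radio Interference": 0.3, "Pilot Error": 0.2}
LEAK = 0.01  # probability of a manual control loss from unmodeled causes


def noisy_or(active_causes, links=LINK_PROBABILITY, leak=LEAK):
    """P(effect | active causes) under the noisy-OR assumption of independent causes."""
    return 1 - (1 - leak) * prod(1 - links[c] for c in active_causes)


print(noisy_or([]))                                     # leak only
print(noisy_or(["Radio Interference"]))                 # one cause: more likely, not certain
print(noisy_or(["Radio Interference", "Pilot Error"]))  # combined causes raise the probability
```

A full noisy-OR CPT is generated from one parameter per cause plus the leak, which keeps elicitation manageable as the number of parents grows.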

Figure 4.3.1: Control Failure Model

As the model was progressively built, it became obvious that its complicated interconnections and size made it difficult to display or analyze. The BBN representation of the ESPAARO is quite large, and the interdependencies between systems make it difficult to follow. The full model can be seen in Figure 5.1.1. To simplify the representation, the model was converted to an OOBN, which proved exceedingly useful not only for representation but also for investigating the subsystems individually. The control system subnet can be seen in Figure 4.3.1 and the top-level aircraft in Figure 4.3.2. Creating subnets makes it easy to investigate each component separately from the full Bayesian Network. It also provides a simplified view of the full model without loss of granularity in the possible causes of failure.
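An OOBN class can be thought of as a template that is instantiated into the enclosing network, so the OOBN and the flat BBN describe the same joint distribution. The Python sketch below illustrates this idea under stated assumptions. The dictionary-based representation, the `instantiate` helper, and all probabilities are hypothetical. Inference here is by brute-force enumeration rather than the junction-tree propagation used by tools such as HUGIN.

```python
from itertools import product

# Flat network representation: node -> (parent names, cpt), where cpt(*parent_states)
# returns P(node fails). All probabilities below are hypothetical.

# A reusable "class" (subnet). "Power" is an input bound at instantiation time;
# "Failed" is the output node that the enclosing network uses as a parent.
SERVO_SYSTEM_CLASS = {
    "Servo1": ((), lambda: 0.05),
    "Servo2": ((), lambda: 0.05),
    "Failed": (("Servo1", "Servo2", "Power"),
               lambda s1, s2, power_lost: 1.0 if power_lost or (s1 and s2) else 0.0),
}


def instantiate(cls, prefix, bindings):
    """Copy a class's nodes under a name prefix, wiring its inputs to outer nodes."""
    def rename(name):
        return bindings.get(name, f"{prefix}.{name}")
    return {rename(node): (tuple(rename(p) for p in parents), cpt)
            for node, (parents, cpt) in cls.items()}


def probability(network, query):
    """P(query fails) by enumerating all joint states of the flattened network."""
    nodes = list(network)
    total = 0.0
    for states in product((False, True), repeat=len(nodes)):
        state = dict(zip(nodes, states))
        weight = 1.0
        for node in nodes:
            parents, cpt = network[node]
            p_fail = cpt(*(state[p] for p in parents))
            weight *= p_fail if state[node] else 1 - p_fail
        if state[query]:
            total += weight
    return total


# Top-level aircraft network: the battery is a common cause of both subsystems.
aircraft = {
    "Battery": ((), lambda: 0.02),
    "Autopilot": (("Battery",), lambda battery: 1.0 if battery else 0.03),
}
aircraft.update(instantiate(SERVO_SYSTEM_CLASS, "ServoSystem", {"Power": "Battery"}))
aircraft["SystemFail"] = (("ServoSystem.Failed", "Autopilot"),
                          lambda servo, autopilot: 1.0 if servo or autopilot else 0.0)

print(probability(aircraft, "SystemFail"))
print(probability(aircraft, "ServoSystem.Servo1"))  # subnet internals remain queryable
```

Because instantiation only renames and rewires nodes, each subnet can be inspected on its own while the top-level query still sees every internal cause.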


Figure 4.3.2: ESPAARO Aircraft OOBN

4.4 Identifying Mitigations

Sensitivity analysis, first introduced in Section 2.5.4, is used to identify the most sensitive failure modes; an abbreviated list is given in Table 4.4.1. These values were varied by ±10% and ±20% using the sensitivity equation to produce the respective sensitivity values. These values identify which events will have the greatest effect on the overall system risk and correspondingly merit the most mitigation. From this analysis it can be seen that a CG shift, wire disconnect, or main battery failure is the most sensitive to variation.

| Failure Events | 0% | 10% | -10% | 20% | -20% |
|---|---|---|---|---|---|
| Main Battery Failure | 0.29 | 0.308 | 0.272 | 0.326 | 0.254 |
| Battery failure | 0.29 | 0.307 | 0.273 | 0.324 | 0.256 |
| Interference | 0.29 | 0.29 | 0.29 | 0.29 | 0.29 |
| Wire Disconnect | 0.29 | 0.311 | 0.269 | 0.332 | 0.248 |
| AHRS Error | 0.29 | 0.301 | 0.279 | 0.312 | 0.268 |
| ESC Error | 0.29 | 0.306 | 0.274 | 0.322 | 0.258 |
| Attachment Failure | 0.29 | 0.313 | 0.267 | 0.336 | 0.244 |
| CG shift | 0.29 | 0.311 | 0.269 | 0.332 | 0.248 |
| Servo Failure | 0.29 | 0.303 | 0.277 | 0.316 | 0.264 |

Table 4.4.1: Abbreviated Sensitivity Analysis

To further illustrate the effects of sensitivity on system reliability, the likelihood multiplier introduced in Section 3.4, Equation (3.4.1), can be used. It compares the base system risk with the system risk when a particular failure is observed to have occurred. The ratio of the two is the likelihood multiplier, shown in Table 4.4.2, which also includes cases in which two failures occur together. The failures identified in Table 4.4.1 are prevalent; however, a servo failure has the largest single-failure likelihood multiplier (1.91). A servo failure raises the system risk to 59.02%, with a listed percent increase of 47.74%. The two methods of investigation give insight into how different failures affect system risk. A sensitivity analysis focuses on how damaging any one particular failure can be to the system, while disregarding how likely it is to occur. Applying a likelihood multiplier makes it possible to investigate further, by analyzing how much more likely a system failure becomes when a given failure occurs.

Figure 4.4.1: Likelihood Multiplier Graph


| Failure 1 | Failure 2 | System Risk (%) | Likelihood Multiplier | Percent Increase |
|---|---|---|---|---|
| None | None | 29.43 | 1 | 0 |
| Main Battery Failure | None | 46.91 | 1.52 | 34.25 |
| Battery failure | None | 46.07 | 1.49 | 33.05 |
| Interference | None | 50.59 | 1.64 | 39.03 |
| Wire Disconnect | None | 49.99 | 1.62 | 38.30 |
| AHRS error | None | 39.57 | 1.28 | 22.06 |
| ESC error | None | 45.4 | 1.47 | 32.07 |
| Attachment Failure | None | 54.08 | 1.75 | 42.97 |
| CG shift | None | 50.54 | 1.63 | 38.97 |
| Servo Failure | None | 59.02 | 1.91 | 47.74 |
| Attachment Failure | Interference | 61.84 | 2.00 | 50.12 |
| CG shift | Attachment Failure | 68.41 | 2.21 | 54.91 |
| Interference | Servo Failure | 66.05 | 2.14 | 53.30 |
| Wire Disconnect | ESC error | 51.5 | 1.66 | 40.11 |
| AHRS error | Attachment Failure | 60.96 | 1.97 | 49.40 |
| ESC error | AHRS error | 52.1 | 1.69 | 40.80 |
| Attachment Failure | Wire Disconnect | 66.74 | 2.16 | 53.79 |
| CG shift | Servo Failure | 71.99 | 2.33 | 57.16 |
| Servo Failure | AHRS error | 65.25 | 2.11 | 52.73 |

Table 4.4.2: Likelihood Multiplier

4.5 Application and Effects of Mitigations

A sensitivity analysis identifies the events of a system that have the most potential to affect system risk. With these events identified, the most critical of them can be mitigated. For the ESPAARO, failure events can be mitigated by adding a redundant system that takes over in the event of a failure. These redundant systems are added to the overall ESPAARO BBN as discussed in Section 3.5. Each failure event to be mitigated was chosen with input from the SME panel to ensure realistic implementation; for instance, it is conceivable to add a second battery to the system.

The results of adding four redundant systems are shown in Table 4.5.1. Adding a second radio control link has the potential to reduce the overall system risk from 29.43 to 22.49 percent. Applying all four mitigators has the potential to reduce the overall system risk to 19.8 percent. This corresponds to a Logic Risk Ratio of 19.8/29.43 ≈ 0.67, that is, a reduction in overall system risk of about 33% from these mitigations.

| Redundant System | System Risk (%) |
|---|---|
| None | 29.43 |
| Second Radio | 22.49 |
| Second Control Battery | 29.16 |
| Secondary Servos and Linkages | 26.48 |
| Second Main Battery | 25.58 |
| All Four Mitigators | 19.8 |

Table 4.5.1: Risk Mitigators

Chapter 5

Conclusions and Future Work

Fault Tree Analysis is widely used to calculate a system's probability of failure using Boolean logic in the form of AND and OR gates. These gates connect the individual components of a system into a system "tree" on which analysis can be performed. Fault trees have been used to investigate risk for more than 50 years and are well suited to critical components. However, they cannot easily capture complex interactions among components. Bayesian Networks are beginning to be used to investigate such complex interactions and failures between components.

Bayesian Belief Networks were shown to allow the formulation of Boolean logic in the same way as FTA. This comparison establishes BBNs as a peer of FTA, which is currently in common use in risk analysis. Numerical examples showed both methods generating numerically identical answers when the same system is modeled in the same way. Therefore, FTA can be shown to be a special case of BBNs.

Using BBNs allows a system to be modeled as closely as possible to how it actually functions, without adhering to the modeling constraints imposed by FTA. BBNs allow different ways of modeling a system, including non-Boolean operators. The example used to compare FTA and BBN had its Boolean operators relaxed slightly so that a sensitivity analysis could be performed. Non-Boolean operators allow the investigation of the key components that have the greatest effect on system-level failure. These components and their failure modes can then be mitigated.

The methods introduced were demonstrated in a practical example on Virginia Tech's ESPAARO platform. This allowed a realistic system model to be built using BBNs and then graphically simplified using OOBNs. As this aircraft is currently flown as an experimental test aircraft, its reliability is always important. The analysis in this thesis was motivated throughout by the goal of improving that aircraft's reliability, and the proposed mitigations were therefore implemented.

The motivation for this thesis is to improve the use of currently available risk analysis techniques and to apply them to small UAS. A fundamental problem, however, is that failure data are not readily available, and information has to be inferred from subject matter experts or from experimental data. Future work could focus on expanded testing of certain base-level components used in this application to provide better failure data. This base-level information would then propagate into a more accurate system model. As the airframe is used more, more information will become available.

As UAS become more common and begin to operate in the NAS, more information will be demanded. Characterizing the reliability of common components such as motors, servos, and landing gear would provide useful feedback for validating future reliability studies. Such components are widely used, yet very poorly understood in their current non-military applications.

Attached is a collection of Bayesian Networks that comprise the entire ESPAARO system as presented in the preceding chapters. The Bayesian Network is first presented as a whole BBN and then broken out into the subnets that comprise the OOBN.

5.1 BBN for ESPAARO

Figure 5.1.1: ESPAARO BBN

5.2 OOBN for ESPAARO

Figure 5.2.1: Autopilot Subnet

Figure 5.2.2: Control Surface Subnet


Figure 5.2.3: Propulsion System Subnet

Figure 5.2.4: Control System Subnet


Figure 5.2.5: Strut Subnet

Figure 5.2.6: Tail Subnet


Figure 5.2.7: Wing Subnet

Figure 5.2.8: Aircraft Subnet


Figure 5.2.9: Camera Subnet

5.3 Sensitivity Results


| Parameter | Sensitivity | α | β | γ | δ |
|---|---|---|---|---|---|
| Turbulence | 0.3 | 0.3 | 0.29 | 0 | 1 |
| Wing Spar | 0.22 | 0.22 | 0.29 | 0 | 1 |
| Sheet Puncture | 0.18 | 0.18 | 0.29 | 0 | 1 |
| Control Flutter | 0.07 | 0.07 | 0.29 | 0 | 1 |
| Control Surface Binding | 0.07 | 0.07 | 0.29 | 0 | 1 |
| Servo Failure | 0.13 | 0.13 | 0.29 | 0 | 1 |
| Servo wire | 0.12 | 0.12 | 0.29 | 0 | 1 |
| Control linkage | 0.09 | 0.09 | 0.29 | 0 | 1 |
| CG Shift | 0.21 | 0.21 | 0.29 | 0 | 1 |
| Wing Failure (complete) | 3.46E-08 | 3.46E-08 | 0.29 | 0 | 1 |
| Structural (complete) | 3.95E-11 | 3.95E-11 | 0.29 | 0 | 1 |
| Attachment Failure | 0.29 | 0.23 | 0.29 | 0 | 1 |
| Landing Gear | 0.15 | 0.15 | 0.28 | 0 | 1 |
| Fatigue | 0.18 | 0.18 | 0.29 | 0 | 1 |
| Acute Damage | 0.12 | 0.12 | 0.29 | 0 | 1 |
| Propulsion Attachment | 0.18 | 0.18 | 0.29 | 0 | 1 |
| Hatch Attachment | 0.17 | 0.17 | 0.29 | 0 | 1 |
| Pilot Error | 0.22 | 0.22 | 0.27 | 0 | 1 |
| Battery Failure | 0.17 | 0.17 | 0.29 | 0 | 1 |
| Improper Wiring | 0.14 | 0.14 | 0.29 | 0 | 1 |
| Interference | 0 | 0 | 0.29 | 0 | 1 |
| Voltage Ripple | 0.19 | 0.19 | 0.26 | 0 | 1 |
| Crack Propagation | 0.16 | 0.16 | 0.29 | 0 | 1 |
| Main Battery Failure | 0.18 | 0.18 | 0.29 | 0 | 1 |
| Wire Disconnect | 0.21 | 0.21 | 0.29 | 0 | 1 |
| ESC Unarm | 0.16 | 0.16 | 0.29 | 0 | 1 |
| FOD through Prop | 0.14 | 0.14 | 0.29 | 0 | 1 |
| Shaft Slippage | 0.22 | 0.22 | 0.27 | 0 | 1 |
| AHRS Calibration | 0.11 | 0.11 | 0.29 | 0 | 1 |
| Pitot Error | 0.08 | 0.08 | 0.29 | 0 | 1 |
| Foreign Control | 0.07 | 0.07 | 0.29 | 0 | 1 |
| Computer Lock up | 0.18 | 0.18 | 0.29 | 0 | 1 |
| GPS error | 0.09 | 0.09 | 0.29 | 0 | 1 |
| Improper Gains | 0.09 | 0.09 | 0.29 | 0 | 1 |
| Improper Setup | 0.1 | 0.1 | 0.29 | 0 | 1 |
| Programming Error | 0.12 | 0.12 | 0.29 | 0 | 1 |
| Spar Tail Attach | 0.25 | 0.25 | 0.29 | 0 | 1 |

Table 5.3.1: Sensitivity Analysis Results for ESPAARO


Bibliography

[1] Unmanned systems integrated roadmap FY2013-2018. http://www.cs.ubc.ca/~murphyk/bayes/bnintro.html.

[2] Abductive reasoning. http://en.wikipedia.org/wiki/abductivereasoning, April 2015.

[3] Uncertainty quantification. Internet, 2015.

[4] Summary of small unmanned aircraft rule (Part 107). Technical report, Federal Aviation Administration, June 2016.

[5] RTO-NATO 2000, editor. Commercial Off-the-Shelf Products in Defence Applications "The Ruthless Pursuit of COTS". Research and Technology Organization, April 2000.

[6] Federal Aviation Administration. Safety Management System Manual Version 4.0. Air Traffic Organization 2014, 2014.

[7] Denise Marie Andres. Development of a post-consequence model (PCOM) for aircraft accident severity assessment. Master's thesis, Rutgers State University of New Jersey, 2005.

[8] Bilal M. Ayyub. Risk Analysis in Engineering and Economics. Chapman & Hall, 2003.

[9] William Balderson. NAVAIR Risk Assessment Handbook. Naval Air Systems Command, 2002.

[10] Kong Luxhoj McKnight Miller Stevens Tonello Rhoades Brockway, Johnston. Experimental unmanned aerial systems interim flight clearances challenge. Master's thesis, Naval Post-Graduate School, 2014.

[11] Avi Pfeffer, Daphne Koller. Object-oriented Bayesian networks. In Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence, 1997.

[12] Chad Moses David King, Allen Bertapelle. UAV failure rate criteria for equivalent level of safety. In International Helicopter Safety Symposium, 2005.

[13] Richard Denning. Applied R&M Manual for Defence Systems Part C - Techniques. Ministry of Defence, May 2012.

[14] K. Pyne A. Bialy M. Burns G. Mohan N. Beaty C. MacNeal C. Weit C. Kevorkian C. A. Woolsey E. B. Doepke, M. Heim and M. Philen. Design and demonstration of a flexible matrix composite actuated flap in a UAV. In ASME 2015 Conference on Smart Materials, Adaptive Structures and Intelligent Systems (SMASIS), 2015.

[15] HUGIN Expert A/S. HUGIN API Reference Manual Version 8.1, October 2014.

[16] Ales Janota. Overcoming limitations of fault tree analysis using Bayesian belief networks. 2014.

[17] Finn V. Jensen. Introduction to Bayesian Networks. 1995.

[18] Kececioglu. Reliability Engineering Handbook, Volume 2. Prentice Hall Inc, 1991.

[19] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, 1988.

[20] Silja Renooij, Linda van der Gaag. Analysing sensitivity data from probabilistic networks. 2001.

[21] Dr. James Luxhoj. AOE 5984: Special Topics course on Risk Analysis for Aerospace Systems, 2015.

[22] James T. Luxhoj. Probabilistic causal analysis for system safety risk assessments in commercial air transports. Department of Industrial and Systems Engineering, Rutgers University.

[23] James T. Luxhoj. Predictive analytics for modeling UAS safety risk. SAE International, 2013.

[24] James T. Luxhoj. Special topics in risk analysis: Overview of the Lauritzen-Spiegelhalter (L-S) algorithm, January 2015.

[25] Kjaerulff U. Jensen F. Madsen A., Lang M. The HUGIN tool for learning Bayesian networks.

[26] Kevin Murphy. A brief introduction to graphical models and Bayesian networks. http://www.cs.ubc.ca/~murphyk/bayes/bnintro.html, 1998.

[27] Justin F. Murtha. An Evidence Theoretic Approach to Design of Reliable Low-Cost UAVs. PhD thesis, Virginia Polytechnic Institute and State University, 2009.

[28] American Institute of Chemical Engineers, editor. Guidelines for Chemical Process Quantitative Risk Analysis. Center for Chemical Process Safety, 2000.

[29] Department of Defense. MIL-STD-1629A: Procedures for Performing a Failure Mode, Effects, and Criticality Analysis, 1980.

[30] Department of Defense. Standard Practice for System Safety, 2012.

[31] NASA Office of Safety and Mission Assurance. Fault Tree Handbook with Aerospace Applications. NASA, 2002.

[32] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc, 1998.

[33] James Reason. Human error: models and management. Education and Debate, 2000.

[34] Systems and Reliability Research Office. Fault Tree Handbook NUREG-0492. U.S. Nuclear Regulatory Commission, 1981.

[35] Kevin Williams. A summary of unmanned aircraft accident/incident data: Human factors implications. Civil Aerospace Medical Institute, Federal Aviation Administration, December 2004.
