Fault tree analysis seminar report

FAULT TREE ANALYSIS

ABSTRACT

Reliability analysis plays a crucial role in the design process. In order to increase the reliability of a

system, analysis of failure data is essential. Fault Tree Analysis (FTA) is a method that directly

focuses on the modes of failures. FTA is a graphical representation of the major faults or critical

failures associated with a product and the causes for the faults and potential countermeasures.

This paper presents the FTA of a lathe machine. Qualitative and quantitative analysis helps to identify critical design parameters and maintenance suggestions. It also discusses how reliability analysis is fruitful for life cycle cost management.

Fault Tree Analysis (FTA) is a failure analysis method that uses Boolean logic and low-level events to analyze undesired states of a system. It has been widely applied in safety-critical areas such as the aerospace industry. It is suitable for analyzing a complicated system in terms of predefined subsystems. This paper applies FTA to the safety of a nuclear power plant, focusing on some critical subsystems.

IE&M DEPT,BIT Page 1

CONTENTS

1. INTRODUCTION

2. HISTORY

3. SYMBOLS USED IN FTA

4. SAFETY AND RELIABILITY ANALYSIS

5. METHODOLOGY

6. BASIC MATHEMATICAL FOUNDATION

7. ANALYSIS

8. ILLUSTRATION OF THE STEPS OF A FTA

9. COMPARISON WITH OTHER ANALYTICAL METHODS

10. FTA OPERATES IN FAILURE SPACE

11. FTA PROJECT MANAGEMENT TASK

12. FTA ADVANTAGES

13. CASE STUDY 1

14. CASE STUDY 2

15. REFERENCES


LIST OF FIGURES

1. Figure 1: Construction of Fault Tree Diagram

2. Figure 2: Symbols

3. Figure 3: Illustration of the Steps of a FTA

4. Figure 4: The Ishikawa Fishbone Diagram

5. Figure 5: Success Space versus Failure Space

6. Figure 6: Different Failure and Success States for a Trip

7. Figure 7: Example of a Dynamic System

8. Figure 8: Outline of the FT for the Dynamic Example

9. Figure 9: A Typical 5-layer Nuclear Power Plant

10. Figure 10: Heat Discharge

11. Figure 11: FTA for Heat Discharge

12. Figure 12: FTA for Inflammable gas control

13. Figure 13: Radio-Based Railroad Crossing

14. Figure 14: Model of the crossing-chart

15. Figure 15: Model of the train-control-chart

16. Figure 16: Model of the communication-chart

17. Figure 17: Fault tree for hazard collision

LIST OF TABLES

1. Table 1: Safety Functions Of Nuclear Power Plant

2. Table 2: Primary Building Blocks of FTA

3. Table 3: Formalization of fault tree nodes


INTRODUCTION

Fault tree analysis (FTA) is a top-down, deductive failure analysis in which an undesired state of

a system is analyzed using Boolean logic to combine a series of lower-level events. This analysis

method is mainly used in the fields of safety engineering and reliability engineering to

understand how systems can fail, to identify the best ways to reduce risk or to determine (or get a

feeling for) event rates of a safety accident or a particular system level (functional) failure. FTA

is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical

and other high-hazard industries; but is also used in fields as diverse as risk factor identification

relating to social service system failure. FTA is also used in software engineering for debugging

purposes and is closely related to the cause-elimination technique used to detect bugs. FTA is the

most commonly used technique for causal analysis in risk and reliability studies. This analysis

method is mainly used in the field of safety engineering to quantitatively determine the

probability of a safety hazard. FTA is a graphical design technique that provides an alternative to

reliability block diagrams. It is broader in scope than a reliability block diagram and differs from

reliability block diagrams in several respects. It is a top-down, deductive analysis structured in

terms of events rather than components. An advantage of focusing on failures is that failures are

usually easier to define than non-failures and there may be far fewer ways in which failures

can occur. The focus is usually on a significant failure or a catastrophic event, which is referred

to as the top event and appears at the top of the fault tree diagram. The qualitative analysis

consists of identifying the various combinations of events that will cause the top event to occur.

This may be followed by a quantitative analysis to estimate the probability of occurrence of the top

event.

Figure 1: Construction of Fault Tree Diagram


Fault Tree Analysis usually involves events from machine wear out, material failure or

combinations of deterministic contributions to the event stemming from assigning hardware

(machine)/system failure rate to branches or cut sets. Typically failure rates are carefully derived

from substantiated historical data such as mean time between failures of the components, unit,

subsystem or function. Predictor data may be assigned.

FTA can be used as a valuable design tool, can identify potential accidents, and can eliminate

costly design changes. It can also be used as a diagnostic tool, predicting the most likely system

failure in a system breakdown. FTA is used in safety engineering and in all major fields of

engineering.

FTA can be used to:

1. Understand the logic leading to the top event / undesired state.

2. Show compliance with the (input) system safety / reliability requirements.

3. Prioritize the contributors leading to the top event - Creating the Critical

Equipment/Parts/Events lists for different importance measures.

4. Monitor and control the safety performance of the complex system (e.g., is a particular

aircraft safe to fly when fuel valve x malfunctions? For how long is it allowed to fly with

the valve malfunction?).

5. Minimize and optimize resources.

6. Assist in designing a system. The FTA can be used as a design tool that helps to create

(output / lower level) requirements.

7. Function as a diagnostic tool to identify and correct causes of the top event. It can help

with the creation of diagnostic manuals / processes.

WHY FTA?

For any so-called “system”, perfection is impossible, and a failure will eventually occur somewhere. For a safety-critical system in particular, a failure could result in significant property or environmental damage, and even loss of life. It is therefore essential to deal with system failures in order to maintain system reliability. At the same time, the probability of a complete or partial success is greater than the probability of a complete or partial failure, so


assembling a success tree can turn out to be very time consuming. For a complete system, the complexity makes assembling the FTA costly and cumbersome, so it is reasonable to divide the system into subsystems and start the analysis from them. Dealing with smaller systems in this way reduces both the likelihood of analysis errors and the analysis effort. Afterward, the analysis of the whole system can be assembled from these subsystems.

HISTORY

Fault Tree Analysis (FTA) was originally developed in 1962 at Bell Laboratories by H.A.

Watson, under a U.S. Air Force Ballistics Systems Division contract to evaluate the Minuteman I

Intercontinental Ballistic Missile (ICBM) Launch Control System. The use of fault trees has

since gained widespread support and is often used as a failure analysis tool by reliability experts.

Following the first published use of FTA in the 1962 Minuteman I Launch Control Safety Study,

Boeing and AVCO expanded use of FTA to the entire Minuteman II system in 1963-1964. FTA

received extensive coverage at a 1965 System Safety Symposium in Seattle sponsored by Boeing

and the University of Washington. Boeing began using FTA for civil aircraft design around

1966.

Subsequently within the U.S. military, application of FTA for use with fuzes was explored by

Picatinny Arsenal in the 1960s and 1970s. In 1976 the U.S. Army Materiel Command

incorporated FTA into an Engineering Design Handbook on Design for Reliability. The

Reliability Analysis Center at Rome Laboratory and its successor organizations, now with the Defense Technical Information Center (Reliability Information Analysis Center and now Defense Systems Information Analysis Center), have published documents on FTA and reliability block diagrams since the 1960s. MIL-HDBK-338B provides a more recent reference.

In 1970, the U.S. Federal Aviation Administration (FAA) published a change to 14 CFR 25.1309

airworthiness regulations for transport category aircraft in the Federal Register at 35 FR 5665

(1970-04-08). This change adopted failure probability criteria for aircraft systems and equipment

and led to widespread use of FTA in civil aviation. In 1998, the FAA published Order 8040.4,

establishing risk management policy including hazard analysis in a range of critical activities

beyond aircraft certification, including air traffic control and modernization of the U.S. National


Airspace System. This led to the publication of the FAA System Safety Handbook, which

describes the use of FTA in various types of formal hazard analysis.

Within the nuclear power industry, the U.S. Nuclear Regulatory Commission began using

probabilistic risk assessment (PRA) methods including FTA in 1975, and significantly expanded

PRA research following the 1979 incident at Three Mile Island. This eventually led to the 1981

publication of the NRC Fault Tree Handbook NUREG–0492, and mandatory use of PRA under

the NRC's regulatory authority.

Following process industry disasters such as the 1984 Bhopal disaster and 1988 Piper Alpha

explosion, in 1992 the United States Department of Labor Occupational Safety and Health

Administration (OSHA) published in the Federal Register at 57 FR 6356 (1992-02-24) its

Process Safety Management (PSM) standard in 29 CFR 1910.119. OSHA PSM recognizes FTA

as an acceptable method for process hazard analysis (PHA).

Today FTA is widely used in system safety and reliability engineering, and in all major fields of

engineering.

SYMBOLS USED IN FTA

The basic symbols used in FTA are grouped as events, gates, and transfer symbols.

Figure 2: Symbols


SAFETY AND RELIABILITY ANALYSIS

The major goal of safety and reliability analysis is to reduce the probability of failure and the

resulting losses:

1. Human Losses

– Death

– Sickness or disability

– Injury

2. Economic Losses

– Production or service shutdown

– Off-specification products or services

– Loss of capital equipment

3. Environmental Losses

– Air pollution

– Water pollution

– Other degradations of the environment

ROLE OF FTA IN SYSTEM SAFETY ANALYSIS

1. FTA is used to resolve the causes of system failure.

2. FTA is used to quantify system failure probability.

3. FTA is used to evaluate potential upgrades to a system.

4. FTA is used to optimize resources in assuring system safety.

5. FTA is used to resolve causes of an incident.

6. FTA is used to model system failures in risk assessments.

FTA INTERFACE WITH RELIABILITY ANALYSIS

1. For quantification, the basic component inputs to FTA are component failure rates and

repair rates.

2. For a first order calculation, the failure rates and repair rates are treated as being constant.


3. For more detailed quantifications, the failure rates and repair rates can be modeled as

being age or time dependent.

4. Weibull distributions are often used for the failure times.

5. Lognormal or threshold exponential distributions can be used for the repair times.

6. FTA can be linked to failure and repair data records.
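
As an illustration of the constant-rate and age-dependent cases listed above, the following minimal Python sketch evaluates a Weibull failure-time model. The function names and the numerical parameter values are assumptions chosen for illustration, not data from this report. With a shape parameter of 1 the Weibull model reduces to the constant failure rate case.

```python
import math


def weibull_failure_probability(t: float, shape: float, scale: float) -> float:
    """Probability of failure by time t: F(t) = 1 - exp(-(t / scale) ** shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate: h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical component with a characteristic life of 1000 hours.
for shape in (1.0, 2.0):
    p = weibull_failure_probability(100.0, shape, 1000.0)
    h = weibull_hazard(100.0, shape, 1000.0)
    print(f"shape={shape}: P(fail by 100 h)={p:.5f}, hazard at 100 h={h:.6f}")
```

A shape greater than 1 gives a failure rate that grows with age (wear-out), which is the situation where the first order constant-rate treatment stops being adequate.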

ROLE OF FTA IN PRA (Probabilistic Risk Assessment)

1. A Probabilistic Risk Assessment (PRA) models event scenarios.

2. An event scenario consists of an initiating event and subsequent system failures.

3. FTA is carried out to model the causes of the system failures.

4. Using data on the probability of the causes, the probability of system failure is

determined.

5. The probability of the accident scenario is thereby determined.

METHODOLOGY

FTA methodology is described in several industry and government standards, including NRC

NUREG–0492 for the nuclear power industry, an aerospace-oriented revision to NUREG–0492

for use by NASA, SAE ARP4761 for civil aerospace, MIL–HDBK–338 for military systems, and IEC 61025, which is intended for cross-industry use and has been adopted as European Norm EN 61025.

Since no system is perfect, any working system will eventually have a fault somewhere, so dealing with subsystem faults is a necessity. However, the probability of a complete or partial success is greater than the probability of a complete or partial failure, so assembling a fault tree is not as tedious as assembling a success tree, which can turn out to be very time consuming. Because assembling the FTA for a complete system can still be costly and cumbersome, a practical approach is to consider subsystems. Dealing with smaller systems reduces both the likelihood of analysis errors and the analysis effort. Afterward, the subsystem analyses are integrated to form the analysis of the whole system.


An undesired effect is taken as the root ('top event') of a tree of logic. The logic to get to the right

top events can be diverse. One type of analysis that can help with this is called the functional

hazard analysis, based on Aerospace Recommended Practice. There should be only one Top

Event and all concerns must tree down from it. Then, each situation that could cause that effect is

added to the tree as a series of logic expressions. When fault trees are labeled with actual

numbers about failure probabilities, computer programs can calculate failure probabilities from

fault trees. When a specific event is found to have more than one effect event, i.e. it has impact

on several subsystems, it is called a common cause or common mode. Graphically speaking, it

means this event will appear at several locations in the tree. Common causes introduce

dependency relations between events. The probability computations of a tree which contains

some common causes are much more complicated than regular trees where all events are

considered as independent. Not all software tools available on the market provide such

capability.
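
The effect of a common cause can be seen in a small numerical sketch. The tree below, its event names and its probabilities are hypothetical and chosen only for illustration: the top event is (A AND C) OR (B AND C), so the basic event C appears in both branches. The exact probability is obtained by enumerating every combination of basic-event states, and is compared with the value obtained by wrongly treating the two branches as independent.

```python
from itertools import product

# Hypothetical basic-event probabilities; "c" is the shared (common cause) event.
PROBS = {"a": 0.1, "b": 0.2, "c": 0.05}


def top_event(a: bool, b: bool, c: bool) -> bool:
    """Top event = (A AND C) OR (B AND C); C appears in both branches."""
    return (a and c) or (b and c)


def exact_probability(probs: dict) -> float:
    """Sum the probability of every basic-event state in which the top event occurs."""
    names = list(probs)
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, states))
        weight = 1.0
        for name, happened in assignment.items():
            weight *= probs[name] if happened else 1.0 - probs[name]
        if top_event(**assignment):
            total += weight
    return total


def naive_probability(probs: dict) -> float:
    """Treat the two AND branches as independent, ignoring that both contain C."""
    left = probs["a"] * probs["c"]
    right = probs["b"] * probs["c"]
    return left + right - left * right


exact = exact_probability(PROBS)
naive = naive_probability(PROBS)
print(f"exact={exact:.5f} naive={naive:.5f}")
```

With these numbers the naive calculation overestimates the top event probability. Full enumeration grows exponentially with the number of basic events, so it is only a teaching device here and not a description of how any particular FTA tool works.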

The Tree is usually written out using conventional logic gate symbols. The route through a tree

between an event and an initiator in the tree is called a Cut Set. The shortest credible way

through the tree from fault to initiating event is called a Minimal Cut Set.
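
In the usual set-based formulation, a cut set is a set of basic events whose joint occurrence causes the top event, and a minimal cut set is one from which no event can be removed. The following Python sketch derives minimal cut sets for a small tree of AND and OR gates. The nested-tuple representation, the function names and the example events are assumptions made for illustration.

```python
from itertools import product


def cut_sets(node):
    """Return all cut sets of a tree given as a basic-event name or (gate, children)."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any input causes the output.
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":
        # One cut set from every input must occur together.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another cut set."""
    candidates = set(cut_sets(node))
    minimal = [cs for cs in candidates if not any(other < cs for other in candidates)]
    return sorted(minimal, key=lambda cs: (len(cs), sorted(cs)))


# Hypothetical tree: the third branch is a superset of the second one.
TREE = ("OR", [
    "power",
    ("AND", ["pump", "valve"]),
    ("AND", ["pump", "valve", "sensor"]),
])

for cs in minimal_cut_sets(TREE):
    print(sorted(cs))
```

Single-event minimal cut sets, such as the hypothetical "power" event here, correspond to single points of failure and are usually the first candidates for design changes.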

Some industries use both fault trees and event trees (see Probabilistic Risk Assessment). An

Event Tree starts from an undesired initiator (loss of critical supply, component failure etc.) and

follows possible further system events through to a series of final consequences. As each new

event is considered, a new node on the tree is added with a split of probabilities of taking either

branch. The probabilities of a range of 'top events' arising from the initial event can then be seen.
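
The branching described above can be sketched in Python as follows. The sketch assumes that each subsequent event has a fixed failure probability that does not depend on earlier branches, which is a simplification; the function name, the event names and the probabilities are hypothetical.

```python
from itertools import product


def event_tree_paths(branch_failure_probs: dict) -> dict:
    """Map each path (one 'ok' or 'fail' per event, in order) to its probability."""
    names = list(branch_failure_probs)
    paths = {}
    for outcome in product(("ok", "fail"), repeat=len(names)):
        probability = 1.0
        for name, result in zip(names, outcome):
            q = branch_failure_probs[name]
            probability *= q if result == "fail" else 1.0 - q
        paths[outcome] = probability
    return paths


# Hypothetical events following an initiator: an alarm, then a sprinkler.
paths = event_tree_paths({"alarm": 0.01, "sprinkler": 0.05})
for outcome, probability in paths.items():
    print(outcome, f"{probability:.6f}")
```

The path probabilities sum to one, since every initiator ends in exactly one final consequence.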

Classic programs include the Electric Power Research Institute's (EPRI) CAFTA software,

which is used by many of the US nuclear power plants and by a majority of US and international

aerospace manufacturers, and the Idaho National Laboratory's SAPHIRE, which is used by the

U.S. Government to evaluate the safety and reliability of nuclear reactors, the Space Shuttle, and

the International Space Station. Outside the US, the software Risk Spectrum is a popular tool for

Fault Tree and Event Tree analysis and is licensed for use at almost half of the world's nuclear

power plants for Probabilistic Safety Assessment.


BASIC MATHEMATICAL FOUNDATION

Events in a fault tree are associated with statistical probabilities. For example, component

failures typically occur at some constant failure rate λ (a constant hazard function). In this

simplest case, failure probability depends on the rate λ and the exposure time t:

P = 1 - exp(-λt)

P ≈ λt, λt < 0.1
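
The two expressions above can be compared numerically. The following Python sketch is a minimal illustration; the function names and the example failure rate and exposure time are assumptions, not values from this report.

```python
import math


def failure_probability(rate: float, exposure_time: float) -> float:
    """Exact value P = 1 - exp(-rate * t), computed with expm1 for small arguments."""
    return -math.expm1(-rate * exposure_time)


def failure_probability_approx(rate: float, exposure_time: float) -> float:
    """Rare-event approximation P ≈ rate * t, intended for rate * t < 0.1."""
    return rate * exposure_time


# Hypothetical component: 1e-4 failures per hour over a 10-hour mission.
rate, hours = 1e-4, 10.0
exact = failure_probability(rate, hours)
approx = failure_probability_approx(rate, hours)
print(f"exact={exact:.6e} approx={approx:.6e}")
```

The approximation is always slightly larger than the exact value, so using it errs on the conservative side.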

A fault tree is often normalized to a given time interval, such as a flight hour or an average

mission time. Event probabilities depend on the relationship of the event hazard function to this

interval.

Unlike conventional logic gate diagrams in which inputs and outputs hold the binary values of

TRUE (1) or FALSE (0), the gates in a fault tree output probabilities related to the set operations

of Boolean logic. The probability of a gate's output event depends on the input event

probabilities.

An AND gate represents a combination of independent events. That is, the probability of any

input event to an AND gate is unaffected by any other input event to the same gate. In set

theoretic terms, this is equivalent to the intersection of the input event sets, and the probability of

the AND gate output is given by:

P (A and B) = P (A ∩ B) = P(A) P(B)

An OR gate, on the other hand, corresponds to set union:

P (A or B) = P (A ∪ B) = P(A) + P(B) - P (A ∩ B)

Since failure probabilities on fault trees tend to be small (less than .01), P (A ∩ B) usually

becomes a very small error term, and the output of an OR gate may be conservatively

approximated by using an assumption that the inputs are mutually exclusive events:

P (A or B) ≈ P(A) + P(B), P (A ∩ B) ≈ 0


An exclusive OR gate with two inputs represents the probability that one or the other input, but

not both, occurs:

P (A xor B) = P(A) + P(B) - 2P (A ∩ B)

Again, since P (A ∩ B) usually becomes a very small error term, the exclusive OR gate has

limited value in a fault tree.
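
The gate formulas of this section can be written as short Python functions. This is a minimal sketch that assumes the two inputs are independent, so that P (A ∩ B) = P(A) P(B); the function names and the input probabilities are hypothetical.

```python
def and_gate(p_a: float, p_b: float) -> float:
    """P(A and B) = P(A) P(B) for independent inputs."""
    return p_a * p_b


def or_gate(p_a: float, p_b: float) -> float:
    """P(A or B) = P(A) + P(B) - P(A) P(B) for independent inputs."""
    return p_a + p_b - p_a * p_b


def or_gate_rare(p_a: float, p_b: float) -> float:
    """Conservative approximation P(A or B) ≈ P(A) + P(B)."""
    return p_a + p_b


def xor_gate(p_a: float, p_b: float) -> float:
    """P(A xor B) = P(A) + P(B) - 2 P(A) P(B) for independent inputs."""
    return p_a + p_b - 2.0 * p_a * p_b


# Hypothetical input probabilities.
p_a, p_b = 0.01, 0.005
print(f"AND={and_gate(p_a, p_b):.6f}")
print(f"OR={or_gate(p_a, p_b):.6f} OR(approx)={or_gate_rare(p_a, p_b):.6f}")
print(f"XOR={xor_gate(p_a, p_b):.6f}")
```

For small inputs the OR, approximate OR and XOR results differ only by multiples of the product term, which is why the exclusive OR gate adds little in practice.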

ANALYSIS

Many different approaches can be used to model a FTA, but the most common and popular way

can be summarized in a few steps. A single fault tree is used to analyze one and only one

undesired event or top event, which may be subsequently fed into another fault tree as a basic

event. Though the nature of the undesired event may vary dramatically, a FTA follows the same

procedure for any undesired event; be it a delay of 0.25 ms for the generation of electrical power,

an undetected cargo bay fire, or the random, unintended launch of an ICBM. Due to labor cost,

FTA is normally only performed for more serious undesired events.

FTA analysis involves five steps:

1. Define the undesired event to study

The undesired event can be very hard to define, although some events are easy and obvious to observe. An engineer with wide knowledge of the system design, or a system analyst with an engineering background, is the best person to help define and number the undesired events. Each undesired event is then used to build its own FTA: one event per FTA, and no two events are combined in one FTA.

2. Obtain an understanding of the system

Once the undesired event is selected, all causes with any probability of affecting the undesired event are studied and analyzed. Obtaining exact numbers for the probabilities leading to the event is usually impractical because doing so may be very costly and time consuming. Computer software is used to study the probabilities, which may make the system analysis less costly.


System analysts can help with understanding the overall system. System designers have full

knowledge of the system and this knowledge is very important for not missing any cause

affecting the undesired event. For the selected event all causes are then numbered and sequenced

in the order of occurrence and then are used for the next step which is drawing or constructing

the fault tree.

3. Construct the fault tree

After selecting the undesired event and analyzing the system so that all contributing causes (and, if possible, their probabilities) are known, we can construct the fault tree. The fault tree is based on AND and OR gates, which define its major characteristics.
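
One possible way to represent such a tree in software is sketched below in Python. The class names, the `occurs` function and the example tree are assumptions made for illustration and do not describe any particular FTA tool. Given the state of every basic event, the sketch decides whether the top event occurs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class BasicEvent:
    name: str


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]] = field(default_factory=list)


def occurs(node: Union[Gate, BasicEvent], state: Dict[str, bool]) -> bool:
    """Evaluate the tree bottom-up for one assignment of basic-event states."""
    if isinstance(node, BasicEvent):
        return state[node.name]
    results = [occurs(child, state) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical top event "no cooling flow": power is lost, or both pumps fail.
TOP = Gate("OR", [
    BasicEvent("power_loss"),
    Gate("AND", [BasicEvent("pump_a_fails"), BasicEvent("pump_b_fails")]),
])

state = {"power_loss": False, "pump_a_fails": True, "pump_b_fails": True}
print(occurs(TOP, state))
```

Keeping gates and basic events as separate node types mirrors the symbol groups used when drawing the tree.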

4. Evaluate the fault tree

After the fault tree has been assembled for a specific undesired event, it is evaluated and analyzed for any possible improvement; in other words, the risk is studied and ways to improve the system are sought. This step serves as an introduction to the final step, which is to control the hazards identified. In short, in this step we identify all possible hazards that affect the system directly or indirectly.

5. Control the hazards identified

This step is very specific and differs largely from one system to another, but the main point will

always be that after identifying the hazards all possible methods are pursued to decrease the

probability of occurrence.


ILLUSTRATION OF THE STEPS OF A FTA

Figure 3: Illustration of the Steps of a FTA

COMPARISON WITH OTHER ANALYTICAL METHODS

FTA is a deductive, top-down method aimed at analyzing the effects of initiating faults and

events on a complex system. This contrasts with failure mode and effects analysis (FMEA),

which is an inductive, bottom-up analysis method aimed at analyzing the effects of single

component or function failures on equipment or subsystems. FTA is very good at showing how

resistant a system is to single or multiple initiating faults. It is not good at finding all possible

initiating faults. FMEA is good at exhaustively cataloging initiating faults, and identifying their

local effects. It is not good at examining multiple failures or their effects at a system level. FTA

considers external events; FMEA does not. In civil aerospace the usual practice is to perform

both FTA and FMEA, with a failure mode effects summary (FMES) as the interface between

FMEA and FTA.

Alternatives to FTA include dependence diagram (DD), also known as reliability block diagram

(RBD) and Markov analysis. A dependence diagram is equivalent to a success tree analysis

(STA), the logical inverse of an FTA, and depicts the system using paths instead of gates. DD

and STA produce probability of success (i.e., avoiding a top event) rather than probability of a


top event.

FTA is not a Fishbone analysis, which is a more informal depiction of event causes (informal deductive).

FTA is not an FMEA which assesses different effects of single basic causes (inductive). It is not

Event Tree Analysis which assesses the consequences of given initiating events (inductive). FTA

is a formal approach for resolving the basic causes of a given undesired event (formal deductive).

THE FTA VERSUS THE ISHIKAWA FISHBONE

1. A fault tree is sometimes erroneously thought to be an example of an Ishikawa Fishbone

Model.

2. The fishbone is a loosely-structured, brain-storming tool for listing potential causes of an

undesired event.

3. Fault tree analysis is a stepwise formal process for resolving an undesired event into its

immediate causes.

4. The fault tree displays the stepwise cause resolution using formal logic symbols.

Figure 4: The Ishikawa Fishbone Diagram


FTA OPERATES IN FAILURE SPACE

1. Designers design for success

2. Safety analysts analyze for failure

3. There can be various degrees of success

4. Thresholds for failure are identifiable

5. Failure events can be more readily discretized

6. Failure quantifications are simpler

7. The “failure mindset” probes for weaknesses and gaps

Figure 5: Success Space versus Failure Space


EXAMPLE:

Figure 6: Different Failure and Success States for a Trip

DYNAMIC FAULT TREE ANALYSIS (DFTA)

DFTA is a term used to refer to analysis of a system which dynamically responds to a failure or a

stimulus.

– A cold standby component activated by another failure

– A system configuration change due to a failure

– A system configuration change responding to a signal

– Failures that occur in a particular sequence


– Failure criteria that change for a new mission phase
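
The first case in the list above, a cold standby component, shows why a static gate is not enough. The following Python sketch uses the standard result for two identical units with constant failure rate λ, a standby unit that cannot fail while dormant, and perfect switching: R(t) = exp(-λt)(1 + λt). These assumptions, the function names and the numbers are illustrative and are not taken from the dynamic example in this report.

```python
import math


def cold_standby_reliability(rate: float, t: float) -> float:
    """Two identical units, one in cold standby, perfect switching."""
    return math.exp(-rate * t) * (1.0 + rate * t)


def active_parallel_reliability(rate: float, t: float) -> float:
    """Two identical units running together; a static AND gate on their failures."""
    unit = math.exp(-rate * t)
    return 1.0 - (1.0 - unit) ** 2


# Hypothetical units: 1e-3 failures per hour over a 1000-hour mission.
rate, hours = 1e-3, 1000.0
print(f"cold standby:    {cold_standby_reliability(rate, hours):.4f}")
print(f"active parallel: {active_parallel_reliability(rate, hours):.4f}")
```

The standby unit only starts ageing after the first failure, so the order and timing of failures matter, which is what a dynamic fault tree has to capture.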

Figure 7: Example of a Dynamic System

Figure 8: Outline of the FT for the Dynamic Example

FTA PROJECT MANAGEMENT TASK

1. Define the FTA

– Top Event

– Scope

– Resolution

2. Assemble the project Team

– FT analyst


– System engineering support

– Data support

– Software support

3. Define the FTA Operational Framework

– Assemble the as-built drawings

– FT naming scheme

– Interfaces/Support to be modeled

– Software to be used

4. Assemble the data

– Generically applicable data

– Specifically applicable data

5. Prepare the software package

– Familiarization

– Test problems

6. Keep a log on the FTA work

– Operational and design assumptions

– Events not modeled and why

– Success and failure definitions

– Special models and quantifications used

7. Review the work at stages

– FT construction

– Qualitative evaluations

– Quantitative evaluations

8. Check and validate the results

– Engineering logic checks


– Consistency checks with experience

9. Prepare and disseminate the draft report

– Conclusions/findings

– FTA results

– FTs

– Software inputs/outputs

10. Obtain feedback, then modify and finalize the report

– Disseminate the report

– Present findings

FTA ADVANTAGES

1. It makes the analyst concentrate on a given Top Event

2. It directs the analysis to focus on failures

3. It provides a graphical aid in giving visibility to those in systems management who are

removed from system design changes

4. It provides options for quantitative and qualitative systems reliability analysis

5. It provides an insight into system behavior

CASE STUDY 1

1. Nuclear Power Plant


Figure 9: A Typical 5-layer Nuclear Power Plant

Figure 9 shows a typical nuclear power plant with defense-in-depth. Currently the defenses vary depending on the type of plant, the owner of the plant, its purpose, the generation the plant is from, and so on. From the layered model above, combined with the functions of each part of a nuclear power plant, some safety functions can be extracted (Table 1); these can be treated as the functions of the subsystems of the nuclear power plant.

To compose a fault tree visually, a number of symbols (Table 2) are used as the building blocks of a fault tree. Based on the safety functions in Table 1, some subsystems are taken as case studies to demonstrate the utility of FTA in a nuclear power plant. The fault trees for these subsystems are built from the symbols in Table 2.

A. Heat Discharge

Heat discharge keeps the temperature at an acceptable level to prevent faults caused by overheating. The architecture of the heat discharge subsystem can be designed as in Figure 10. The produced heat, assumed to be inexhaustible, is pumped through a valve to the cooling system of the nuclear power plant. For this subsystem, the fault tree is shown in Figure 11.

B. Inflammable Gas Control

Inflammable gas control, which controls the quantity of hydrogen in the pressure tank, is another important subsystem of a nuclear power plant. In the high temperature environment of the pressure


tank, the quantity of hydrogen must be kept at a safe level to prevent burning or even explosion. The fault tree for inflammable gas control is shown in Figure 12. The events stand for the following:

E1: Reactor core decay produces superfluous hydrogen

E2: Reaction between molten reactor core and concrete produces superfluous hydrogen

E3: Hydrogen is not discharged out of pressure tank in time

A1: Zr is in a large quantity

B1: Reactor core is heated at a high speed

C1: Operator takes improper actions for heating

A2: Water exists in concrete

B2: Fragments of reactor are not cooled down in time

C2: A great deal of metal is contained in concrete

A3: Sensors for detecting hydrogen volume in pressure tank fail

B3: Pumps for discharging hydrogen fail

C3: Valves for discharging hydrogen fail


Table 1: Safety Functions Of Nuclear Power Plant

Table 2: Primary Building Blocks of FTA


Figure 10: Heat Discharge

Figure 11: FTA for Heat Discharge


Figure 12: FTA for Inflammable gas control

Conclusion:

FTA provides a methodology for developing and quantifying the potential faults of a complicated, safety-critical system such as a nuclear power plant. To obtain the complete fault tree of such a system, a feasible method is to integrate the fault trees of its subsystems, which are divided up before the analysis.


CASE STUDY 2

RADIO-BASED RAILROAD CROSSING

To illustrate the application of formal FTA, we present an analysis of a radio-based railroad crossing. The case study was done using the interactive theorem prover KIV, and the proof effort was about 1.5 person months. This case study is the reference case study of the German research council's (DFG) priority program 1064. This program aims at bringing together field-tested engineering techniques with modern methods from the domain of software engineering.

The German railway organization, Deutsche Bahn, is preparing a novel technique to control railroad crossings: decentralized, radio-based railroad crossing control. This technique aims at medium speed routes, i.e. routes with a maximum speed of 160 km/h.

Figure 13: Radio-Based Railroad Crossing

The main difference between this technology and the traditional control of railroad crossings is that signals and sensors on the route are replaced by radio communication and software computations in the train and railroad crossing. This offers cheaper and more flexible solutions, but also shifts safety-critical functionality from hardware to software.

Instead of detecting an approaching train by a sensor and sending this information to a central office which closes the railroad crossing, the train continuously computes the position where it has to send a signal to secure the level crossing. This saves money (less equipment is needed on the track) and removes the central control office (which is a single point of failure for all trains in the region). To calculate the activation point, the train uses data about its position, its maximum deceleration and the position of the crossing. Therefore the train has to know the position of the railroad crossing, the time needed to secure the railroad crossing, and its current speed and position. The first two items are kept in a data store and the last two items are measured by an odometer. For safety reasons, a safety margin is added to the activation distance. This compensates for some deviations in the odometer. The system works as follows:

The train continuously computes its position. When it approaches a crossing, it broadcasts a ‘secure’ request to the crossing. When the railroad crossing receives the command ‘secure’, it switches on the traffic lights, first the ‘yellow’ light, then the ‘red’ light, and finally closes the barriers. When they are closed, the railroad crossing is ‘secured’ for a certain period of time. The ‘stop’ signal on the train route, indicating an insecure crossing, is removed and substituted by computation and communication. Shortly before the train reaches the ‘latest braking point’ (the latest point where it is possible for the train to stop in front of the crossing), it requests the status of the railroad crossing. When the crossing is secured, it responds with a ‘release’ signal, which indicates that the train may pass the crossing. Otherwise the train has to brake and stop before the crossing. The railroad crossing periodically performs self-diagnosis and automatically informs the central office about defects and problems. The central office is also responsible for repair and provides route descriptions for trains. These descriptions indicate the positions of railroad crossings and the maximum speed on the route. The safety goal of the system is clear: it must never happen that the train passes a crossing which is not secured.

A well-designed control system must assure this property at least as long as no component
failures occur. The corresponding hazard is “a train passes the crossing and the crossing is not
secured”. This is the only hazard which we will consider in this case study.
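The activation-point calculation described earlier in this case study can be sketched as follows.
This is a minimal illustration only. It assumes constant maximum deceleration and constant speed
while the crossing is being secured. The function names, the parameter names, and the formula
itself are assumptions made for this sketch and are not taken from the case study.

```python
def activation_distance(speed, max_decel, secure_time, margin):
    """Distance before the crossing at which the 'secure' request should be sent.

    Assumed formula (not from the case study):
    braking distance + distance travelled while the crossing closes + safety margin.
    """
    if max_decel <= 0:
        raise ValueError("max_decel must be positive")
    braking = speed ** 2 / (2 * max_decel)  # distance needed to stop from `speed`
    travel = speed * secure_time            # distance covered while the crossing is secured
    return braking + travel + margin


def must_send_secure(position, crossing_position, speed, max_decel, secure_time, margin):
    """True once the remaining distance is within the activation distance."""
    remaining = crossing_position - position
    return remaining <= activation_distance(speed, max_decel, secure_time, margin)


# Hypothetical values: 20 m/s, 1 m/s^2 deceleration, 30 s to secure, 50 m margin.
distance = activation_distance(20.0, 1.0, 30.0, 50.0)  # 200 + 600 + 50 = 850.0
```

The safety margin enters as a plain additive term, which mirrors the statement that it is added
to the activation distance to absorb odometer deviations.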

A. The formal model:

In the following part a brief description of the state chart model of this system is given. Note
that the model includes not only the intended behavior but failure modes as well. This is
necessary for all types of formal safety analysis. Details on how such models may be derived from
functional models of the intended behavior may be found.



The model of the radio-based railroad crossing is split into three parallel charts. One chart
models the crossing, another models the communication, and a third models the train. These three
charts are explained below.

B. Model of the crossing:

The state chart shows the model of the crossing, which reacts to the signals sent by the train.
Initially the crossing is in state Opened, which means the bars are open. When the crossing
receives the signal Close Request Rcv from the train, it goes into state Closing. This activates a
timer called Closing Count, which simulates the time needed for turning on the light signals at
the crossing and closing the bars. This takes the time T Max Closing. After this time has
expired, the crossing is closed (state Closed). Another timer, Closed Count, is started to ensure
that the bars are not closed for too long. This is a standard procedure in railroad organization.
The crossing reopens if either the train passes the danger spot (Pos > DS) or the timer reaches
T Max Closed. The crossing also opens its bars if a fault occurs in the sensor that detects the
passing of the train (Unwanted Open). The response of the crossing to the train’s status request
is modeled by a static reaction (SR CROSSING). If it receives a status request (Status Request
Rcv), a release message (Release Snd) is sent if the bars are closed (intended behavior) or if
the sensor for the bars’ position wrongly reports them as closed (Error Closed).

Figure 14: Model of the crossing-chart
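The crossing chart described above can be approximated by a small state machine. The sketch below
is an illustration under assumptions, not the formal model of the case study: time advances in
discrete steps, the class and method names are hypothetical, and the failure modes Unwanted Open
and Error Closed are passed in as boolean inputs.

```python
from enum import Enum, auto


class CrossingState(Enum):
    OPENED = auto()
    CLOSING = auto()
    CLOSED = auto()


class Crossing:
    """Discrete-time sketch of the crossing chart (names are illustrative)."""

    def __init__(self, t_max_closing, t_max_closed):
        self.state = CrossingState.OPENED
        self.t_max_closing = t_max_closing
        self.t_max_closed = t_max_closed
        self.timer = 0  # plays the role of Closing Count / Closed Count

    def step(self, close_request_rcv=False, train_passed=False, unwanted_open=False):
        """Advance the chart by one time unit."""
        if self.state is CrossingState.OPENED:
            if close_request_rcv:
                self.state = CrossingState.CLOSING
                self.timer = 0
        elif self.state is CrossingState.CLOSING:
            self.timer += 1
            if self.timer >= self.t_max_closing:
                self.state = CrossingState.CLOSED
                self.timer = 0
        elif self.state is CrossingState.CLOSED:
            self.timer += 1
            # Reopen on train passage (Pos > DS), timeout, or sensor fault.
            if train_passed or unwanted_open or self.timer >= self.t_max_closed:
                self.state = CrossingState.OPENED
                self.timer = 0

    def on_status_request(self, error_closed=False):
        """Static reaction SR CROSSING: True means Release Snd is sent."""
        return self.state is CrossingState.CLOSED or error_closed
```

Note that `on_status_request(error_closed=True)` returns a release even when the bars are open,
which is exactly the faulty behavior that the later fault tree has to account for.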

C. Model of the train:

The model of the train is divided into two parts: one for modeling the physics of the train and
one for modeling the controller logic. From a theoretical point of view, it is advisable to model
the control and the physics of the train separately.

In this example, however, the physical model consists only of some static reactions. These static
reactions basically state that the position of the train updates according to the speed and that
the speed updates according to the acceleration. For an easier representation, these two parts
have therefore been combined.

The train control supervises the position of the train, issues closing requests to the crossing,
and ultimately decides whether an emergency stop is necessary. The train control is implemented
in software on board the train. The formal model is given in Figure 15. Starting from its initial
state Idle, the chart goes into state Wfc (‘wait for close’) when the train approaches the
crossing and the control sends a close request (Close Request Snd) to the crossing. The point at
which this signal is sent is calculated continuously, depending on the actual speed, the
estimated closing and communication time, and the maximum deceleration of the train. This is
modeled in the predicate Close(Pos, V, AccMAX, DS). Some time later, the train reaches another
virtual control point, which is also calculated continuously and modeled in the predicate
Request(Pos, V, AccMAX, DS). This is the position at which the train sends a status request
(Status Request Snd) to the crossing. The control is then in state Wfs (‘wait for status
answer’). If the train receives a release signal within the next Wfs Count time units, the
controller goes into state Go and the train may pass the crossing. Otherwise an emergency stop
must be issued. In this case the brakes are activated (A = AccMAX) and the controller goes into
state Brake. A failure of the brakes is also modeled. If the brakes fail, the controller still
goes into state Brake, but there is no real deceleration. The two states Brake and Go are final
states of the chart, so they are never left again.



Figure 15: Model of the train-control-chart
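The train-control chart can be sketched in the same way. The following is an illustrative
approximation, not the formal model itself: time is discrete, the predicates Close(...) and
Request(...) are reduced to boolean inputs supplied by the caller, and all class, method, and
message names are assumptions introduced for this sketch.

```python
from enum import Enum, auto


class TrainState(Enum):
    IDLE = auto()
    WFC = auto()    # wait for close
    WFS = auto()    # wait for status answer
    GO = auto()     # final state
    BRAKE = auto()  # final state


class TrainControl:
    """Discrete-time sketch of the train-control chart (names are illustrative)."""

    def __init__(self, wfs_count):
        self.state = TrainState.IDLE
        self.wfs_count = wfs_count  # time units to wait for a release
        self.waited = 0
        self.outbox = []            # messages sent to the crossing

    def step(self, close_point_reached=False, request_point_reached=False,
             release_rcv=False):
        """Advance the controller by one time unit."""
        if self.state is TrainState.IDLE and close_point_reached:
            self.outbox.append("CloseRequestSnd")
            self.state = TrainState.WFC
        elif self.state is TrainState.WFC and request_point_reached:
            self.outbox.append("StatusRequestSnd")
            self.state = TrainState.WFS
            self.waited = 0
        elif self.state is TrainState.WFS:
            if release_rcv:
                self.state = TrainState.GO
            else:
                self.waited += 1
                if self.waited >= self.wfs_count:
                    # No release in time: issue an emergency stop.
                    self.state = TrainState.BRAKE
        # GO and BRAKE are final states: no transition leaves them.
```

A brake failure is deliberately not represented here. As in the text, the controller would still
enter the Brake state, and the missing deceleration would belong to the physical part of the
model.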

D. Model of the communication:

The communication is modeled by three static reactions (see Figure 16). These static reactions
represent the function and dysfunction of the communication. The functional communication relays
all incoming messages; for example, in SR COMM1 the close request of the train (Close Request
Snd) is forwarded to the crossing as Close Request Rcv. If the communication fails (Failure
Comm), no messages reach their receiver. The other two static reactions represent the status
request (SR COMM2) and the release message (SR COMM3).

Figure 16: Model of the communication-chart
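The three static reactions of the communication model amount to a relay that can fail as a whole.
A minimal sketch follows, assuming messages are plain strings. The name `ReleaseRcv` for the
received release message and the function name `communicate` are assumptions, since the text
names only the sending side of that message.

```python
# Mapping from sent message to received message, one entry per static reaction.
RELAY = {
    "CloseRequestSnd": "CloseRequestRcv",    # SR COMM1
    "StatusRequestSnd": "StatusRequestRcv",  # SR COMM2
    "ReleaseSnd": "ReleaseRcv",              # SR COMM3 (receiver-side name assumed)
}


def communicate(sent, failure_comm=False):
    """Return the set of messages delivered in this step.

    If the communication has failed (Failure Comm), nothing is delivered.
    """
    if failure_comm:
        return set()
    return {RELAY[message] for message in sent if message in RELAY}
```

With `failure_comm=True` the train never receives a release, so in this model a communication
failure leads to braking rather than to an unsecured passage.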

E. Fault Tree Analysis:



This model is now analyzed with formal FTA (see Sect. 2). The hazard of interest is a situation
in which a train passes the crossing while the bars are not closed. We will call this hazard
“collision”. The fault tree for this hazard is shown in Figure 17.

Figure 17: Fault tree for hazard collision



The top event of the fault tree (collision) may have two different causes. One is that the train
passes the crossing while the bars are not closed, although no release signal has been sent. The
other is a situation in which the train passes the crossing while the bars are not closed, but a
release signal has been sent. The first cause corresponds to misbehavior of the train and the
second to misbehavior of the crossing. The “or” relationship is modeled by a decomposition gate.
These two situations must be analyzed further.

The left node, in which the train passes the crossing (while the crossing is not closed) although
no release signal has been received, is caused by a failure in the train’s behavior, so no
information about the crossing is needed. This is expressed by a D-INHIBIT gate. The right node,
in which the train passes the not-closed crossing and a release signal has been sent, can be
caused by two different situations. In the first, the train approaches the not-closed crossing
and the release signal is sent while the crossing is not closed. The reason for this can be a
fault in the position sensors of the bars. In the second, the train passes the opening or opened
crossing and the release signal was sent some time before; that is, the bars open after a release
signal has been sent but before the train has passed the crossing. The reason for this can be
either a timeout or a faulty request to open the bars.

As an example, the formalization of the first three nodes is shown in Table 3.



Table 3: Formalization of fault tree nodes

The resulting proof obligation is then constructed by inserting these formal descriptions of the
nodes into the D-OR-gate formula. The other fault tree gates are handled analogously. The fault
tree above has been proven complete, which means that the corresponding proof obligation has been
shown for every gate.

The conclusion is that, for this example, all minimal cut sets are single points of failure, so
there is no redundancy in the system. On the other hand, the fault tree also shows that if these
failures are prevented, the hazard will not occur. In other words, if nothing fails, the system
will work as intended, or even shorter: the system is functionally correct.
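The statement that all minimal cut sets are single points of failure can be illustrated with a
small cut-set computation. The sketch below is a generic qualitative evaluation, not the formal
proof technique of the case study. The tree structure and the leaf names (`BrakeFailure`,
`ErrorClosed`, `ClosedTimeout`, `UnwantedOpen`) are one reading of the text above, treated here as
plain OR gates, and are assumptions rather than a transcription of Figure 17.

```python
from itertools import product


def cut_sets(node):
    """Return all cut sets of a fault tree node as a set of frozensets.

    A node is either a basic event (str) or a tuple (gate, children)
    with gate "OR" or "AND".
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child triggers the gate.
        return set().union(*child_sets)
    if gate == "AND":
        # One cut set from every child must occur together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")


def minimal(sets):
    """Keep only cut sets that have no proper subset among the cut sets."""
    return {s for s in sets if not any(t < s for t in sets)}


# Illustrative tree for the hazard "collision" (structure assumed).
collision = ("OR", [
    "BrakeFailure",                                # train misbehaves
    ("OR", [
        "ErrorClosed",                             # release sent while bars are open
        ("OR", ["ClosedTimeout", "UnwantedOpen"]), # bars reopen after release
    ]),
])

mcs = minimal(cut_sets(collision))
all_single_points = all(len(s) == 1 for s in mcs)  # True for this OR-only tree
```

Because the assumed tree contains only OR gates, every minimal cut set has exactly one element. A
redundant design would introduce AND gates, and the same functions would then return cut sets
with more than one event.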

F. Lessons learned:

To our knowledge, the case study presented here is the first formal safety analysis of an
infinite state system. In this section we briefly present our experiences with proving FTA
formulas over an infinite state model. To prove the correctness of the fault tree, we used KIV
as an interactive verification tool.

One big advantage of the KIV system is that it natively supports state charts as a specification
mechanism. The state chart model shown above can be used directly as a system specification in
KIV. The proof obligations are derived from the fault tree as shown in the previous section. They
can also be generated by the fault tree module of KIV. KIV allows temporal properties to be
proven with symbolic execution and induction. This means that every temporal formula is split
into a predicate logic part and some property which must hold from the next step onwards.

In practice this results in stepping through all reachable states of the state chart until a loop
is found and induction can be applied. State explosion can be avoided by generalization.
Generalization means that, instead of proving a formula, a more general theorem is proven. The
starting formula is then a specialization of the more general theorem. This type of strategy
often helps when verifying interactively.

Altogether the case study required an effort of about one and a half person months.

Our experiences during this case study were as follows:



FFTA proofs are easy but time-consuming. Almost all proof steps can be done automatically. Only
finding adequate generalizations and identifying the correct inductive argument (i.e. the
corresponding state) require human interaction and skill. In most cases, generalizations can only
be found manually. For big proofs in particular, it can be very time-consuming to find the right
position in the proof tree (i.e. the part of the proof where a similar subgoal has already been
proven). It seems possible to use hash functions for locating the correct spot, which would make
state chart proofs much easier and faster. Generalizations are a great help but are not easy to
find (see above).

It is clear that the more generalizations are made, the more successor states become possible,
and vice versa. For example, if you analyze a deterministic state chart with no generalization,
each step in time results in exactly one new state. If you fully generalize this state chart
(i.e. you throw away all information on the current state), you get all states as candidates for
the next step in time. In many cases even this “brutal” generalization can be helpful (e.g. if
you have to prove that the train moves in one direction). Although you can get as many as 200
case distinctions in your proof, the KIV system can close all of them with its built-in predicate
logic simplifier.

This leads to two approaches to proving FTA properties: depth-first search and breadth-first
search. Depth-first search is the more useful strategy if it is unclear whether a proof
obligation holds (i.e. whether the nodes of the fault tree have been formalized correctly). It is
particularly useful for validating a formula and finding faults early. Breadth-first search is in
general faster, but discovers specification errors only at the very end. For some properties,
however, it is even possible to fully generalize the state of the system and close the proof in
one step.

Formalizing FTA nodes is difficult. Even for simple systems it can be very hard to formalize the
nodes of the fault tree correctly. This is because the informal understanding of a fault tree
(decomposition of causes into components) is not enough for a formal description. This problem
can be mitigated if all proof obligations are validated with depth-first search at the beginning.
In our experience this additional effort is well worth the time, because formalizing the nodes of
a fault tree is very error-prone.



Conclusion:

We showed the first verification of an infinite state system with FTA. Our experience shows that
formal FTA with interactive verification is a promising but not an easy topic. Many problems
arise from specification errors. These problems may be countered with good methodology. Compared
to other formal safety analysis methods, formal FTA is the only one which has a human-readable
and understandable logical background structure, and it will thus be more easily accepted in
industry than push-button techniques (like pure model checking).

REFERENCES

1. HAL Id: inria-00089487 https://hal.inria.fr/inria-00089487

2. C. A. Ericson II, “Fault Tree Analysis - A History,” (1999) Proc. the 17th International

System Safety Conference.

3. C. V. Ramamoorthy, Y. W. Han & G. S. Ho, “Fault Tree Analysis of Computer

Systems,” (1977) Proc. the National Computer Conference.

4. M. Stamatelatos et al., “Fault Tree Handbook with Aerospace Applications,” (2002)

NASA, August.

5. J. C. Knight, “Safety Critical Systems: Challenges and Directions,” (2002) Proc. the 24th

International Conference on Software Engineering (ICSE 2002).

6. J. S. Feinstein, “The Safety Regulation of U.S. Nuclear Power Plants: Violations,

Inspections, and Abnormal Occurrences,” (1989) the Journal of Political Economy, vol.

97, no. 1, pp. 115-154.

7. ONCOR, “Nuclear Energy,”

http://www.oncor.com/community/knowledgecollege/energy_library/elec_nuc.aspx,

accessed on August 9, 2010

8. J. M. Hendrie, “Nuclear Power Plants: Structure and Function,” (1983) Symposium on the

Health Aspects of Nuclear Power Plant Incidents.



9. W. E. Vesely et al., “Fault Tree Handbook”, U. S. Nuclear Regulatory Commission

(NUREG-0492), January 1981

10. “Fault Tree Handbook with Aerospace Applications,” Version 1.1, NASA Publication,

August 2002.