OSMA2003 Center for Reliability Engineering 1
Integrating Software into PRA
Presented by C. Smidts
Center for Reliability Engineering, University of Maryland
Integrating Software into PRA
Probabilistic Risk Assessment (PRA) is a technique to assess the probability of failure or success of a mission. Current PRA neglects the contributions of software to the risk of the mission. The objective of our research is to extend current PRA methodology to integrate software in the risk assessment process.
What We Have Done to Date
• Built a Software Failure Mode Taxonomy
• Failure Modes’ Quantification: generic high-level data
- Public Literature
- Expert Opinion
• Collaborated with JSC through Ms. Alice Lee
- Validate Our Methodology
- Collect Data
• Developed a Test-Based Methodology for Integrating Software Into PRA
What We Are Planning to Do in the Future
• Investigate Scalability Issues of the Test-based Approach
• Continue the validation of our methodology with JSC
• Apply the approach to JSC system
• Revise the methodology based on NASA system
• Develop an Analytical Approach
• Apply the Analytical Approach to JSC system
• Revise the Analytical Approach based on NASA system
Integrating Software into PRA: A Test-based Approach
Presented by C. Smidts
C. Smidts, B. Li, M. Li
Center for Reliability Engineering, University of Maryland
Integrating Software into PRA - Approach
• We are working on an approach to integrate software into PRA.
- Step 1: Identify events/components controlled/supported by software in MLD, accident scenarios, fault trees.
- Step 2: Specify the functions involved
- Step 3: Model software functions in ESDs/ETs and Fault Trees
- Step 4: Construct the input tree
- Step 5: Quantify the input tree
- Step 6: Develop and perform software safety tests
Example System
An exit system in a building is used as the example in this case study.
The exit system includes an emergency exit system and the PACS system.
The emergency exit system includes an emergency exit door and a marked egress route. It provides an escape route for personnel located inside the building during emergency situations.
The PACS system is a simplified version of an automated personal entry/exit access system used to provide privileged physical access to rooms, buildings, etc. A personal ID and a PIN are needed to access this system.
Integrating software into PRA - Approach
Step 1: Identify events/components
• Identify events/components controlled/supported by software in MLD, accident scenarios, fault trees.
• For all such events, create/expand contributors to account for software.
• Verify that no previously neglected “events” have become possible due to software.
MLD
[Figure: Master Logic Diagram before software contributors are added]
Loss of Occupants (AND gate)
- Internal Accident (OR gate): Fire; Gas; Chemical Explosion; Poisonous Material (anthrax)
- Exit Fails (AND gate): Emergency Exit Fails; Normal Exit Path Fails
- Normal Exit Path Fails (OR gate): Gate (HW) Fails
MLD
[Figure: Master Logic Diagram expanded to account for software]
Loss of Occupants (AND gate)
- Internal Accident Disaster (OR gate): Fire; Gas; Chemical Explosion; Poisonous Material (anthrax)
- Exit Fails (AND gate): Emergency Exit Fails; Normal Exit Path Fails
- Contributors shown under the added OR gates: Death due to Temperature; Death due to unable to evacuate; Gate Fails (HW); Software Fails (shown twice)
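As a rough illustration of what adding a "Software Fails" contributor does to the logic, the sketch below evaluates AND and OR gates for independent events. It assumes "Software Fails" and "Gate Fails (HW)" feed an OR gate under "Normal Exit Path Fails", which is one reading of the diagram. All probabilities are hypothetical placeholders, not values from the study.

```python
from math import prod


def or_gate(*probabilities):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(*probabilities):
    """Probability that all of several independent events occur."""
    return prod(probabilities)


# Hypothetical basic-event probabilities, for illustration only.
p_gate_hw_fails = 0.01
p_software_fails = 0.02
p_emergency_exit_fails = 0.05

# Normal exit path: hardware only versus hardware OR software.
p_normal_exit_hw_only = or_gate(p_gate_hw_fails)
p_normal_exit_with_sw = or_gate(p_gate_hw_fails, p_software_fails)

# "Exit Fails" is the AND of the emergency exit and the normal exit path.
p_exit_fails_hw_only = and_gate(p_emergency_exit_fails, p_normal_exit_hw_only)
p_exit_fails_with_sw = and_gate(p_emergency_exit_fails, p_normal_exit_with_sw)

print(f"Exit fails, hardware only: {p_exit_fails_hw_only:.6f}")
print(f"Exit fails, with software: {p_exit_fails_with_sw:.6f}")
```

With these placeholder numbers, the software contributor roughly triples the probability that the exit fails, which is the kind of effect a hardware-only risk model would miss.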
Accident Description
• Fire is the initiating event
• Response systems: Emergency system and PACS system
• End State: Loss of life
[Figure: event sequence diagram (ESD) for the fire scenario before software functions are modeled. Initiating event: Fire. Pivotal events: Emergency Exit; Fire Protection; Guard there; Guard action (T1 < Tcritical); The user inserts the card; PACS; LED1; The user inserts the PIN; Gate; Delay of opening of gate (T2 < Tcritical, T3 < Tcritical). Delay blocks appear between events. End states: Safe; Loss of life. Sequences 1 through 15.]
Integrating software into PRA - Approach
Step 2: Specify the functions involved
• Not all software functions are involved in accident scenarios, i.e., not all software functions are involved in particular scenarios/fault trees or even in the entire realm of possible scenarios/fault trees.
• To identify the specific functions involved in a scenario, determine the specific input to/output from the software – this will describe one function.
• A list of possible functions can be found in the requirements.
• Match the input/output combinations of these functions to the risk model
Integrating software into PRA – Approach
PACS Functional Decomposition
Software | Function | Input | Output
SW1 | Read data from entrant’s card and validate card data | Card data (SSN and Last Name) | “Enter PIN” or “See Officer” on LED
SW2 | Read and validate input PIN values | PIN (4 digits). Entry of the 1st digit within time: the allowed time for the entry of the first digit of the PIN is 10 seconds. Entry of subsequent digits of PIN within time: the allowed time is 5 seconds. | “See Officer” or “Please Proceed” on LED and Gate open, close and system resets.
SW3 | Override: reset the system or open the gate and reset the system | Command to reset the system or open the gate | System reset or Gate open and system reset.
Integrating software into PRA - Approach
Actions and their inputs and outputs
Action | Input | Output
Guard Override | Push override button | Gate open; Gate close
User insert Card | Card | LED: “Enter PIN”; LED: “See Officer”
User insert PIN | PIN | LED: “Please Proceed”; LED: “See Officer”; Gate open; Gate close
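Matching the input/output combinations of the software functions to the actions in the risk model can be sketched as a lookup. The function and action names below come from the two tables. The normalized labels (for example, treating "Push override button" as the "override command" input of SW3) and the matching rule are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareFunction:
    name: str
    inputs: frozenset
    outputs: frozenset


# Normalized from the PACS functional decomposition (labels are illustrative).
FUNCTIONS = [
    SoftwareFunction("SW1", frozenset({"card"}),
                     frozenset({"LED: Enter PIN", "LED: See Officer"})),
    SoftwareFunction("SW2", frozenset({"pin"}),
                     frozenset({"LED: See Officer", "LED: Please Proceed",
                                "gate open", "gate close", "system reset"})),
    SoftwareFunction("SW3", frozenset({"override command"}),
                     frozenset({"system reset", "gate open"})),
]

# Action -> (input, outputs), normalized from the actions table.
ACTIONS = {
    "Guard Override": ("override command", {"gate open", "gate close"}),
    "User insert Card": ("card", {"LED: Enter PIN", "LED: See Officer"}),
    "User insert PIN": ("pin", {"LED: Please Proceed", "LED: See Officer",
                                "gate open", "gate close"}),
}


def functions_for_action(action):
    """Return the functions whose input matches and whose outputs overlap."""
    action_input, action_outputs = ACTIONS[action]
    return [f.name for f in FUNCTIONS
            if action_input in f.inputs and f.outputs & action_outputs]


for action in ACTIONS:
    print(action, "->", functions_for_action(action))
```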
Integrating software into PRA - Approach
Step 3: Modeling software function in ESDs/ETs and Fault trees
• In the ESDs/ETs, the function of interest is modeled as
[Figure: ESD fragment. Input -> Delay of Execution -> “Does SW produce an output that can lead to the safe situation for the sequence?” (Yes/No) -> Output device (Yes/No)]
Integrating software into PRA - Approach
Step 3: Modeling software function in ESDs/ETs and Fault trees
• In the fault tree, the function of interest is modeled as
[Figure: fault tree]
The software does not produce the expected output?
- Failures caused by abnormal inputs: Abnormal Input; Software incorrectly handles the input failures
- Failures caused by functional failures: Normal Input; Software functional failures
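One way to read this fault tree numerically is sketched below. It assumes each branch is an AND of the input condition and the corresponding software failure, that the two branches are joined by an OR gate, and that normal and abnormal inputs are mutually exclusive and exhaustive. The parameter names and the example conditional probabilities are hypothetical.

```python
def p_no_expected_output(p_abnormal_input,
                         p_mishandled_given_abnormal,
                         p_functional_failure_given_normal):
    """Probability that the software does not produce the expected output.

    Normal and abnormal inputs are mutually exclusive, so the two
    branch probabilities can simply be added.
    """
    for p in (p_abnormal_input, p_mishandled_given_abnormal,
              p_functional_failure_given_normal):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_normal_input = 1.0 - p_abnormal_input
    from_abnormal_inputs = p_abnormal_input * p_mishandled_given_abnormal
    from_functional_failures = p_normal_input * p_functional_failure_given_normal
    return from_abnormal_inputs + from_functional_failures


# Hypothetical values: 3% abnormal inputs, 20% of them mishandled,
# 1% functional failures on normal inputs.
print(round(p_no_expected_output(0.03, 0.20, 0.01), 4))
```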
Integrating software into PRA - ESD
[Figure: ESD for the fire scenario with software functions modeled. Initiating event: Fire. Pivotal events: Emergency Exit; Fire Protection; Guard there; Guard action (T1 < Tcritical); “Can the output of SW3 cause the gate to open?”; The user inserts the card; “Can SW1 produce the output that can lead to the PIN input?”; LED1; The user inserts the PIN; “Does SW2 produce the output that can lead to the opening of the gate?”; Gate; Delay of opening of gate (T2 < Tcritical, T3 < Tcritical). Each software question is preceded by a Delay of Execution block. Inputs: Input 1, Input 2, Input 3, with probabilities P1, P2, P3. End states: Safe; Loss of life. Sequences 1 through 15.]
Integrating software into PRA - Approach
Step 4: Input Tree
• Build the input tree for the particular function involved
• The input tree is a decomposition of the space of possibilities
• The input tree is mostly generic for a function, but may VARY due to context (i.e., probabilities of basic events may vary, certain events may conflict with the rest of the scenario conditions).
Integrating software into PRA - Approach
Step 4: Input Tree
[Figure: input tree]
Input to Function: Normal or Abnormal
Abnormal: FM1, FM2, ..., FM8
FM1 = Amount
FM2 = Value
FM3 = Range
FM4 = Type
FM5 = Time
FM6 = Rate
FM7 = Duration
FM8 = Load
Input Fault Tree
[Figure: Input Fault Tree for SW1. Top event: Wrong Input 1. Contributors: Amount (1); Value (2); Range (3); Type (4)]
Input Fault Tree
[Figure: input fault tree for the failure mode Amount (contributor 1, event OP3). Branches: Insert Card; Input Information; Read Card. Nodes: Damaged card in terms of Amount; Reader failure; R9 failure; R6 fails to 0; R6 fails to 1; R10 fails to 0; R10 fails to 1; Card is appropriately inserted; Card is not appropriately inserted; Amount is 1 although should be 0; Amount is 0 although should be 1; Card information is appropriately retrieved; Card information is not appropriately retrieved.]
Integrating software into PRA - Approach
Step 5: Quantify the input tree
Integrating software into PRA - Approach
Step 6: Develop and perform software safety tests
• The sole objective of these tests is to answer the questions contained in the model, i.e., in the MLD, accident scenarios and fault tree.
• The test is completely automated using test generation/test execution tools (TestMaster/WinRunner).
• The process is as follows:
- Build a finite state machine model of the software by following the software functional decomposition derived from the risk model and the software requirements.
- Derive the test profile and output conditions to be quantified from the risk model.
- Define and run the test cases according to the following test strategy.
- Analyze the results by computing the probabilities of the different outcomes based on the test data.
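To make the finite state machine step concrete, here is a minimal state-by-state sketch of the PACS card and PIN flow. The timing limits (10 seconds for the first PIN digit, 5 seconds for each subsequent digit) come from the SW2 requirement. The names, the use of "less than or equal" for the limits, and the simplification that every rejection ends in "See Officer" are assumptions. This is not the TestMaster model itself.

```python
from enum import Enum


class Outcome(Enum):
    PLEASE_PROCEED = "Please Proceed"  # gate opens
    SEE_OFFICER = "See Officer"        # gate stays closed


FIRST_DIGIT_LIMIT_S = 10  # allowed time for the first PIN digit
NEXT_DIGIT_LIMIT_S = 5    # allowed time for each subsequent digit
PIN_LENGTH = 4


def pacs_outcome(card_valid, pin_correct, digit_delays_s):
    """Walk the states: card check -> timed PIN entry -> PIN check."""
    # State 1: card validation (SW1).
    if not card_valid:
        return Outcome.SEE_OFFICER
    # State 2: timed PIN entry (SW2), one delay per digit.
    if len(digit_delays_s) != PIN_LENGTH:
        return Outcome.SEE_OFFICER
    for position, delay in enumerate(digit_delays_s):
        limit = FIRST_DIGIT_LIMIT_S if position == 0 else NEXT_DIGIT_LIMIT_S
        if delay > limit:
            return Outcome.SEE_OFFICER
    # State 3: PIN validation (SW2).
    return Outcome.PLEASE_PROCEED if pin_correct else Outcome.SEE_OFFICER


# Delays of 9, 3, 1 and 3 seconds are all within the limits.
print(pacs_outcome(True, True, [9, 3, 1, 3]).value)
```

A model like this supplies the expected outcome for each generated test case, so observed outputs can be judged automatically.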
TestMaster Model
Test Script Example
# Card entry phase
win_activate ("mmount-76.umd.edu - CRT");
start_time1 = get_time();
type ("1<kReturn>");
Check_Message(Message_b, 1);
type ("0<kReturn>");
Check_Message(Message_c, 1);
type ("155721495<kReturn>");
Check_Message(Message_b, 1);
type ("0<kReturn>");
Check_Message(Message_c, 1);
type ("GayyardLupieN<kReturn>");
Check_Message(Message_b, 1);
type ("1<kReturn>");
end_time1 = get_time();
report_msg("Cardtime is " & (end_time1 - start_time1) & " Seconds");
# PIN entry phase: wait before typing each digit
start_time2 = get_time();
Check_Message(Message_d, 1);
wait(9);
type ("4");
Check_Message(Message_e, 1);
wait(3);
type ("5");
Check_Message(Message_f, 1);
wait(1);
type ("1");
Check_Message(Message_g, 1);
wait(3);
type ("9");
end_time2 = get_time();
report_msg("PINtime is " & (end_time2 - start_time2) & " Seconds");
Case_Judge(Message_a, 1);
Test Profile
Test Profile for PACS
No. | Description of the Event | Probability from input fault tree
SW1 (input tree)
1 | Entering a good card: a good card is a card that has its data in the correct format and whose data is in the database. In other words, this event reflects the number of times a genuine card is entered into the system. | 0.97 = 1-[Pr(OP2)+Pr(OP3)+Pr(OP4)+Pr(OP5)]
2 (V) | Entering a bad card in terms of failure mode: Value | 0.0075 = Pr(OP2)
3 (A) | Entering a bad card in terms of failure mode: Amount | 0.0075 = Pr(OP3)
4 (Ty) | Entering a bad card in terms of failure mode: Type | 0.0075 = Pr(OP4)
5 (Rg) | Entering a bad card in terms of failure mode: Range | 0.0075 = Pr(OP5)
SW2 (input tree)
6 | Entering a good PIN: a good PIN is the event in which the four digits of the PIN are correct and match the entry in the database. | 0.8 = Pr()
7 | Entry of the 1st digit within time: the allowed time for the entry of the first digit of the PIN is 10 seconds. | 0.9091 = Pr()
8 | Entry of subsequent digits of PIN within time: the allowed time is 5 seconds | 0.833 = Pr()
9 (V) | Entering a bad PIN in terms of failure mode: Value | 0.2 = Pr()
10 (T) | The time of inserting the first PIN digit is more than 10 seconds | 0.101 = Pr()
11 (T) | The time of inserting the second, third and fourth PIN digits is more than 5 seconds | 0.167 = Pr()
Failure modes application
• Test Case Selection
1. Sample from the profile/input tree to see whether we have a “Normal” or an “Abnormal Input”.
2. If it is a normal input, select randomly from the “Normal Input” domain.
3. If it is an abnormal input, randomly select the failure mode according to the profile/input tree.
4. Then randomly select the “base” value from the “Normal Input” domain and mutate this “base” value using the rules given below:
Characteristics | Failure modes implementation
Amount (A) | A±1
Value (V) | V*(1±10%)
Type (T) | Integer to character or floating-point; Character to integer, floating-point
Range (Rg) | Vmin*(1±10%), Vmax*(1±10%)
Time (T) | T*(1±10%)
Rate (R) | R*(1±10%)
Duration (D) | D*(1±10%)
Load (L) | L*(1+10%)
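The four selection steps can be sketched as follows for SW1, using the probabilities from the test profile. The sketch assumes the mutation rules are perturbations of plus or minus 10% (and plus or minus one item for Amount), treats inputs as plain numbers, and represents a test input as the list of items supplied. These are illustrative simplifications, and the function names are hypothetical.

```python
import random

# SW1 probabilities from the test profile (they sum to 1).
SW1_PROFILE = {
    "Normal": 0.97,
    "Value": 0.0075,
    "Amount": 0.0075,
    "Type": 0.0075,
    "Range": 0.0075,
}


def mutate(mode, base, v_min, v_max, rng):
    """Apply a failure-mode rule to a base value; return the items supplied."""
    sign = rng.choice((-1, 1))
    if mode == "Value":
        return [base * (1 + sign * 0.10)]
    if mode == "Amount":
        return [base] * (1 + sign)  # one item expected, so 0 or 2 items
    if mode == "Type":
        return [str(base)]          # number supplied as characters
    if mode == "Range":
        bound = rng.choice((v_min, v_max))
        return [bound * (1 + sign * 0.10)]
    raise ValueError(f"unknown failure mode: {mode}")


def select_test_case(profile, v_min, v_max, rng):
    """Steps 1-4: sample normal/abnormal and the mode, then mutate a base value."""
    modes = list(profile)
    mode = rng.choices(modes, weights=[profile[m] for m in modes], k=1)[0]
    base = rng.uniform(v_min, v_max)  # drawn from the normal input domain
    if mode == "Normal":
        return mode, [base]
    return mode, mutate(mode, base, v_min, v_max, rng)


rng = random.Random(2003)  # fixed seed so the run is repeatable
cases = [select_test_case(SW1_PROFILE, 1.0, 100.0, rng) for _ in range(200)]
normal_count = sum(mode == "Normal" for mode, _ in cases)
print(normal_count, "normal cases out of", len(cases))
```

Because abnormal inputs are rare in the profile, most generated cases are normal. Only a handful of the 200 cases exercise each input failure mode.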
Testing Results
200 cases have been tested for SW1 and SW2. 19 cases failed.
SW1 fails in only one case (58). Therefore, the point-estimate probability of Card failure is 1/200=0.005.
18 cases failed for SW2. Therefore, the unsafe probability (gate closed) is 18/199 ≈ 0.09.
Failed cases classification
Failure modes | Failed cases
Amount | 58 (1)
Time | 12, 13, 18, 29, 56, 99, 159, 162, 166, 183, 196 (11)
Function | 10, 108, 129, 148, 154, 163, 171 (7)
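The point estimates on this slide can be reproduced directly from the failed-case lists. The sketch assumes the denominator of 199 for SW2 reflects the one case in which SW1 failed and the PIN step was therefore not reached. That reading is consistent with the numbers, but the slide does not state it.

```python
TOTAL_CASES = 200

# Failed case numbers as classified on the slide.
failed_cases = {
    "Amount": [58],
    "Time": [12, 13, 18, 29, 56, 99, 159, 162, 166, 183, 196],
    "Function": [10, 108, 129, 148, 154, 163, 171],
}

# Case 58 is the single SW1 failure; the remaining failures belong to SW2.
sw1_failures = len(failed_cases["Amount"])
sw2_failures = len(failed_cases["Time"]) + len(failed_cases["Function"])

# Assumption: cases where SW1 failed never exercise SW2.
sw2_cases = TOTAL_CASES - sw1_failures

p_sw1_failure = sw1_failures / TOTAL_CASES
p_sw2_failure = sw2_failures / sw2_cases

print(f"SW1 point estimate: {p_sw1_failure:.3f}")
print(f"SW2 point estimate: {p_sw2_failure:.4f}")
```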
Testing Results
Card Time Distribution
[Figure: bar chart of probability (0 to 1) versus card time in seconds, with bins ending at 5 and 10]
Card time and Probability
Time (seconds) | Number of cases | Probability
1-5 | 191 | 0.946
5-10 | 8 | 0.040
Testing Results
PIN Time Distribution
[Figure: bar chart of probability (0 to 0.5) versus PIN time in seconds, with bins ending at 10, 20, 30, 40, 50 and 60]
PIN time and Probability
Time (seconds) | Number of cases | Probability
1-10 | 18 | 0.099
10-20 | 79 | 0.436
20-30 | 48 | 0.265
30-40 | 19 | 0.104
40-50 | 14 | 0.077
50-60 | 3 | 0.016
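The probability column of the PIN time table is each bin count divided by the total number of cases with a recorded PIN time (the six bins sum to 181). The sketch below recomputes it. The tabulated values match when results are truncated, not rounded, to three decimals; that is an observation about the numbers rather than something the slide states.

```python
import math

# Bin counts from the PIN time table.
pin_time_counts = {
    "1-10": 18,
    "10-20": 79,
    "20-30": 48,
    "30-40": 19,
    "40-50": 14,
    "50-60": 3,
}


def truncate(value, digits=3):
    """Drop (not round) everything after the given number of decimals."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


total = sum(pin_time_counts.values())
pin_time_probabilities = {
    time_bin: truncate(count / total)
    for time_bin, count in pin_time_counts.items()
}

for time_bin, probability in pin_time_probabilities.items():
    print(f"{time_bin:>6} s: {probability:.3f}")
```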
Test Cases Coverage
Input Failure Modes Coverage (SW1)
Input failure modes of SW1 | Cases
Value | 187 (1)
Amount | 22, 31, 58, 146 (4)
Type | 58, 135, 172, 174 (4)
Range | 59 (1)
Input Failure Modes Coverage (SW2)
Input failure modes of SW2 | Cases
Value | 10, 13, 18, 22, 23, 56, 67, 69, 80, 89, 99, 126, 136, 142, 146, 162, 163, 166, 173, 196 (20)
Time | 6, 10, 12, 13, 14, 18, 23, 24, 29, 36, 40, 42, 49, 56, 57, 58, 61, 64, 65, 69, 72, 74, 75, 76, 80, 84, 85, 87, 89, 94, 95, 96, 99, 102, 103, 108, 113, 114, 115, 116, 118, 126, 128, 129, 130, 133, 136, 137, 145, 146, 147, 148, 149, 151, 154, 156, 157, 159, 160, 162, 163, 165, 166, 167, 169, 171, 172, 177, 183, 184, 186, 189, 191, 194, 195, 196, 197, 199 (78)
ESD
[Figure: ESD for the fire scenario with software functions modeled, annotated with branch probabilities. Initiating event: Fire. Pivotal events: Emergency Exit; Fire Protection; Guard there; Guard action (T1 < Tcritical); “Can the output of SW3 cause the gate to open?”; The user inserts the card; “Can SW1 produce the output that can lead to the PIN input?”; LED1; The user inserts the PIN; “Does SW2 produce the output that can lead to the opening of the gate?”; Gate; Delay of opening of gate (T2 < Tcritical, T3 < Tcritical). Inputs: Input 1, Input 2, Input 3, with probabilities P1, P2, P3. End states: Safe; Loss of life. Sequences 1 through 15. Branch probabilities shown: 1, 0, 0.005, 0.995, 0.09, 0.91, 0.91, 0.09.]
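Once branch probabilities are attached to the ESD, a sequence probability is the product of the branch probabilities along its path. The sketch below assumes that 0.995/0.005 annotate the Yes/No branches of the SW1 question, that 0.91/0.09 annotate those of the SW2 question, and that the branches are independent. It covers only the software branches and ignores the hardware, guard and delay events in the diagram.

```python
from math import prod

# Branch probabilities taken from the testing results (mapping is assumed).
P_SW1_YES, P_SW1_NO = 0.995, 0.005
P_SW2_YES, P_SW2_NO = 0.91, 0.09


def path_probability(branch_probabilities):
    """Probability of one sequence: product of its branch probabilities."""
    return prod(branch_probabilities)


# Software-only outcomes of the normal exit path.
p_card_rejected = path_probability([P_SW1_NO])
p_gate_stays_closed = path_probability([P_SW1_YES, P_SW2_NO])
p_gate_opens = path_probability([P_SW1_YES, P_SW2_YES])

print(f"card step fails:   {p_card_rejected:.5f}")
print(f"gate stays closed: {p_gate_stays_closed:.5f}")
print(f"gate opens:        {p_gate_opens:.5f}")
```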
Future Work
• Represent hardware-related input failure modes in test model
• Quantification of input fault tree based on field data
• Output failure modes/Support failure modes
• Sensitivity Analysis
• Scalability
- Test case generation
- Test case execution
- Number of test cases for each software component