Critical systems
Lecture 6
Critical systems
Critical systems specification
Critical systems development
Critical systems validation

This lecture is based on Chapters 3, 9, 20 and 24.
Critical systems

Safety-critical systems: a failure may result in injury, loss of life or major environmental damage. Examples: a chemical manufacturing plant, an aircraft.
Mission-critical systems: a failure may result in the failure of some goal-directed activity. Example: the navigational system for a spacecraft.
Business-critical systems: a failure may result in the failure of the business using that system. Example: a customer account system in a bank.
Critical systems are systems where failures can result in significant economic losses, physical damage or threats to human life. Such failures have both high severity and high cost.
Dependability

Dependability has four principal attributes:
Availability: the ability of the system to deliver services when requested (assessed as a probability).
Reliability: the ability of the system to deliver services as specified (assessed as a probability).
Safety: the ability of the system to operate without catastrophic failure (assessed as a judgement).
Security: the ability of the system to protect itself against accidental or deliberate intrusion (assessed as a judgement).
Dependability = trustworthiness

A dependable system is trusted by its users. Trustworthiness essentially means the degree of user confidence that the system will operate as they expect and that the system will not 'fail' in normal use. On this scale, systems range from not dependable through very dependable to ultra-dependable.
Insulin pump data-flow
The system does not need to implement all dependability attributes. It must be:
Available: to deliver insulin.
Reliable: to deliver the correct amount of insulin.
Safe: it should not cause excessive doses of insulin.
It need not be secure: it is not exposed to external attacks.
The insulin pump system monitors blood sugar levels and delivers an appropriate dose of insulin.

[Figure: insulin pump hardware: needle assembly, sensor, two displays, alarm, pump, clock, controller, power supply and insulin reservoir.]

[Figure: insulin pump data-flow: blood parameters are read from the blood; blood sugar analysis produces the blood sugar level; insulin requirement computation produces the insulin requirement; the insulin delivery controller turns the insulin dosage into pump control commands that deliver the insulin.]
Failures may be caused by:
• hardware
• software
• human mistakes
Other dependability properties
Repairability: reflects the extent to which the system can be repaired in the event of a failure.
Maintainability: reflects the extent to which the system can be adapted to new requirements.
Survivability: reflects the extent to which the system can deliver services whilst under hostile attack.
Error tolerance: reflects the extent to which user input errors can be avoided and tolerated.
Increasing dependability

Increasing the dependability of a system is costly. High levels of dependability can only be achieved at the expense of system performance: extra and redundant code is needed for checking states and for recovery.

Reasons for prioritising dependability:
Systems that are unreliable, unsafe or insecure are often unused.
System failure costs may be enormous.
It is difficult to retrofit dependability.

[Figure: cost rises steeply as dependability increases from low through medium, high and very high to ultra-high, due to the additional design, implementation and validation effort.]
Availability and reliability
Reliability: the ability of a system or component to perform its required function under stated conditions for a specified period of time.
Availability: the degree to which a system or component is operational and accessible when required for use.

System A: fails once a year; it takes one week to repair.
System B: fails once a month; it takes 5 minutes to repair.

Some users may tolerate frequent failures as long as the system recovers quickly from them.
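For System A and System B, steady-state availability can be estimated with the standard formula A = MTTF / (MTTF + MTTR). This is an illustrative sketch; the conversion of the failure and repair figures to minutes is ours:

```python
# Steady-state availability: A = MTTF / (MTTF + MTTR). The formula is
# standard; converting the failure and repair figures to minutes is ours.

def availability(mttf_minutes: float, mttr_minutes: float) -> float:
    """Fraction of time the system is operational."""
    return mttf_minutes / (mttf_minutes + mttr_minutes)

YEAR = 365 * 24 * 60   # minutes in a year
MONTH = 30 * 24 * 60   # minutes in a month
WEEK = 7 * 24 * 60     # minutes in a week

# System A: fails once a year, one week to repair.
print(f"System A: {availability(YEAR, WEEK):.4f}")    # 0.9812
# System B: fails once a month, five minutes to repair.
print(f"System B: {availability(MONTH, 5):.5f}")      # 0.99988
```

System B is far more available even though it fails twelve times as often, which is the point of the comparison above.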
System failure: an event that occurs at some point in time when the system does not deliver a service as expected by its users.
System error: erroneous system behaviour, where the behaviour of the system does not conform to its specification.
System fault: an incorrect system state, i.e. a system state that is unexpected by the designers of the system.
Human error: human behaviour that results in the introduction of faults into a system.
[Figure: an alternative defect classification, contrasted with Sommerville's definitions above: a problem (report) is either a dynamic problem (a failure) or a static problem; a fault has one or more causes (each with a cause ID and cause type), which may be non-software causes or other defects.]
Approaches to improve reliability
Fault avoidance: techniques that minimise the possibility of mistakes and/or trap mistakes before they result in the introduction of system faults (e.g. use of static analysis to detect faults).
Fault detection and removal: verification and validation techniques that increase the chances of detecting and removing faults before the system is used (e.g. systematic testing).
Fault tolerance: techniques ensuring that faults in a system do not result in system errors, or that system errors do not result in system failures (e.g. self-checking facilities, redundant system modules).
Input/output mapping

The reliability of the system is the probability that a particular input will lie in the set of inputs that cause erroneous outputs.

[Figure: a program maps inputs to outputs; within the set of possible inputs there is a subset Ie of inputs that cause the erroneous outputs Oe in the output set.]

Different people will use the system in different ways. In the figure below, only User 2's inputs overlap the erroneous input set, so only User 2 will encounter system failure.

[Figure: the set of possible inputs, showing the usage patterns of User 1, User 2 and User 3; User 2's inputs intersect the erroneous inputs.]
Removing X% of the faults in a system will not necessarily improve the reliability by X%. At IBM, removing 60% of product defects resulted in only a 3% improvement in reliability.

Program defects may be in rarely executed sections of the code and so may never be encountered by users; removing these does not affect the perceived reliability. A program with known faults may therefore still be seen as reliable by its users.
Safety models

Safety is concerned with ensuring that the system cannot cause damage. Types of safety-critical systems:
Primary safety-critical software: a software malfunction causes a hardware malfunction, resulting in human injury or environmental damage.
Secondary safety-critical software: a software malfunction results in design defects, which in turn pose a threat to humans and the environment.

Unsafe reliable systems
A reliable system can still be unsafe:
Specification defects: if the system specification is incorrect, the system can behave as specified but still cause an accident.
Hardware failures may cause the system to behave in an unpredictable way; such failures are hard to anticipate in the specification.
Correct individual operator inputs may lead to system malfunction in specific contexts; this is often the result of an operator mistake.
Safety terms

Accident (or mishap): an unplanned event or sequence of events which results in human death or injury, or in damage to property or to the environment. A computer-controlled machine injuring its operator is an example of an accident.
Hazard: a condition with the potential for causing or contributing to an accident. A failure of the sensor that detects an obstacle in front of a machine is an example of a hazard.
Damage: a measure of the loss resulting from a mishap. Damage can range from many people killed as a result of an accident to minor injury or property damage.
Hazard severity: an assessment of the worst possible damage which could result from a particular hazard. Hazard severity can range from catastrophic, where many people are killed, to minor, where only minor damage results.
Hazard probability: the probability of the events occurring which create a hazard. Probability values tend to be arbitrary, but range from probable (say, a 1/100 chance of the hazard occurring) to implausible (no conceivable situations are likely in which the hazard could occur).
Risk: a measure of the probability that the system will cause an accident. The risk is assessed by considering the hazard probability, the hazard severity and the probability that the hazard will result in an accident.
Assuring safety

Assuring safety means ensuring either that accidents do not occur or that the consequences of an accident are minimal.
Hazard avoidance: the system is designed so that some classes of hazard simply cannot arise. Example: a cutting machine avoids the hazard of the operator's hands being in the blade pathway.
Hazard detection and removal: the system is designed so that hazards are detected and removed before they result in an accident. Example: a chemical plant system detects excessive pressure and opens a relief valve to reduce the pressure before an explosion occurs.
Damage limitation: the system includes protection features that minimise the damage that may result from an accident. Example: fire extinguishers in an aircraft engine.

Accidents are almost always due to a combination of malfunctions rather than a single failure.
Security

The security of a system is an assessment of the extent to which the system protects itself from external attacks, which may be accidental or deliberate:
virus attacks;
unauthorised use of system services;
unauthorised modification of the system and its data.

Security is becoming increasingly important as systems are networked, so that external access to the system through the Internet is possible. Security is an essential pre-requisite for availability, reliability and safety.
Exposure: possible loss or harm in a computing system. Analogous to an accident.
Vulnerability: a weakness in a computer-based system that may be exploited to cause loss or harm. Analogous to a hazard.
Attack: an exploitation of a system vulnerability.
Threat: circumstances that have the potential to cause loss or harm.
Control: a protective measure that reduces a system vulnerability.
Types of damage caused by external attack

For some types of critical system, security is the most important dimension of system dependability, for instance in systems managing confidential information.

Denial of service: normal services are unavailable or service provision is significantly degraded.
Corruption of programs or data: the programs or data in the system may be modified in an unauthorised way.
Disclosure of confidential information: information may be exposed to people who are not authorised to read or use it.
Security assurance
Vulnerability avoidance: the system is designed so that vulnerabilities do not occur. If there is no network connection, then external attack is impossible.
Attack detection and elimination: the system is designed so that attacks on vulnerabilities are detected and neutralised before they result in an exposure. Virus checkers find and remove viruses before they infect a system.
Exposure limitation: the system is designed so that the adverse consequences of a successful attack are minimised. A backup policy allows damaged information to be restored.
Critical systems
Lecture 4
Critical systems Critical systems specification Critical systems development Critical systems validation
Critical systems specification

The specification for critical systems must be of high quality and accurately reflect the needs of users. Types of requirements:
System functional requirements that define error checking, recovery facilities and other features.
Non-functional requirements for availability and reliability.
'Shall not' requirements, for safety and security; these are sometimes decomposed into more specific functional requirements.
Example (security): the system shall not allow users to modify access permissions on any files that they have not created.
Example (safety): the system shall not allow reverse thrust mode to be selected when the aircraft is in flight.
Stages of risk-based analysis

[Figure: the risk-based analysis process: risk identification produces a risk description; risk analysis and classification produce a risk assessment; risk decomposition performs root cause analysis; risk reduction assessment yields the dependability requirements.]

Risk identification: identify potential risks that may arise.
Risk analysis and classification: assess the seriousness of each risk.
Risk decomposition: decompose risks to discover their potential root causes.
Risk reduction assessment: define how each risk must be eliminated or reduced when the system is designed.

This process is applicable to any dependability attribute.
Risk identification

Identify the risks faced by the critical system. In safety-critical systems, the risks are the hazards that can lead to accidents. In security-critical systems, the risks are the potential attacks on the system.

Risks identified for the insulin pump:
Insulin overdose (service failure).
Insulin underdose (service failure).
Power failure due to exhausted battery (electrical).
Electrical interference with other medical equipment (electrical).
Poor sensor and actuator contact (physical).
Parts of machine break off in body (physical).
Infection caused by introduction of machine (biological).
Allergic reaction to materials or insulin (biological).
[Figure: risk triangle: an unacceptable region at the top, where the risk cannot be tolerated; an ALARP (as low as reasonably practicable) region, where risk is tolerated only if risk reduction is impractical or grossly expensive; and an acceptable region of negligible risk at the bottom.]
Risk analysis and classification
The process is concerned with understanding the likelihood that a risk will arise and the potential consequences if an accident or incident should occur.
Estimate the risk probability and the risk severity to determine the acceptability level. The aim must be to identify risks that are likely to arise or that have high severity.
Risk decomposition: the fault-tree technique

A deductive, top-down technique concerned with discovering the root causes of risks in a particular system. Put the risk or hazard at the root of the tree and identify the system states that could lead to that hazard. Where appropriate, link these with 'and' or 'or' conditions. A design goal should be to minimise the number of single causes of system failure.

[Figure: fault tree for the hazard "incorrect insulin dose administered": an 'or' of (1) incorrect sugar level measured, itself an 'or' of sensor failure and sugar computation error (arithmetic error or algorithm error); (2) correct dose delivered at wrong time, caused by timer failure; and (3) delivery system failure, an 'or' of pump signals incorrect and insulin computation incorrect (arithmetic error or algorithm error).]
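A fault tree like the insulin-pump one can be represented and analysed programmatically. The following sketch is ours, not from the lecture; the node names follow the figure. It evaluates whether a set of leaf events triggers the root hazard and lists single-point causes:

```python
# Illustrative sketch (not from the lecture): the insulin-pump fault tree,
# encoded as nested ("or"/"and", children...) tuples with event-name leaves.

OR, AND = "or", "and"

sugar_computation_error = (OR, "arithmetic error", "algorithm error")
insulin_computation_incorrect = (OR, "arithmetic error", "algorithm error")

tree = (OR,
        (OR, "sensor failure", sugar_computation_error),   # incorrect sugar level measured
        "timer failure",                                    # correct dose at wrong time
        (OR, "pump signals incorrect",                      # delivery system failure
             insulin_computation_incorrect))

def hazard_occurs(node, events):
    """True if the given set of occurred leaf events triggers this node."""
    if isinstance(node, str):
        return node in events
    op, *children = node
    results = [hazard_occurs(child, events) for child in children]
    return any(results) if op == OR else all(results)

def single_point_causes(node):
    """Leaf events that, on their own, cause the root hazard."""
    leaves = set()
    def collect(n):
        if isinstance(n, str):
            leaves.add(n)
        else:
            for child in n[1:]:
                collect(child)
    collect(node)
    return {event for event in leaves if hazard_occurs(node, {event})}

print(hazard_occurs(tree, {"timer failure"}))   # True
print(sorted(single_point_causes(tree)))        # every leaf: only 'or' gates here
```

Because this tree contains only 'or' gates, every leaf is a single-point cause, which is exactly what the design goal above says should be minimised.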
Risk reduction assessment

The aim of this process is to identify dependability requirements that specify how the risks should be managed and ensure that accidents/incidents do not arise.

Risk reduction strategies:
Risk avoidance: the system is designed so that the risk or hazard cannot arise.
Risk detection and removal: the system is designed so that risks are detected and neutralised before they result in an accident.
Damage limitation: the system is designed so that the consequences of an accident are minimised.
Safety specification for the insulin pump:
SR1: The system shall not deliver a single dose of insulin that is greater than a specified maximum dose for a system user.
SR2: The system shall not deliver a daily cumulative dose of insulin that is greater than a specified maximum for a system user.
SR3: The system shall include a hardware diagnostic facility that shall be executed at least 4 times per hour.
SR4: The system shall include an exception handler for all of the exceptions that are identified in Table 3.
SR5: The audible alarm shall be sounded when any hardware or software anomaly is discovered and a diagnostic message as defined in Table 4 shall be displayed.
SR6: In the event of an alarm in the system, insulin delivery shall be suspended until the user has reset the system and cleared the alarm.

The safety requirements of a system should be specified separately. These requirements should be based on an analysis of the possible hazards and risks, and the system specification should be formulated so that the hazards are unlikely to result in an accident.
IEC 61508: the safety life-cycle

[Figure: the IEC 61508 safety life-cycle: concept and scope definition; hazard and risk analysis; derivation and allocation of safety requirements; planning (including validation planning); development of safety-related systems; external risk reduction facilities; installation and commissioning; safety validation; operation and maintenance; and system decommissioning.]
Security specification

Security specification has some similarities to safety specification:
it is not possible to specify security requirements quantitatively;
the requirements are often 'shall not' rather than 'shall' requirements.

Differences:
there is no well-defined security life-cycle for security management, and no standards;
security deals with generic threats rather than system-specific hazards;
there is mature security technology (encryption, etc.), but the dominance of a single supplier (Microsoft) means that huge numbers of systems may be affected by a security failure.

The approach to security analysis is based around the assets to be protected and their value to an organisation.
The security specification process

[Figure: the security specification process: asset identification produces a system asset list; threat analysis and risk assessment produce a threat and risk matrix; threat assignment produces asset and threat descriptions; security technology analysis produces a technology analysis; security requirements specification produces the security requirements.]

Asset identification: the assets (data and programs) and their required degree of protection are identified.
Threat analysis and risk assessment: possible security threats are identified and the risks associated with each of these threats are estimated.
Threat assignment: identified threats are related to the assets so that, for each identified asset, there is a list of associated threats.
Technology analysis: available security technologies and their applicability against the identified threats are assessed.
Security requirements specification: the security requirements are specified. Where appropriate, these will explicitly identify the security technologies that may be used to protect against different threats to the system.
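The first three steps can be sketched as simple data structures. This is an illustration only; the asset names, threat names and ratings below are invented, not taken from the lecture:

```python
# Hypothetical sketch of asset identification, threat analysis and threat
# assignment for a library system (all names and ratings are invented).

assets = {                       # asset -> value to the organisation
    "user account database": "high",
    "catalogue data": "medium",
    "web front-end": "medium",
}

threats = {                      # threat -> estimated risk
    "unauthorised access": "high",
    "data corruption": "medium",
    "denial of service": "low",
}

threat_assignment = {            # each asset gets its list of threats
    "user account database": ["unauthorised access", "data corruption"],
    "catalogue data": ["data corruption"],
    "web front-end": ["denial of service", "unauthorised access"],
}

# Flag asset/threat pairs where both asset value and threat risk are high;
# these drive the security requirements specification.
priority = [(a, t) for a, ts in threat_assignment.items() for t in ts
            if assets[a] == "high" and threats[t] == "high"]
print(priority)   # [('user account database', 'unauthorised access')]
```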
Types of security requirement

Identification requirements: should the system identify users before interacting with them?
Authentication requirements: how are users identified?
Authorisation requirements: what privileges and access permissions do users have?
Immunity requirements: how is the system protected against threats?
Integrity requirements: how is data corruption avoided?
Intrusion detection requirements: how are attacks detected?
Non-repudiation requirements: a third party in a transaction cannot deny its involvement.
Privacy requirements: how is data privacy maintained?
Security auditing requirements: how can system use be audited and checked?
System maintenance security requirements: how are maintenance changes prevented from accidentally destroying security?

Examples:
SEC1: All system users shall be identified using their library card number and personal password.
SEC2: User privileges shall be assigned according to the class of user (student, staff, library staff).
System reliability specification

Hardware reliability: what is the probability of a hardware component failing, and how long does it take to repair that component?
Software reliability: how likely is it that a software component will produce an incorrect output? Software failures differ from hardware failures in that software does not wear out: it can continue in operation even after an incorrect result has been produced.
Operator reliability: how likely is it that the operator of a system will make an error?
Functional reliability requirements

Functional reliability requirements specify how failures may be avoided or tolerated. Examples:
A predefined range for all values that are input by the operator shall be defined, and the system shall check that all operator inputs fall within this predefined range.
The system must use N-version programming to implement the braking control system.
The system must be implemented in a safe subset of Ada and checked using static analysis.

An appropriate reliability metric should be chosen to specify the overall system reliability.
Non-functional reliability specification

POFOD (probability of failure on demand): the likelihood that the system will fail when a service request is made. A POFOD of 0.001 means that 1 out of a thousand service requests may result in failure.
ROCOF (rate of failure occurrence): the frequency with which unexpected behaviour is likely to occur. A ROCOF of 2/100 means that 2 failures are likely to occur in each 100 operational time units. This metric is sometimes called the failure intensity.
MTTF (mean time to failure): the average time between observed system failures. An MTTF of 500 means that 1 failure can be expected every 500 time units.
AVAIL (availability): the probability that the system is available for use at a given time. An availability of 0.998 means that in every 1000 time units, the system is likely to be available for 998 of them.

Non-functional reliability requirements are expressed quantitatively. Note that reliability measurements do NOT take the consequences of failure into account.
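As an illustration, the four metrics can be computed from an operational log. All of the figures below are invented for the example:

```python
# Hedged sketch: computing POFOD, ROCOF, MTTF and AVAIL from a hypothetical
# operational log. Every number here is invented for illustration.

failure_times = [120, 340, 560, 900]   # times at which failures occurred
observation_period = 1000              # total operational time units
demands = 2000                         # service requests during the period
failed_demands = 4                     # requests that resulted in failure
total_repair_time = 2                  # time units spent repairing

pofod = failed_demands / demands
rocof = len(failure_times) / observation_period
# MTTF from inter-failure intervals (first interval measured from t = 0):
intervals = [t2 - t1 for t1, t2 in zip([0] + failure_times, failure_times)]
mttf = sum(intervals) / len(intervals)
avail = (observation_period - total_repair_time) / observation_period

print(f"POFOD={pofod}  ROCOF={rocof}  MTTF={mttf}  AVAIL={avail}")
# POFOD=0.002  ROCOF=0.004  MTTF=225.0  AVAIL=0.998
```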
Failure classification

The types of failure are system-specific, and the consequences of a system failure depend on the nature of that failure. When specifying reliability, it is not just the number of system failures that matters but also their consequences.

Transient: occurs only with certain inputs.
Permanent: occurs with all inputs.
Recoverable: the system can recover without operator intervention.
Unrecoverable: operator intervention is needed to recover from the failure.
Non-corrupting: the failure does not corrupt system state or data.
Corrupting: the failure corrupts system state or data.
Unrepeatable: occurred only once.
Steps to a reliability specification

For each sub-system, analyse the consequences of possible system failures.
From the system failure analysis, partition failures into appropriate classes (transient, permanent, recoverable, unrecoverable, corrupting, non-corrupting) and by severity.
For each failure class identified, set out the required reliability using an appropriate metric, e.g. POFOD = 0.002 for transient failures and POFOD = 0.00002 for permanent failures.
Identify functional reliability requirements to reduce the chances of critical failures.
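The middle steps amount to a small table mapping failure classes to metrics and targets. A sketch, using the POFOD targets quoted above:

```python
# Sketch: a reliability specification as a table from failure class to
# metric and target, using the POFOD targets quoted above.

reliability_spec = {
    "transient": {"metric": "POFOD", "target": 0.002},
    "permanent": {"metric": "POFOD", "target": 0.00002},
}

def acceptable(failure_class: str, observed: float) -> bool:
    """Check an observed failure probability against the specified target."""
    return observed <= reliability_spec[failure_class]["target"]

print(acceptable("transient", 0.001))   # True: within the transient target
print(acceptable("permanent", 0.001))   # False: permanent failures must be rarer
```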
Critical systems
Lecture 4
Dependability Critical systems specification Critical systems development Critical system validation
Approaches to developing dependable software

Fault avoidance: design and implementation processes that minimise faults.
Fault detection: verification and validation (V&V) to discover and remove faults.
Fault tolerance: detect unexpected behaviour and prevent system failure.

[Figure: the cost per residual error rises steeply as the number of residual errors is reduced from many to few to very few.]

The cost of producing fault-free software is very high, so it is only cost-effective in exceptional situations. It is often cheaper to accept software faults and pay for their consequences than to expend resources on developing fault-free software.

A fault-tolerant system is a system that can continue in operation after some system faults have manifested themselves. The goal of fault tolerance is to ensure that system faults do not result in system failure.
Techniques for developing fault-free software:
Dependable software processes
Quality management
Formal specification
Static verification
Strong typing
Safe programming
Protected information (information hiding and encapsulation)
Dependable processes

A dependable software development process is well defined, repeatable and includes a spectrum of verification and validation activities, irrespective of the people involved in the process. Characteristics of a dependable process:
Documentable: the process should have a defined process model that sets out the activities in the process and the documentation that is to be produced during these activities.
Standardised: a comprehensive set of software development standards, defining how the software is to be produced and documented, should be available.
Auditable: the process should be understandable by people apart from the process participants, who can check that process standards are being followed and make suggestions for process improvement.
Diverse: the process should include redundant and diverse verification and validation activities.
Robust: the process should be able to recover from failures of individual process activities.
Process activities for fault avoidance and detection:
Requirements inspections.
Requirements management.
Model checking: internal and external (the dynamic and static models are consistent).
Design and code inspection.
Static analysis.
Test planning and management.
Configuration management.
Programming constructs and techniques that contribute to fault avoidance and fault tolerance:
Design for simplicity.
Exception handling.
Information hiding.
Minimising the use of unsafe programming constructs: floating-point numbers, pointers, dynamic memory allocation, parallelism, recursion, interrupts, inheritance, aliasing, unbounded arrays and default input processing.

Some standards for safety-critical systems development completely prohibit the use of some of these constructs.
Fault tolerance actions

Fault detection: the system must detect that a fault (an incorrect system state) has occurred or will occur.
Damage assessment: the parts of the system state affected by the fault must be detected and assessed.
Fault recovery: the system must restore its state to a known safe state.
Fault repair: the system may be modified to prevent recurrence of the fault. As many software faults are transitory, this is often unnecessary.

Detection, assessment and recovery are system actions; repair may involve human and/or system action.
Fault detection and damage assessment

Define constraints that must hold for all legal states, and check the state against these constraints.
Checksums can be used for damage assessment in data transmission.
Redundant pointers can be used to check the integrity of data structures.
Watchdog timers can check for non-terminating processes: if there is no response after a certain time, a problem is assumed.
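Two of these mechanisms can be sketched briefly. Both examples are ours, not from the lecture: a state-constraint check on an insulin dose (the limit values are invented) and a simple additive checksum for damage assessment:

```python
# Illustrative sketches, not the lecture's code; dose limits and the record
# format are invented.

MIN_DOSE, MAX_DOSE = 1, 5   # hypothetical single-dose limits

class DoseStateError(Exception):
    pass

def commit_dose(current_dose: int) -> int:
    """Preventative check: commit the state change only if the constraint holds."""
    if not (MIN_DOSE <= current_dose <= MAX_DOSE):
        raise DoseStateError(f"dose {current_dose} outside [{MIN_DOSE}, {MAX_DOSE}]")
    return current_dose

def checksum(data: bytes) -> int:
    """Simple additive checksum for damage assessment of transmitted data."""
    return sum(data) % 256

record = b"reading:87"
stored_sum = checksum(record)
received = b"reading:97"                  # one corrupted byte
print(checksum(received) == stored_sum)   # False: damage detected
```

An additive checksum is deliberately simple here; real systems use stronger checks (e.g. CRCs), since a sum misses reordered bytes.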
Preventative fault detection: the fault detection mechanism is initiated before the state change is committed. If an erroneous state is detected, the change is not made.
Retrospective fault detection: the fault detection mechanism is initiated after the system state has been changed.

Fault recovery and repair
Backward recovery: restore the system state to a known safe state.
Forward recovery: apply repairs to a corrupted system state and set the system state to the intended value.
Hardware fault tolerance

Depends on triple-modular redundancy (TMR): there are three replicated identical components that receive the same input and whose outputs are compared. If one output is different from the other two, it is ignored and component failure is assumed.

[Figure: three identical components A1, A2 and A3 feed an output comparator.]
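The comparator's majority vote can be sketched in a few lines. This is an illustration with software functions standing in for the three hardware components; the functions and values are invented, and a3 is deliberately faulty:

```python
# Illustration of TMR voting in software (component functions are invented;
# a3 is a deliberately faulty replica).

def a1(x): return x * 2
def a2(x): return x * 2
def a3(x): return x * 2 + 1   # faulty replica

def tmr_vote(outputs):
    """Return the majority output; a single disagreeing component is masked."""
    for candidate in outputs:
        if outputs.count(candidate) >= 2:
            return candidate
    raise RuntimeError("no majority: more than one component has failed")

outputs = [a1(10), a2(10), a3(10)]   # [20, 20, 21]
print(tmr_vote(outputs))             # 20: a3's deviant output is ignored
```

Note that TMR masks a single failure; if two components fail differently, there is no majority and the failure must be signalled instead.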
N-version programming

[Figure: N-version programming: the same input is processed by N versions (Version 1, Version 2, Version 3); an output comparator, supported by a fault manager, produces the agreed result.]

As in hardware systems, the output comparator is a simple piece of software that uses a voting mechanism to select the output.
Recovery blocks

The different versions of the algorithm are designed and implemented by different teams.

[Figure: recovery blocks: try algorithm 1 and apply the acceptance test; continue execution if the acceptance test succeeds; if it fails, retry with algorithm 2 and re-test, then with algorithm 3; signal an exception if all algorithms fail the acceptance test.]
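The control flow in the figure can be sketched as follows. This is our illustration, not the lecture's code: the task is sorting, algorithm_1 is a deliberately buggy primary version (it drops duplicates) and algorithm_2 is the correct alternate:

```python
# Sketch of the recovery-block control flow (our illustration): try each
# algorithm version in order; the acceptance test decides whether to
# continue or retry the next version.

def acceptance_test(result, xs):
    """The result must be the input, sorted (same elements, ascending)."""
    return result == sorted(xs)

def algorithm_1(xs):   # primary version: buggy, loses duplicate values
    return sorted(set(xs))

def algorithm_2(xs):   # alternate version: correct
    return sorted(xs)

def recovery_block(xs, versions, test):
    for version in versions:
        result = version(list(xs))   # run each version on a fresh copy
        if test(result, xs):         # acceptance test succeeds: continue
            return result
        # acceptance test fails: retry with the next version
    raise RuntimeError("all algorithms failed the acceptance test")

print(recovery_block([3, 1, 3, 2], [algorithm_1, algorithm_2], acceptance_test))
# [1, 2, 3, 3]: algorithm_1 fails the acceptance test, algorithm_2 is used
```

Unlike N-version programming, only one version runs at a time; the acceptance test replaces the output comparator.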
Critical systems
Lecture 4
Dependability Critical systems specification Critical systems development Critical system validation
Validation of critical systems

The verification and validation of critical systems involves both high costs of failure and high V&V costs: V&V takes up more than 50% of the total system development costs.

Reliability validation

Reliability validation involves exercising the program to assess whether or not it has reached the required level of reliability.

[Figure: the reliability measurement process: identify operational profiles; prepare the test data set; apply tests to the system; compute the observed reliability.]

Difficulties: uncertainty in the operational profile; the high cost of test data generation; statistical uncertainty.
Operational profiles

An operational profile is a set of test data whose frequency matches the actual frequency of these inputs in 'normal' usage of the system. It can be generated from real data collected from an existing system or (more often) depends on assumptions made about the pattern of usage of a system.

[Figure: number of inputs per input class.]

If you are not sure of your operational profile, then you cannot be confident about the accuracy of your reliability measurements.
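Given an operational profile, test inputs can be drawn so that their frequencies match it. A sketch using Python's standard library; the input classes and weights are invented:

```python
# Sketch: drawing test inputs so their frequencies match an assumed
# operational profile. The input classes and weights are invented.

import random

profile = {                  # input class -> relative frequency in normal use
    "normal reading":  0.90,
    "boundary value":  0.07,
    "malformed input": 0.03,
}

def generate_tests(n, profile, seed=42):
    """Draw n test-input classes with probabilities given by the profile."""
    rng = random.Random(seed)            # fixed seed: reproducible test set
    classes = list(profile)
    weights = [profile[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)

tests = generate_tests(1000, profile)
print({c: tests.count(c) for c in profile})   # roughly 900 / 70 / 30
```

Reliability measured against such a test set only transfers to the field if the assumed weights really match normal usage, which is the warning above.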
Reliability prediction

A reliability growth model is a mathematical model of how the system reliability changes as the system is tested and faults are removed. It is used as a means of reliability prediction by extrapolating from current data.

[Figure: measured reliability (ROCOF) plotted against time, with a fitted reliability model curve extrapolated to the estimated time at which the required reliability will be achieved.]
Equal-step reliability growth

[Figure: reliability, measured as ROCOF (rate of failure occurrence), improving in equal steps at times t1 to t5.]

The equal-step growth model is simple, but it does not normally reflect reality. Reliability does not necessarily increase with every change, as a change can introduce new faults. The rate of reliability growth also tends to slow down with time, as frequently occurring faults are discovered and removed from the software.
Random-step reliability growth
A random-growth model where reliability changes fluctuate may be a more accurate reflection of real changes to reliability.
[Figure: random-step reliability growth — the reliability improvements at times t1 to t5 differ in size; a fault repair that adds a new fault decreases reliability (increases the ROCOF)]
Reliability prediction
Benefits of reliability prediction:
Planning of testing: it helps decide how much additional testing is needed.
Customer negotiations: it provides evidence of when the required reliability will be reached.
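As an illustration of this kind of extrapolation, here is a minimal sketch assuming a hypothetical exponential growth model, ROCOF(t) = lam0 * e^(-k*t), whose parameters have already been fitted to observed failure data. The parameter values are invented for the example.

```python
import math

# Sketch: extrapolating a fitted exponential reliability-growth model
# to estimate when the required reliability will be reached.
# lam0 and k are hypothetical fitted values, not real project data.

def time_to_reach(required_rocof, lam0, k):
    """Solve lam0 * exp(-k * t) = required_rocof for t."""
    return math.log(lam0 / required_rocof) / k

lam0 = 0.5       # initial failure rate (failures/hour) from early testing
k = 0.02         # decay rate fitted to the observed failure data
required = 0.01  # required ROCOF

t = time_to_reach(required, lam0, k)  # roughly 196 further hours of testing
```

In practice the fitted curve would be re-estimated as testing proceeds, since growth tends to slow down over time.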
Safety assurance
Safety assurance is concerned with establishing a level of confidence in the safety of a system; quantitative measurement of safety is impossible.
Confidence in the safety of a system can vary from very low to very high.
Confidence is developed through:
Past experience with the company developing the software;
The use of dependable processes and process activities geared to safety;
Extensive V & V, including both static and dynamic validation techniques.
Mandatory reviews for safety-critical systems, suggested by D. Parnas:
Review for correct intended function.
Review for maintainable, understandable structure.
Review to verify the algorithm and data structure design against the specification.
Review to check code consistency with the algorithm and data structure design.
Review the adequacy of the system test cases.
Proof of correctness

[Figure: safety argument for administerInsulin — after if statement 2, every execution path leaves currentDose either at 0, at maxDose, or between minimumDose and maxDose; the precondition for the unsafe state (currentDose > maxDose, an overdose administered) is therefore contradicted on all paths]

Safety arguments are intended to show that the system cannot reach an unsafe state.
They are generally based on proof by contradiction:
Assume that an unsafe state can be reached;
Show that this is contradicted by the program code, considering all execution paths.
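The argument can be related to code with a small sketch. The names administer_insulin, minimumDose and maxDose follow the insulin-pump example; the logic shown is only an illustration of the three assignment paths the argument covers, not the actual pump software.

```python
# Illustrative sketch of the guard logic the safety argument reasons
# about. Values for minimumDose and maxDose are hypothetical.

minimumDose = 1
maxDose = 5

def administer_insulin(computed_dose):
    current_dose = 0                      # path 1: default, no dose
    if minimumDose <= computed_dose <= maxDose:
        current_dose = computed_dose      # path 2: dose within safe range
    elif computed_dose > maxDose:
        current_dose = maxDose            # path 3: clamp to the maximum
    # On every path current_dose <= maxDose, contradicting the
    # precondition for the unsafe state (currentDose > maxDose).
    assert current_dose <= maxDose
    return current_dose
```

The assertion makes the safety claim explicit: whichever branch executes, the delivered dose can never exceed maxDose.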
Process assurance
Explicit attention should be paid to safety during all stages of the software process.
This means that specific safety assurance activities must be included in the process:
Hazard logging and monitoring;
The appointment of project safety engineers;
The extensive use of safety reviews;
Formal certification of safety-critical components;
The use of a very detailed configuration management system.
Assuring the quality of the system development process is important for all critical systems but it is particularly important for safety-critical systems
[Figure: the safety life cycle — concept and scope definition; hazard and risk analysis; derivation and allocation of safety requirements; planning and development (validation planning, safety-related systems development, external risk reduction facilities, operation and maintenance planning); installation and commissioning; safety validation; operation and maintenance; system decommissioning]
Security assessment
Security assessment has something in common with safety assessment.
It is intended to demonstrate that the system cannot enter some state (an unsafe or an insecure state) rather than to demonstrate that the system can do something.
However, there are differences:
Safety problems are accidental; security problems are deliberate;
Security problems are more generic: many systems suffer from the same problems;
Safety problems are mostly related to the application domain.
Security validation
Experience-based validation
The system is reviewed and analysed against the types of attack that are known to the validation team.
Tool-based validation
Various security tools, such as password checkers, are used to analyse the system in operation.
Tiger teams
A team is established whose goal is to breach the security of the system by simulating attacks on it.
Formal verification
The system is verified against a formal security specification.
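As an illustration of tool-based validation, here is a toy password checker of the kind mentioned above. The rules are illustrative only and do not reflect any particular standard or tool.

```python
import re

# Sketch of a trivial password checker, as used in tool-based security
# validation. The rules and the common-password list are illustrative.

def weak_password(pw):
    """Return a list of reasons the password is considered weak."""
    reasons = []
    if len(pw) < 8:
        reasons.append("shorter than 8 characters")
    if not re.search(r"\d", pw):
        reasons.append("no digit")
    if not re.search(r"[A-Z]", pw):
        reasons.append("no upper-case letter")
    if pw.lower() in {"password", "letmein", "qwerty"}:
        reasons.append("common password")
    return reasons

assert weak_password("S3cure-passphrase") == []
```

A real validation tool would run such checks against every account on the operational system and report the weak ones.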
Safety and dependability cases
Safety and dependability cases are structured documents that set out detailed arguments and evidence that a required level of safety or dependability has been achieved.
They are normally required by regulators before a system can be certified for operational use.
[Figure: safety case documents are produced during development and at delivery, and are assessed by an external certification body]

A documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment.
Components of a safety case:
System description: An overview of the system and a description of its critical components.
Safety requirements: The safety requirements abstracted from the system requirements specification.
Hazard and risk analysis: Documents describing the hazards and risks that have been identified and the measures taken to reduce risk.
Design analysis: A set of structured arguments that justify why the design is safe.
Verification and validation: A description of the V & V procedures used and, where appropriate, the test plans for the system; the results of system V & V.
Review reports: Records of all design and safety reviews.
Team competences: Evidence of the competence of all of the team involved in safety-related systems development and validation.
Process QA: Records of the quality assurance processes carried out during system development.
Change management processes: Records of all changes proposed, actions taken and, where appropriate, justification of the safety of these changes.
Associated safety cases: References to other safety cases that may impact on this safety case.
Argument structure
[Figure: argument structure — items of EVIDENCE support an ARGUMENT, which in turn supports a CLAIM such as "The maximum insulin dose will not exceed maxDose"]
Safety argument model

[Figure: the claim is supported by three items of evidence — a safety argument model ("The safety argument shows that …"), test data ("In 400 tests, …") and a static analysis report ("The analysis results show that …")]
A safety case or a safety claim

[Figure: claim hierarchy for the insulin pump — the top-level claim "The insulin pump will not deliver a single dose of insulin that is unsafe" is supported by three claims: "The maximum single dose computed by the pump software will not exceed maxDose" (itself supported by "In normal operation, the maximum dose computed will not exceed maxDose" and "If the software fails, the maximum dose computed will not exceed maxDose"), "maxDose is set up correctly when the pump is configured", and "maxDose is a safe dose for the user of the insulin pump"]

Claim hierarchy
Claims may be organised into a hierarchy.
[Figure: a generic claim hierarchy — the top-level claim is supported by an argument over sub-claims; each sub-claim is supported by further arguments; the leaf claims are supported directly by items of evidence]
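A claim hierarchy like the one above can be represented as a simple tree, where the leaf claims are those that must be backed directly by evidence. The Node structure here is a hypothetical illustration using the insulin-pump claims.

```python
from dataclasses import dataclass, field

# Sketch: a safety-case claim hierarchy as a tree. Each claim may be
# supported by sub-claims; leaf claims need direct evidence.
# Node is a hypothetical structure, not part of any safety-case standard.

@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)  # sub-claims

top = Node(
    "The insulin pump will not deliver a single dose of insulin that is unsafe",
    [
        Node("The maximum single dose computed by the pump software "
             "will not exceed maxDose",
             [
                 Node("In normal operation, the maximum dose computed "
                      "will not exceed maxDose"),
                 Node("If the software fails, the maximum dose computed "
                      "will not exceed maxDose"),
             ]),
        Node("maxDose is set up correctly when the pump is configured"),
        Node("maxDose is a safe dose for the user of the insulin pump"),
    ])

def leaves(node):
    """Collect the leaf claims, i.e. those needing direct evidence."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

For this hierarchy, four leaf claims must each be supported by evidence such as test results or static analysis reports.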