
Page 1: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Systems 3 Hardware/Software

T 79.232

Ilkka Herttua

Page 2: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Current situation / critical systems

• Based on data from recent failures of critical systems, the following can be concluded:

a) Failures are increasingly distributed and often nation-wide (e.g. commercial systems such as credit-card authorisation denial)

b) The source of failure is less often in hardware (physical faults) and more often in system design or end-user operation/interaction (software).

c) The harm caused by failures is mostly economic, but sometimes health and safety concerns are also involved.

d) Failures can impact many different aspects of dependability (dependability = ability to deliver service that can justifiably be trusted).

Page 3: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Examples of computer failures in critical systems

Page 4: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Driving force: federation

• Safety-related systems have traditionally been based on the idea of federation: a failure of any single piece of equipment should be confined and should not cause the collapse of the entire system.

• When computers were introduced into safety-critical systems, the principle of federation was in most cases kept in force.

• Applying federation means that the Boeing 757/767 flight management control system has 80 distinct microprocessors (300 if redundancy is counted). Although this number of microprocessors is no longer prohibitively expensive, the principle of federation causes other problems.

Page 5: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Hardware Faults

Intermittent faults
- Fault occurs and recurs over time (e.g. a loose connector)

Transient faults
- Fault occurs and may not recur (e.g. lightning, electromagnetic interference)

Permanent faults
- Fault persists, e.g. physical processor failure or a design fault (over-current)

Page 6: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Fault Tolerance

• Fault-tolerant hardware
- Achieved mainly by redundancy

• Redundancy
- Adds cost, weight, power consumption and complexity

• Other means
- Improved maintenance; a single system built from better materials (higher MTBF)

Page 7: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Redundancy types

Active Redundancy:

- Redundant units are always operating.

Dynamic Redundancy (standby):

- Failure has to be detected

- Changeover to the other module is required

Page 8: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Hardware redundancy techniques

Active techniques:

- Parallel (k-out-of-N)

- Voting (majority/simple) – see the voter sketch below

Standby techniques:

- Operating – hot standby

- Non-operating – cold standby
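
To make majority voting concrete, the following is a minimal sketch in C of a 2-out-of-3 (triple modular redundancy) voter. It is illustrative only and not from the slides; the type and function names are assumptions.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical readings from three redundant channels.  In a real system
 * these would come from independent hardware units. */
typedef struct {
    int32_t channel[3];
} tmr_input_t;

/* 2-out-of-3 majority voter: returns 0 and the value agreed on by at least
 * two channels, or -1 if all three channels disagree. */
static int vote_2oo3(const tmr_input_t *in, int32_t *out)
{
    if (in->channel[0] == in->channel[1] || in->channel[0] == in->channel[2]) {
        *out = in->channel[0];
        return 0;                      /* majority found */
    }
    if (in->channel[1] == in->channel[2]) {
        *out = in->channel[1];
        return 0;
    }
    return -1;                         /* no majority: voter fault */
}

int main(void)
{
    tmr_input_t in = { { 42, 42, 41 } };   /* one channel deviates */
    int32_t value;

    if (vote_2oo3(&in, &value) == 0)
        printf("voted value: %d\n", (int)value);
    else
        printf("voter fault: no majority\n");
    return 0;
}

In practice a disagreeing channel would also be reported for maintenance, since masking the fault without logging it hides a latent failure.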

Page 9: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Reliability prediction

• Electronic components
- Based on probability and statistics
- MIL-HDBK-217: empirical data on actual device behaviour
- Manufacturer information and the circuit types used
- Bathtub curve: burn-in – useful life – wear-out

Page 10: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Reliability calculations for a system

• MTTF – mean time to failure: the average time for which the system operates before its first failure

• MTTR – mean time to repair: the time to get the system back in service

• MTBF – mean time between failures:

MTBF = MTTF + MTTR
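
As a small worked example (not from the slides; the figures are assumed), the sketch below computes MTBF from MTTF and MTTR, together with the standard steady-state availability MTTF / (MTTF + MTTR):

#include <stdio.h>

int main(void)
{
    /* Assumed figures, for illustration only. */
    const double mttf_hours = 10000.0;   /* mean time to failure */
    const double mttr_hours = 8.0;       /* mean time to repair  */

    const double mtbf_hours   = mttf_hours + mttr_hours;   /* MTBF = MTTF + MTTR */
    const double availability = mttf_hours / mtbf_hours;   /* steady-state availability */

    printf("MTBF = %.1f h, availability = %.5f\n", mtbf_hours, availability);
    return 0;
}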

Page 11: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Hardware

Fault detection:
- Routines to check that the hardware works
- Signal comparisons
- Information redundancy – parity checks etc.
- Watchdog timers (see the sketch below)
- Bus monitoring – check that the processor is alive
- Power monitoring
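
The watchdog-timer idea can be sketched as follows. This is a host-runnable simulation in C, not real device code; the counter, timeout and function names are assumptions. In hardware, the expiring counter would force a processor reset or a transition to a safe state.

#include <stdbool.h>
#include <stdio.h>

#define WDT_TIMEOUT_TICKS 5            /* illustrative timeout */

static int wdt_counter = WDT_TIMEOUT_TICKS;

static void wdt_kick(void) { wdt_counter = WDT_TIMEOUT_TICKS; }

/* Called on every timer tick; returns true when the watchdog has expired. */
static bool wdt_tick(void) { return --wdt_counter <= 0; }

/* One application control cycle; returns false to simulate a hang. */
static bool do_control_cycle(int cycle) { return cycle < 8; }

int main(void)
{
    for (int tick = 0; tick < 20; ++tick) {
        if (do_control_cycle(tick))
            wdt_kick();                /* healthy cycle: reload the counter */

        if (wdt_tick()) {              /* software stopped kicking the timer */
            printf("tick %d: watchdog expired - forcing reset to safe state\n", tick);
            return 1;
        }
    }
    return 0;
}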

Page 12: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Hardware

Possible hardware: COTS microprocessors
- No safety firmware; least assurance
- Redundancy helps, but common-mode failures remain possible
- Fabrication failures, microcode and documentation errors
- Use components that have a history and statistics.

Page 13: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Hardware

Specialist microprocessors
- Collins Avionics/Rockwell AAMP2
- Used in the Boeing 747-400 (30+ units)
- High cost – bench testing, documentation, formal verification
- Other models: Sparc V7, TSC695E, ERC32 (ESA radiation-tolerant), 68HC908GP32 (airbag)

Page 14: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Hardware

Programmable Logic Controllers (PLCs)
• Contain a power supply, I/O interfaces and one or more processors
• Designed for high MTBF
• Firmware-based
• Program stored in EEPROM
• Programmed with ladder or function block diagrams (see the scan-cycle sketch below)
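
A PLC executes its program as a cyclic scan: read inputs, evaluate the logic, write outputs. The slides show no code, so the following is a minimal C sketch of one ladder-style rung (a start/stop seal-in latch) driven by a simulated scan loop; all names and the input sequence are illustrative.

#include <stdbool.h>
#include <stdio.h>

typedef struct { bool start_button, stop_button; } inputs_t;
typedef struct { bool motor_on; } outputs_t;

/* Equivalent of a simple ladder rung with a seal-in contact:
 * motor := (start OR motor) AND NOT stop */
static void evaluate_logic(const inputs_t *in, outputs_t *out)
{
    out->motor_on = (in->start_button || out->motor_on) && !in->stop_button;
}

int main(void)
{
    /* Simulated input sequence: press start, release, then press stop. */
    const inputs_t scan_inputs[] = {
        { true,  false }, { false, false }, { false, false }, { false, true },
    };
    outputs_t out = { false };

    for (unsigned i = 0; i < sizeof scan_inputs / sizeof scan_inputs[0]; ++i) {
        evaluate_logic(&scan_inputs[i], &out);           /* one scan cycle */
        printf("scan %u: motor %s\n", i, out.motor_on ? "ON" : "OFF");
    }
    return 0;
}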

Page 15: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Correct programs:
- Normally iteration is needed to develop a working solution (writing code, testing and modification).
- In a non-critical environment, code is accepted when the tests are passed.
- Testing is not enough for a safety-critical application – an assessment process is needed: dynamic/static testing, simulation, code analysis and formal verification.

Page 16: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Dependable Software :

- Process for development

- Work discipline

- Well documented

- Quality management

- Validated/verified

Page 17: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Safety-critical programming languages:
- Logical soundness: an unambiguous definition of the language – no dialects of C++
- Simple definition: complexity can lead to errors in compilers or other support tools
- Expressive power: the language shall support expressing domain features efficiently and easily
- Security of definition: violations of the language definition shall be detected
- Verification: the language supports verification, i.e. proving that the produced code is consistent with the specification
- Memory/time constraints: stack, register and memory usage are controlled

Page 18: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Software faults:
- Requirements defects: failure of the software requirements to specify the environment in which the software will be used, or ambiguous requirements
- Design defects: not satisfying the requirements, or documentation defects
- Code defects: failure of the code to conform to the software design.

Page 19: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Software faults:
- Subprogram effects: the definition of a called variable may be changed
- Aliasing: different names refer to the same storage location
- Initialisation failures: variables are used before values are assigned
- Memory management: buffer, stack and memory overflows
- Expression evaluation errors: divide-by-zero, arithmetic overflow (see the sketch below)
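
To make the last two fault classes concrete, here is a small defensive-coding sketch in C (illustrative, not from the slides) that guards against divide-by-zero and the one signed-division overflow case, and initialises its variable before use:

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Defensive division: rejects divide-by-zero and the signed-overflow case
 * INT_MIN / -1 instead of relying on undefined behaviour. */
static bool safe_div(int numerator, int denominator, int *result)
{
    if (denominator == 0)
        return false;                              /* divide-by-zero */
    if (numerator == INT_MIN && denominator == -1)
        return false;                              /* arithmetic overflow */
    *result = numerator / denominator;
    return true;
}

int main(void)
{
    int quotient = 0;   /* initialised before use, never read when stale */

    if (safe_div(10, 0, &quotient))
        printf("10 / 0 = %d\n", quotient);
    else
        printf("10 / 0 rejected by defensive check\n");

    if (safe_div(INT_MIN, -1, &quotient))
        printf("INT_MIN / -1 = %d\n", quotient);
    else
        printf("INT_MIN / -1 rejected by defensive check\n");

    return 0;
}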

Page 20: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Language comparison:
- Structured assembler (wild jumps, exhaustion of memory; well understood)
- Ada (wild jumps, data typing, exception handling, separate compilation)
- Subset languages: CORAL, SPADE and Ada (Alsys CSMART Ada kernel)
- Validated compilers exist for Pascal and Ada
- Available expertise: common languages give higher productivity and fewer mistakes, but C is still not appropriate.

Page 21: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua
Page 22: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Languages used:
- Boeing uses mostly Ada, but about 75 languages are still used on the 747-400.
- ESA mandated Ada for mission-critical systems.
- The NASA space station is written in Ada, with some systems in C and assembler.
- Car ABS systems in assembler.
- Train control systems in Ada.
- Medical systems in Ada and assembler.
- Nuclear reactor core and shutdown systems in assembler, migrating to Ada.

Page 23: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Tools:
- Highly reliable and validated tools are required: faults in a tool can result in faults in the safety-critical software.
- Widespread tools are better tested.
- Use a confirmed process for the usage of the tool.
- Analyse the output of the tool: static analysis of the object code.
- Use alternative products and compare results.
- Use different tools (diversity) to reduce the likelihood of wrong test results.

Page 24: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Designing Principles
- Use hardware interlocks before computer/software interlocks
- New software features add complexity: try to keep the software simple
- Plan for avoiding human error – an unambiguous human-computer interface
- Remove hazardous modules (Ariane 5: unused code)

Page 25: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Designing Principles
- Add barriers: hardware/software locks for critical parts
- Minimise single-point failures: increase safety margins, exploit redundancy and allow recovery
- Isolate failures: don't let things get worse
- Fail-safe: panic shut-downs, watchdog code
- Avoid common-mode failures: use diversity – different programmers, N-version programming

Page 26: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Designing Principles:

- Fault tolerance: recovery blocks – if one module fails, execute an alternative module (see the sketch below)

- Don't rely on run-time systems
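
A recovery block runs a primary module, checks its result against an acceptance test, and falls back to an alternative module if the test fails. The following C sketch is illustrative only (a fault is injected into the primary so the fallback path is exercised, and checkpoint/state restoration between attempts is omitted):

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Acceptance test: the result must be non-negative and satisfy r*r ≈ x. */
static bool acceptance_test(double x, double r)
{
    return r >= 0.0 && fabs(r * r - x) <= 1e-6 * (x + 1.0);
}

/* Primary module, with a fault injected so the fallback path is shown. */
static double sqrt_primary(double x)
{
    return x * 0.5;                      /* injected fault: wrong algorithm */
}

/* Alternative module: simple bisection on [0, max(1, x)]. */
static double sqrt_alternative(double x)
{
    double lo = 0.0, hi = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 100; ++i) {
        double mid = 0.5 * (lo + hi);
        if (mid * mid < x) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Recovery block: try the primary, check the acceptance test, and fall
 * back to the alternative module if the test fails. */
static bool sqrt_recovery_block(double x, double *result)
{
    double r = sqrt_primary(x);
    if (acceptance_test(x, r)) { *result = r; return true; }

    r = sqrt_alternative(x);
    if (acceptance_test(x, r)) { *result = r; return true; }

    return false;                        /* both versions failed */
}

int main(void)
{
    double r;
    if (sqrt_recovery_block(2.0, &r))
        printf("sqrt(2) = %.6f\n", r);   /* produced by the alternative */
    else
        printf("recovery block failed\n");
    return 0;
}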

Page 27: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Techniques/Tools:

- Fault prevention: preventing the introduction or occurrence of faults by using design-supporting tools (e.g. UML with a CASE tool)

-Fault removal: Testing, debugging and code modification

Page 28: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Software faults:
- Faults in software tools (development/modelling) can result in system faults.
- The techniques used for software development (language/design notation) have a great impact on the performance of the people involved and also determine the likelihood of faults.
- The characteristics of the programming system and its runtime determine how great the impact of possible faults on the overall software subsystem can be.

Page 29: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Architectural design:

Layered structure (sketched below):

1 – High-level command and control functions

2 – Intermediate-level routines

3 – I/O routines and device drivers
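
The layered structure can be illustrated with a minimal C sketch in which each layer only calls the layer directly below it; the module and function names are assumptions, not from the slides.

#include <stdio.h>

/* Layer 3: I/O routines and device drivers (simulated here with stdio). */
static double driver_read_sensor(void)       { return 21.5; }
static void   driver_set_actuator(double v)  { printf("actuator := %.1f\n", v); }

/* Layer 2: intermediate-level routines built on layer 3. */
static double read_temperature(void)         { return driver_read_sensor(); }
static void   set_heater_power(double power) { driver_set_actuator(power); }

/* Layer 1: high-level command and control functions built on layer 2. */
static void control_temperature(double setpoint)
{
    double error = setpoint - read_temperature();
    set_heater_power(error > 0.0 ? error * 10.0 : 0.0);  /* crude P-control */
}

int main(void)
{
    control_temperature(22.0);   /* layer 1 entry point */
    return 0;
}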

Page 30: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Architectural design:

- Design is done after partitioning the required functions between hardware and software.

- Complete specification of the architecture with components, data structures and interfaces (messages/protocols)

Page 31: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Architectural design:
- A test plan for each module (testability)
- Human-computer interface
- A change-control system is needed for inconsistencies and inadequacies within the specification
- Verification of the architectural design against the specification
- Software partitioning: a modular structure aids comprehension and isolation (fault limiting)

Page 32: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Reduction of hazardous conditions – summary
- Simplify: the code contains only the minimum features and no unnecessary or undocumented features or unused executable code
- Diversity: data and control redundancy
- Multi-version programming: a shared specification can lead to common-mode failures, and the synchronisation code increases complexity

Page 33: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Safety-Critical Software

Home assignments 3:

- 6.42 (fault-tolerant system)
- 7.15 (reliability model)
- 9.17 (reuse of software)

Please email to [email protected] by 24 February 2004.

Page 34: Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

Home assignments 1 & 2
• 1.12 (primary, functional and indirect safety)
• 2.4 (unavailability)
• 3.23 (fault tree)
• 4.18 (tolerable risk)
• 5.10 (incompleteness within specification)

Email to [email protected] before 24 February.

11 and 18 February: Case Studies / Teemu Tynjälä