CSC313 High Integrity Systems/CSCM13 Critical Systems
cs.swan.ac.uk/~csetzer/lectures/critsys/16/critsysfinal0.pdf

Page 1:

CSC313 High Integrity Systems/CSCM13 Critical Systems

CSC313/CSCM13 Sect. 0 1/ 99

Page 2:

CSC313 High Integrity Systems/CSCM13 Critical Systems

Course Notes
Chapter 0: Introduction

Anton Setzer
Dept. of Computer Science, Swansea University

http://www.cs.swan.ac.uk/∼csetzer/lectures/critsys/current/index.html

December 8, 2016

Page 3:

0 (a) Motivation and Plan

0 (b) Administrative Issues

0 (c) A case study of a safety-critical system failing

0 (d) Lessons to be learned

0 (e) Two Aspects of Critical Systems

0 (f) Race Conditions

0 (g) Literature

Page 4:

0 (a) Motivation and Plan

0 (a) Motivation and Plan

0 (b) Administrative Issues

0 (c) A case study of a safety-critical system failing

0 (d) Lessons to be learned

0 (e) Two Aspects of Critical Systems

0 (f) Race Conditions

0 (g) Literature

Page 5:

0 (a) Motivation and Plan

Definition

Definition: A critical system is a
- computer, electronic or electromechanical system
- the failure of which may have serious consequences, such as
  - substantial financial losses,
  - substantial environmental damage,
  - injuries or death of human beings.

Notation: When defining something, what is defined is denoted by green colour and curly underlining.

Page 6:

0 (a) Motivation and Plan

Three Kinds of Critical Systems.

- Safety-critical systems.
  - Failure may cause injury or death to human beings or substantial environmental harm.
  - Main topic of this module.
- Mission-critical systems.
  - Failure may result in the failure of some goal-directed activity.
- Business-critical systems.
  - Failure may result in the failure of the business using that system.

Page 7:

0 (a) Motivation and Plan

Examples of Critical Systems

- Safety-Critical
  - Medical Devices.
  - Aerospace
    - Civil aviation.
    - Military aviation.
    - Manned space travel.
  - Chemical Industry.
  - Nuclear Power Stations.
  - Traffic control.
    - Railway control systems.
    - Air traffic control.
    - Road traffic control (esp. traffic lights).
    - Automotive control systems.
  - Other military equipment.

Page 8:

0 (a) Motivation and Plan

Examples of Critical Systems (Cont.)

- Mission-critical
  - Navigational system of a space probe.

Page 9:

0 (a) Motivation and Plan

Examples of Critical Systems (Cont.)

- Business-critical
  - Customer account system in a bank.
  - Online shopping cart.
  - Areas where secrecy is required.
    - Defence.
    - Secret service.
    - Sensitive areas in companies.
  - Areas where personal data are administered.
    - Police records.
    - Administration of customer data.
    - Administration of student marks.

Page 10:

0 (a) Motivation and Plan

Failure of a Critical System

Page 11:

0 (a) Motivation and Plan

Primary vs. Secondary

There are two kinds of safety-critical software.

- Primary safety-critical software.
  - Software embedded as a controller in a system.
  - Malfunction causes hardware malfunction, which results directly in human injury or environmental damage.

Page 12:

0 (a) Motivation and Plan

Primary vs. Secondary

- Secondary safety-critical software.
  - Software that indirectly results in injury.
  - E.g. software tools used for developing safety-critical systems.
    - Malfunction might cause bugs in critical systems created using those tools.
  - Medical databases.
    - A doctor might make a mistake because of
      - wrong data from such a database,
      - data temporarily not available from such a database in case of an emergency.

Page 13:

0 (a) Motivation and Plan

Plan

- Learning outcome:
  - Familiarity with issues surrounding safety-critical systems, including
    - legal issues,
    - ethical issues,
    - hazard analysis,
    - techniques for specifying and producing high-integrity software.
  - Understanding of techniques for specifying and verifying high-integrity software.
  - Familiarity with and experience in applying programming languages suitable for developing high-integrity software for critical systems (e.g. SPARK Ada).

Page 14:

0 (a) Motivation and Plan

Plan

0. Introduction, overview.
1. Programming languages for writing safety-critical software.
2. SPARK Ada.
3. Safety criteria.
4. Hazard and risk analysis.
5. Fault tolerance.
6. The development cycle of safety-critical systems.
7. Verification, validation, testing.

Page 15:

0 (b) Administrative Issues

0 (a) Motivation and Plan

0 (b) Administrative Issues

0 (c) A case study of a safety-critical system failing

0 (d) Lessons to be learned

0 (e) Two Aspects of Critical Systems

0 (f) Race Conditions

0 (g) Literature

Page 16:

0 (b) Administrative Issues

Address

Dr. A. Setzer
Dept. of Computer Science
Swansea University
Singleton Park
SA2 8PP
UK

Room: Room 952, Talbot Building
Tel.: (01792) 513368
Fax: (01792) 295651
Email: [email protected]

Page 17:

0 (b) Administrative Issues

Two Versions of this Module

- There are two versions of this module:
  - Level 3 version: CSC313 High Integrity Systems.
    - 70% exam.
    - 30% coursework (lab).
  - Level M version (MSc, MRes and 4th year): CSCM13 Critical Systems.
    - 60% exam.
    - 40% coursework (lab).
- I usually refer to this module as "Critical Systems".

Page 18:

0 (b) Administrative Issues

Home Page of the Module

- Slides and any material distributed in this lecture will be made available on Blackboard.
- There is also a homepage with some links, where a publicly accessible version of the slides will be made available.
- The homepage is at http://www.cs.swan.ac.uk/∼csetzer/lectures/critsys/current/index.html
- Errors in the notes will be corrected on the slides continuously and noted in the list of errata.
- The homepage also contains additional material for each section of the module (not available on Blackboard).
- The additional material is not required for the exam. Most of it was taught previously but has been omitted in order to make this lecture more lightweight.

Page 19:

0 (b) Administrative Issues

Lab Sessions

- There will be lab sessions using SPARK Ada, taking place in the Linux Lab.
  - SPARK Ada can be installed on Linux, Windows, and Macintosh.
  - The why3 system, which is sometimes useful, can unfortunately only be installed under Linux and Macintosh.
  - We have a setup which minimises the amount of Linux knowledge needed.
- Please obtain a Linux password from the system administrators (email should be sufficient).

Page 20:

0 (c) A case study of a safety-critical system failing

0 (a) Motivation and Plan

0 (b) Administrative Issues

0 (c) A case study of a safety-critical system failing

0 (d) Lessons to be learned

0 (e) Two Aspects of Critical Systems

0 (f) Race Conditions

0 (g) Literature

Page 21:

0 (c) A case study of a safety-critical system failing

Åsta Train Accident (January 5, 2000)

Report from November 6, 2000
https://www.regjeringen.no/no/dokumenter/nou-2000-30/id143393/

Page 22:

0 (c) A case study of a safety-critical system failing

[Figure]

Page 23:

0 (c) A case study of a safety-critical system failing

[Figure]

Page 24:

0 (c) A case study of a safety-critical system failing

[Figure: track layout diagram]

Page 25:

0 (c) A case study of a safety-critical system failing

Sequence of Events

[Diagram: single-track line between Rena and Rudstad; express train with 75 persons on board at Rena, local train with 10 persons on board at Rudstad; a road crossing; exit signals green / red?]

Page 26:

0 (c) A case study of a safety-critical system failing

Sequence of Events

Rena: Train 2302, 75 passengers
Rudstad: Train 2369, 10 passengers

- Railway with one track only. Therefore trains can cross only at stations.
- According to the timetable, the trains cross at Rudstad.

Page 27:

0 (c) A case study of a safety-critical system failing

Sequence of Events

Rena: Train 2302, 75 passengers
Rudstad: Train 2369, 10 passengers

- Train 2302 is 21 minutes behind schedule. When reaching Rena, the delay is reduced to 8 minutes. It leaves Rena after a stop, with a green exit signal, at 13:06:15, in order to cross 2369 at Rudstad.

Page 28:

0 (c) A case study of a safety-critical system failing

Sequence of Events

Rena: Train 2302, 75 passengers
Rudstad: Train 2369, 10 passengers

- Train 2369 leaves Rudstad after a brief stop at 13:06:17, 2 seconds after train 2302 left and 3 minutes ahead of the timetable, probably in order to cross 2302 at Rena.

Page 29:

0 (c) A case study of a safety-critical system failing

Sequence of Events

Rena: Train 2302, 75 passengers
Rudstad: Train 2369, 10 passengers

- The local train shouldn't have had a green signal.
- 13:07:22: An alarm is signalled to the rail traffic controller (no audible signal).
- The rail traffic controller sees the alarm at approx. 13:12.

Page 30:

0 (c) A case study of a safety-critical system failing

Sequence of Events

Rena: Train 2302, 75 passengers
Rudstad: Train 2369, 10 passengers

- The traffic controller couldn't warn the trains, because communication relied on mobile telephones and the correct telephone number hadn't been passed on to him.
- The trains collide at 13:12:35; 19 persons are killed.

Page 31:

0 (c) A case study of a safety-critical system failing

[Figure]

Page 32:

0 (c) A case study of a safety-critical system failing

Investigations

- No technical faults of the signals were found.
- The train driver was not blinded by the sun.
- Four incidents of wrong signals with similar signalling systems were reported:
  - Exit signal green, then suddenly turns red. The traffic controller says he didn't give exit permission.
  - Hanging green signal.
  - Distant signal green, main signal red; the train drives over the main signal and pulls back. The traffic controller is surprised about the green signal.

Page 33:

0 (c) A case study of a safety-critical system failing

Investigations

- 18 April 2000 (after the accident): A train has a green exit signal. When the driver looks again, he notices that the signal has turned red. The traffic controller hasn't given exit permission.
- Several safety-critical deficiencies in the software were found (some known before!).
- The software used was entirely replaced.

Page 34:

0 (c) A case study of a safety-critical system failing

SINTEF

- SINTEF (Foundation for Scientific and Industrial Research at the Norwegian Institute of Technology) found no mistake leading directly to the accident. Conclusion of SINTEF: no indication of abnormal signal status. ⇒ Mistake of the train driver (who died in the accident). (Human error.)
- Assessment of the report by Railcert:
  - Criticism: SINTEF was only looking for single-cause faults, not for multiple causes.

Page 35:

0 (c) A case study of a safety-critical system failing

Conclusion

- It is possible that the driver of the local train was driving against a red signal.
- The fact that he stopped and left almost at the same time as the other train, and 3 minutes ahead of time, makes it likely that he received an erroneous green exit signal due to some software error. It could be that the software, under certain circumstances when giving an entrance signal into a block, for a short moment gives the entrance signal for the other side of the block.
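A momentary wrong-side green of this kind is the signature of a race condition (the topic of Sect. 0 (f)). The following Python sketch is purely illustrative and is not the actual interlocking software; the station names are taken from the case study. It replays the dangerous interleaving in which both signal controllers check the shared block state before either one updates it:

```python
def grant_entries(atomic: bool) -> list[str]:
    """Simulate two signal controllers requesting the same single-track block.

    atomic=False replays the race: both controllers check 'free' before
    either marks the block occupied, so both ends briefly show green.
    atomic=True makes each check-and-update a single step, as a lock or a
    correct interlocking protocol would.
    """
    free = True          # shared state: is the block unoccupied?
    grants: list[str] = []

    # Phase 1: both controllers sample the shared state.
    rena_saw = free
    rudstad_saw = free

    # Phase 2: each controller acts, either on its stale sample (racy)
    # or on the current state (atomic check-then-act).
    if (free if atomic else rena_saw):
        free = False
        grants.append("Rena")
    if (free if atomic else rudstad_saw):
        free = False
        grants.append("Rudstad")
    return grants

print(grant_entries(atomic=False))  # both ends get green
print(grant_entries(atomic=True))   # only one grant
```

The unsafe variant grants entry to both Rena and Rudstad; the atomic variant grants it to exactly one. In a real distributed interlocking the two controllers run on separate machines, so the "atomic" step must be provided by a protocol rather than a local lock, which is one reason such protocols are hard to get right.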

Page 36:

0 (c) A case study of a safety-critical system failing

Conclusion (Cont.)

- Even if this particular accident was not due to a software error, this software apparently has several safety-critical errors.
- In the protocol of an extremely simple installation (Brunna, 10 km from Uppsala), which was established in 1957 and exists in this form in 40 installations in Sweden, a safety-critical error was found when it was formally verified with a theorem prover in 1997.
- Lots of other errors in the Swedish railway system were found during formal verification.
- The lecturer is, together with PhD/MRes students, involved in an industrial project on verification of railway systems.

Page 37:

0 (d) Lessons to be learned

0 (a) Motivation and Plan

0 (b) Administrative Issues

0 (c) A case study of a safety-critical system failing

0 (d) Lessons to be learned

0 (e) Two Aspects of Critical Systems

0 (f) Race Conditions

0 (g) Literature

Page 38:

0 (d) Lessons to be learned

Lessons to be Learned

A sequence of events has to happen in order for an accident to take place:

Preliminary events → Trigger event → Intermediate events (ameliorating, propagating) → Accident

Page 39:

0 (d) Lessons to be learned

Events Leading to an Accident

- Preliminary events
  = events which influence the initiating event. Without them the accident cannot advance to the next step (the initiating event). In the main example:
  - The express train is late. Therefore the crossing of the trains is first moved from Rudstad to Rena.
  - The delay of the express train is reduced. Therefore the crossing of the trains is moved back to Rudstad.

Page 40:

0 (d) Lessons to be learned

Lessons to be Learned (Cont.)

- Initiating event, trigger event.
  The mechanism that causes the accident to occur. In the main example:
  - Both trains leave their stations on a collision course, maybe caused by both trains having green signals.

Page 41:

0 (d) Lessons to be learned

Events Leading to an Accident

- Intermediate events.
  Events that may propagate or ameliorate the accident.
  - Ameliorating events can prevent the accident or reduce its impact.
  - Propagating events have the opposite effect.
- When designing safety-critical systems, one should
  - avoid triggering events, if possible by using several independent safeguards,
  - add additional safeguards, which prevent a triggering event from causing an accident or reduce its impact.

Page 42:

0 (d) Lessons to be learned

Analysis of Accidents

- Investigations of accidents usually concentrate on legal aspects:
  - Who is guilty of the accident?
- If one is interested in preventing accidents from happening, one has to carry out a more thorough investigation.
- Often an accident happens under circumstances in which an accident was bound to happen eventually.
- It doesn't suffice to try to prevent the trigger event from happening again.
- A problem with the safety culture might lead to another accident, but that will probably be triggered by something different.

Page 43:

0 (d) Lessons to be learned

Causal Factors

We consider a three-level model (Leveson [Le95], pp. 48-51) in order to identify the real reasons behind accidents.

- Level 1: Mechanisms, chain of events.
  - Described above.
- Level 2: Conditions which allowed the events on Level 1 to occur.
- Level 3: Conditions and constraints that allowed the conditions on the second level to cause the events at the first level, e.g.
  - Technical and physical conditions.
  - Social dynamics and human actions.
  - Management system, organisational culture.
  - Governmental or socioeconomic policies and conditions.

Page 44:

0 (d) Lessons to be learned

Root Causes

- Problems found in a Level 3 analysis form the root causes.
- Root causes are weaknesses in general classes of accidents, which contributed to the current accident but might also affect future accidents.
- If the problem behind a root cause is not fixed, almost inevitably an accident will happen again.
- There are many examples in which, despite a thorough investigation, the root cause was not fixed, and the accident happened again.

Page 45:

0 (d) Lessons to be learned

Root Causes

- Example: the DC-10 cargo-door saga.
  - The DC-10 was built by McDonnell-Douglas.
  - Faulty closing of the cargo door could cause the cabin floor to collapse.
  - In 1972 the cargo-door latch system of a DC-10 failed in flight, the cabin floor collapsed, and only the extraordinary skill of the pilot saved the plane.
  - As a consequence a fix to the cargo doors was applied.
  - The root cause, namely the collapsing of the cabin floor when the cargo door opens, wasn't fixed.
  - In 1974 a DC-10 crashed after a cargo-door failure, killing 346 people.

Page 46:

0 (d) Lessons to be learned

Level 2 Conditions

- Level 1 mechanism: The driver left the station although the signal should have been red.
  Caused by Level 2 conditions:
  - The local train might have had, for a short period, a green light, caused by a software error.

Page 47:

0 (d) Lessons to be learned

Level 2 Conditions

- Level 1 mechanism: The local train left early.
  Caused by Level 2 conditions:
  - Might have been caused by the train driver relying on his watch (which might be wrong) rather than on an official clock.

Page 48:

0 (d) Lessons to be learned

Level 2 Conditions

- Level 1 mechanism: The local train drove over a light that was possibly red at that moment.
  Caused by Level 2 condition:
  - There was no ATP (automatic train protection) installed, which stops trains from driving over a red light.

Page 49:

0 (d) Lessons to be learned

Level 2 Conditions (Cont.)

- Level 1 mechanism: The traffic controller didn't see the control light.
  Caused by Level 2 conditions:
  - The control panel was badly designed.
  - A visual warning signal alone is not enough when the system detects a possible collision of trains.

Page 50:

0 (d) Lessons to be learned

Level 2 Conditions (Cont.)

- Level 1 mechanism: The rail controller couldn't warn the driver, since he didn't know the mobile telephone number.
  Caused by Level 2 conditions:
  - Reliance on a mobile telephone network in a safety-critical system. This is extremely careless:
    - Mobile phones often fail.
    - The connections might be overloaded.
    - Connections to mobiles might not work in certain areas of the railway network.
  - The procedure for passing on the mobile phone numbers was badly managed.

Page 51:

0 (d) Lessons to be learned

Level 2 Conditions (Cont.)

- Level 1 mechanism: A severe fire broke out as a result of the accident.
  Caused by Level 2 condition:
  - The fire safety of the train engines was not very good.

Page 52:

0 (d) Lessons to be learned

Level 3 Constraints and Conditions

- Cost-cutting precedes many accidents.
  - It is difficult to maintain such a small railway line.
  - The resulting cheap solutions might be dangerous.
  - The railway controller had too much to do and was overworked.
- Flaws in the software.
  - The software wasn't written according to the highest standards.
  - Control of railway signals is a safety-critical system and should be designed with a high level of integrity.
  - It is very difficult to write correct protocols for distributed algorithms.
  - Need for verified design of such software.

Page 53:

0 (d) Lessons to be learned

Level 3 Constraints and Conditions

Poor human-computer interface at the control panel.

- Typical for lots of control rooms.
- A known problem for many nuclear power stations.
- In the accident at the Three Mile Island nuclear power plant near Harrisburg (a loss-of-coolant accident which cost between 1 billion and 1.86 billion US-$):
  - there were lots of problems in the design of the control panel that led to human errors in the interpretation of the data during the accident;
  - one of the key indicators relevant to the accident was on the back of the control panel.

Page 54:

0 (d) Lessons to be learned

Level 3 Constraints and Conditions

- The criticality of the design of human-computer interfaces in control rooms is not yet sufficiently acknowledged.
- Example: problems with the UK air traffic control system, whose displays are sometimes difficult to read.
- In case of an accident, the operator will be blamed.

- Overconfidence in ICT.
  - Otherwise one wouldn't have used such badly designed software.
  - Otherwise one wouldn't have simply relied on the mobile phones; at least a special agreement with the mobile phone companies should have been set up.

- Flaws in management practices.
  - No protocol for dealing with mobile phone numbers.
  - No mechanism for dealing with incidents.
  - Incidents had happened before, but were not investigated.
    - A mechanism should have been established to investigate them thoroughly.
    - Most accidents are preceded by incidents which are not taken seriously enough.

Lessons to be Learned

- Usually an investigation ends with the conclusion "human error".

- Uncritical attitude towards software: the architecture of the software was investigated, but no detailed search for a bug was done.

- Afterwards, concentration on trigger events, but not much attention to preliminary and intermediate events; the root cause is often not fixed.

- Most failures of safety-critical systems were caused by multiple failures.

- Incidents precede accidents.
  - If one doesn't learn from incidents, eventually an accident will happen.

0 (e) Two Aspects of Critical Systems

(1) Software Engineering Aspect

- Safety-critical systems are very complex; this is the system aspect. They involve:
  - Software, which includes parallelism.
  - Hardware, which
    - might fail (the light bulb of a signal might burn through, relays age);
    - has to operate under adverse conditions (low temperatures, rain, snow).
  - Interaction with the environment.
  - Human-computer interaction.
  - Protocols the operators have to follow.

  - Training of operators.
  - Cultural habits.

- For dealing with this, we need to look at aspects like:
  - Methods for identifying hazards and measuring risk (HAZOP, FMEA etc.).
  - Standards.
  - Documentation (requirements, specification etc.).
  - Validation and verification.
  - Ethical and legal aspects.

- These are based on techniques used in other engineering disciplines (esp. the chemical industry, the nuclear power industry, aviation).

(2) Tools for Writing Correct Software

- Software bugs cannot be avoided by careful design alone.
  - This holds especially for distributed algorithms.
- There is a need for verification techniques using formal methods.
- Different levels of rigour:
  (1) Application of formal methods by hand, without machine assistance.
  (2) Use of formalised specification languages with some mechanised support tools.
  (3) Use of fully formal specification languages with machine-assisted or fully automated theorem proving.

- However, such methods don't replace software engineering techniques.

- Formal methods idealise a system and ignore aspects like hardware failures.

- With formal methods one can show that some software fulfils its formal specification. But checking that the specification is sufficient in order to guarantee safety cannot be done using formal methods.

0 (f) Race Conditions

Race Conditions

- When investigating the Åsta train accident, it seems that race conditions could have been the technical reason for that accident.
- Race conditions occur in programs which have several threads.
- A well-known and well-studied example of problems with race conditions is the Therac-25.
  - The Therac-25 was a computer-controlled radiation therapy machine, which in 1985-1987 massively overdosed six people.
  - There were many more incidents.

Therac 25

- Two of the threads in the Therac-25 were:
  - the keyboard controller, which processes data entered by the operator on a terminal;
  - the treatment monitor, which controls the treatment.
- The data as received by the keyboard controller were passed on to the treatment monitor using shared variables.

Threads:
- Keyboard controller.
- Treatment monitor.

- It was possible that
  - the operator enters data,
  - the keyboard controller signals to the treatment monitor that data entry is complete,
  - the treatment monitor checks the data and starts preparing a treatment,
  - the operator changes the data,
  - the treatment monitor never realises that the data have changed, and therefore never checks them,
  - but the treatment monitor uses the changed data, even if they violate the checking conditions.
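The sequence above can be replayed as a deterministic step-by-step sketch. This is a hypothetical reconstruction in Python (the real Therac-25 software was written in PDP-11 assembly; all names and values here are illustrative), showing how data validated once can be silently replaced before use:

```python
# Hypothetical sketch (not the real Therac-25 code): the treatment monitor
# validates the shared data once after "entry complete" is signalled, and
# never re-checks, so a later edit by the operator is used unvalidated.

class SharedTreatmentData:
    def __init__(self):
        self.dose = 0
        self.entry_complete = False

def replay_accident_sequence(max_dose=200):
    data = SharedTreatmentData()

    # 1. Operator enters data; keyboard controller signals completion.
    data.dose = 100
    data.entry_complete = True

    # 2. Treatment monitor checks the data and starts preparing a treatment.
    validated = data.entry_complete and data.dose <= max_dose

    # 3. Operator changes the data; the completion flag is never reset,
    #    so the monitor has no reason to check again.
    data.dose = 25_000

    # 4. Treatment monitor uses the *current* data, although only the old
    #    value was ever checked.
    used_dose = data.dose if validated else None
    return validated, used_dose

validated, used_dose = replay_accident_sequence()
# The check passed, yet the dose actually used violates the checked limit.
```

The point of the sketch is that the check and the use operate on the same shared variable at different times, with nothing forcing a re-check in between.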

Problems with Concurrency

- The complete story of the Therac-25 is rather complicated.
- In general, we have here an example of several threads which have to communicate with each other.
- Note that the concurrency between the user interface and the main thread might be ignored in a formal verification.
  ⇒ Limitations of formal verification, and the need to take the systems aspect into account.

- Although in the Therac-25 case there was no race condition in the strict sense (the problem was a failure to fully communicate changes made by one thread to another thread), when designing systems with concurrency one has to be aware of the possibility of race conditions.
- Problems with race conditions are very common in critical systems, since most critical systems involve some degree of concurrency.

- Race conditions occur if two threads share the same variable.
- A typical scenario is if we have an array Queue containing values for tasks to be performed, plus a variable next pointing to the first free slot. The tasks could for instance be cases of emergencies, to which an ambulance needs to be sent.

  Queue:  Task1 | Task2 | Task3 | Free | ...
                                   ↑
                                  next

- Assume two threads:

  Thread 1:  line 1.1: Queue[next] = TaskThread1;
             line 1.2: next := next + 1;

  Thread 2:  line 2.1: Queue[next] = TaskThread2;
             line 2.2: next := next + 1;

- Assume that after line 1.1, Thread 1 is interrupted, Thread 2 executes lines 2.1 and 2.2, and then Thread 1 executes line 1.2.

Execution: 1.1 → 2.1 → 2.2 → 1.2.

- Let next∼ be the value of next before this run and write next for its value after this run.
- Result:
  - Queue[next∼] = TaskThread2.
  - Queue[next∼+1] = previous value of Queue[next∼+1] (could be anything).
  - next = next∼ + 2.
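The result above can be checked by replaying the interleaving 1.1 → 2.1 → 2.2 → 1.2 deterministically. A small Python sketch (variable names are illustrative):

```python
# Deterministic replay of the lost-update interleaving: both threads write
# into slot next~ before either increment takes effect, so TaskThread1 is
# lost, slot next~+1 keeps its old value, and next ends at next~+2.

queue = ["Task1", "Task2", "Task3", "Free", "Free", "Free"]
next_free = 3                       # next~: value of next before the run

queue[next_free] = "TaskThread1"    # line 1.1 (Thread 1)
# --- Thread 1 is interrupted here ---
queue[next_free] = "TaskThread2"    # line 2.1 (Thread 2): same slot!
next_free = next_free + 1           # line 2.2 (Thread 2)
# --- Thread 1 resumes ---
next_free = next_free + 1           # line 1.2 (Thread 1)

# queue[3] == "TaskThread2": TaskThread1 was overwritten and is lost.
# queue[4] == "Free": the slot at next~+1 was never filled.
# next_free == 5 == next~ + 2.
```

The replay shows both symptoms at once: one task silently disappears, and next now points past a slot that still holds stale content.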

- Problem: In most test runs Thread 1 will not be interrupted by Thread 2.
- It is difficult to detect race conditions by tests.
- It is difficult to debug programs with race conditions.

Critical Regions

- Solution for race conditions: form groups of shared variables between threads, such that variables in different groups can be changed independently without affecting the correctness of the system, i.e. such that different groups are orthogonal.

- In the example, Queue and next must belong to the same group, since next cannot be changed independently of Queue.
- However, we might have a similar combination of variables Queue2/next2, which is independent of Queue/next. Then Queue2/next2 forms a second group.
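The grouping can be sketched with one lock per orthogonal group, here in Python (`threading.Lock` plays the role of the mutual-exclusion mechanism; all class and variable names are illustrative, not from the slides):

```python
import threading

# One lock per orthogonal group of shared variables: Queue/next form one
# group, Queue2/next2 another. Threads using different groups never block
# each other; threads using the same group are mutually exclusive.

class TaskQueue:
    def __init__(self, size):
        self.slots = [None] * size
        self.next = 0
        self.lock = threading.Lock()   # guards exactly this group

    def add(self, task):
        with self.lock:                # critical region for this group
            self.slots[self.next] = task
            self.next += 1

emergencies = TaskQueue(1000)   # group 1: Queue / next
repairs = TaskQueue(1000)       # group 2: Queue2 / next2, orthogonal

def worker(queue, n):
    for _ in range(n):
        queue.add("task")

# Two threads per queue; threads on different queues can interleave freely.
threads = [threading.Thread(target=worker, args=(q, 250))
           for q in (emergencies, repairs) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the per-group locks, no update is lost: each queue ends at next == 500.
```

Holding one global lock for both queues would also be correct, but the per-group design lets unrelated updates proceed in parallel, which is the point of defining orthogonal groups.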

- Lines of code involving shared variables of one group are critical regions for this group.

- A critical region for a group is a sequence of instructions for a thread reading or modifying variables of this group, such that if during the execution of these instructions one of the variables of the same group is changed by another thread, then the system might reach a state which results in incorrect behaviour of the system.

- Critical regions for the same group of shared variables have to be mutually exclusive.

- Lines 1.1/1.2 and lines 2.1/2.2 are critical regions for this group.

- When starting line 1.1, Thread 1 enters a critical region for Queue/next.

- Before it has finished line 1.2, it is not allowed to switch to any thread with the same critical region, e.g. lines 2.1/2.2 of Thread 2.

- But it is not a problem to interrupt Thread 1 between lines 1.1 and 1.2 and to change variables which are independent of Queue/next.

Synchronised

- In Java this is achieved by the keyword synchronized.

- A method, or block of statements, can be synchronised w.r.t. any object.

- Two synchronised blocks of code/methods which are synchronised w.r.t. the same object are mutually exclusive.

- But they can be interleaved with any other code which is not synchronised w.r.t. the same object.

- Non-static code which is synchronised without a specifier is synchronised w.r.t. the object it belongs to.

- Static code which is synchronised without a specifier is synchronised w.r.t. the class it belongs to.

Synchronisation in the Example

The example above corrected using Java-like pseudo code:

  Thread 1:  synchronized(Queue) {
               Queue.Queue[Queue.next] = TaskThread1;
               Queue.next := Queue.next + 1;
             }

  Thread 2:  synchronized(Queue) {
               Queue.Queue[Queue.next] = TaskThread2;
               Queue.next := Queue.next + 1;
             }

Possible Scenario for Railways

- The following is a highly simplified scenario:

      Signal 1 →                ← Signal 2
  Train1 −→     Segment 1     ←− Train2

- Thread controlling Signal 1 (triggered by Train 1):

      if (not segment1_is_blocked) {
        signal1 := green;
        segment1_is_blocked := true;
      }

- Thread controlling Signal 2 (triggered by Train 2):

      if (not segment1_is_blocked) {
        signal2 := green;
        segment1_is_blocked := true;
      }

  Signal 1:  l 1.1: if (not segment1_is_blocked) {
             l 1.2:   signal1 := green;
             l 1.3:   segment1_is_blocked := true; }

  Signal 2:  l 2.1: if (not segment1_is_blocked) {
             l 2.2:   signal2 := green;
             l 2.3:   segment1_is_blocked := true; }

- If not synchronised, one might execute
  l 1.1 → l 2.1 → l 1.2 → l 1.3 → l 2.2 → l 2.3.
  This sequence results in both signal1 and signal2 being green.
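Making the test and the update one critical region removes this interleaving. A Python sketch of the fix (the name segment1_is_blocked follows the slide; the Segment class and the lock are illustrative additions, not part of any real signalling system):

```python
import threading

# Check-then-act on the shared segment state: the test of "blocked" and
# the update must form ONE critical region, otherwise two controllers can
# both see "not blocked" and both switch their signal to green.

class Segment:
    def __init__(self):
        self.segment1_is_blocked = False
        self._lock = threading.Lock()

    def try_reserve(self):
        with self._lock:          # l x.1 .. l x.3 as one critical region
            if not self.segment1_is_blocked:
                self.segment1_is_blocked = True
                return True       # this signal may switch to green
            return False          # segment already taken: stay red

segment1 = Segment()
results = []

def signal_controller():
    results.append(segment1.try_reserve())

# Two signal controllers race for the same segment.
controllers = [threading.Thread(target=signal_controller) for _ in range(2)]
for t in controllers:
    t.start()
for t in controllers:
    t.join()
# Whatever the scheduling, at most one controller wins the segment.
```

Without the lock, the interleaving l 1.1 → l 2.1 → … from the slide corresponds to both calls passing the `if` before either sets the flag; with the lock, that interleaving is impossible by construction.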

- If one has a checker, it might recognise the problem later and repair it, but still for some time both trains have a green light.

- The above scenario is of course too simple to be realistic, but one can imagine a more complicated variant of it hidden in the code.

- Problems of this nature could be one possible reason why, in the system used in Norway, a signal sometimes switched to green for a short moment.

0 (g) Literature

Literature

- In general, the module is self-contained.

- The following is a list of books which might be of interest if you later have to study critical systems more intensively, or for your report.

Main Course Literature

- Main course book:

  [St96] Storey, Neil: Safety-critical computer systems. Addison-Wesley, 1996.

- Additional overview texts:

  [Ba97a] Bahr, Nicolas J.: System safety and risk assessment: a practical approach. Taylor & Francis, 1997.
          Intended as a short book for engineers from all engineering disciplines. (Second edition to appear 2015.)

  [So10] Part 2 in Sommerville, Ian: Software Engineering. 9th edition, Addison-Wesley, 2010.
         Concise overview of the software engineering aspects of safety-critical systems.

Copyright

- Since this lecture is heavily based on the book by Neil Storey [St96], a lot of the material in these slides is taken from that book.

Additional Overview Texts

[Le95] Leveson, Nancy G.: Safeware. System safety and computers. Addison-Wesley, 1995.
       Concentrates mainly on human and sociological aspects. A more advanced book, but sometimes used in this module.

Preliminaries for Further Books

- The following book recommendations are mainly intended in case you need
  - literature for your reports, or
  - literature later, when working in industry on critical systems.

- They also give an overview of the various disciplines involved in this area.

Further General Books

[Ne95] Neumann, Peter G.: Computer related risks. Addison-Wesley, 1995.
       A report on a large number of accidents and incidents involving critical errors in software.

[GM02] Geffroy, Jean-Claude; Motet, Gilles: Design of dependable computing systems. Kluwer, 2002.
       Advanced book on the design of general dependable computer systems.

[CF01] Crowe, Dana; Feinberg, Alec: Design for reliability. CRC Press, 2001.
       Book for electrical engineers who want to design reliable systems.

Books on Reliability Engineering

- Reliability engineering uses probabilistic methods in order to determine the degree of reliability required for the components of a system.

[Mu04] Musa, John: Software reliability engineering. McGraw-Hill, 2nd edition, 2004.
       The bible of reliability engineering. A difficult book.

[Sm11] Smith, David J.: Reliability, maintainability and risk. Butterworth-Heinemann, Oxford, 8th edition, 2011.
       Intended for practitioners.

[Xi91] Xie, M.: Software reliability modelling. World Scientific, 1991.
       Introduces various models for determining the reliability of systems.

Books on Hazard Analysis

- HAZOP and FMEA are techniques for identifying hazards in systems.

[RCC99] Redmill, Felix; Chudleigh, Morris; Catmur, James: System safety: HAZOP and software HAZOP. John Wiley, 1999.
        General text on HAZOP.

[MMB08] McDermott, Robin E.; Mikulak, Raymond J.; Beauregard, Michael R.: The basics of FMEA. CRC Press, 2nd edition, 2008.
        Booklet on FMEA.

Books on Software Testing

- Software testing techniques.

[Pa05] Patton, Ron: Software testing. SAMS, 2nd edition, 2005.
       Very practical book on techniques for software testing.

[DRP99] Dustin, Elfriede; Rashka, Jeff; Paul, John: Automated software testing. Addison-Wesley, 1999.
        Practical book on how to test software automatically.

General Books on Formal Methods

- Overview books:

[Pe01] Peled, Doron: Software reliability methods. Springer, 2001.
       Overview of formal methods for designing reliable software.

[Mo03] Monin, Jean-François; Hinchey, M. G.: Understanding formal methods. Springer, 2003.
       General introduction to formal methods.

Formal Methods Used in Industry

- Z: an industrial standard method for specifying software.

[Ja97] Jacky, Jonathan: The way of Z. Practical programming with formal methods. Cambridge University Press, 1997.
       Practical application of the specification language Z in medical software.

[Di94] Diller, Antoni: Z. An introduction to formal methods. 2nd edition, John Wiley, 1994.
       Introduction to Z.

[Li01] Lightfoot, David: Formal specification using Z. 2nd edition, Palgrave, 2001.

- B-Method: developed from Z; becoming the standard for specifying critical systems.

[Sch01] Schneider, Steve: The B-method. Palgrave, 2001.
        Introduction to the B-method.

[Ab96] Abrial, J.-R.: The B-Book. Cambridge University Press, 1996.
       The "bible" of the B-method, more for specialists.

- Event-B: successor of the B-method, by the developer of the B-method. However, many people in industry stay with the B-method.

[Ab10] Abrial, J.-R.: Modeling in Event-B. System and software engineering. Cambridge University Press, 2010.

- Model checking: used for automatic verification, especially of hardware.

[CGP01] Clarke, Edmund M. Jr.; Grumberg, Orna; Peled, Doron A.: Model checking. MIT Press, 3rd printing, 2001.
        Very thorough overview of this method.

[HR13] Huth, Michael R. A.; Ryan, Mark D.: Logic in computer science. Modelling and reasoning about systems. Cambridge University Press, 2nd edition, 2004.
       Introduction to logic for computer science students. Has a nice chapter on model checking.

Books on SPARK Ada

- SPARK Ada is a subset of the programming language Ada with proof annotations, used to develop secure software. Developed and used in industry.

- There is one new book on SPARK Ada 2014, the version used in this module:

[McC15] McCormick, John W.; Chapin, Peter C.: Building High Integrity Applications with SPARK. Cambridge University Press, September 2015.

The following three books are essentially three editions of the same book, the classic text on SPARK Ada. Unfortunately they refer only to pre-2014 SPARK Ada. In this module we are teaching SPARK Ada 2014, which is a substantial improvement over pre-2014 SPARK Ada.

[Ba12a] Barnes, John: SPARK: the proven approach to high integrity software. Altran Praxis, 2012.
        Essentially the 3rd edition of Barnes' SPARK Ada book.

[Ba03] Barnes, John: High integrity software. The SPARK approach to safety and security. Addison-Wesley, 2003.
       Essentially the 2nd edition of Barnes' SPARK Ada book.

[Ba97b] Barnes, John: High integrity Ada. The SPARK approach. Addison-Wesley, 1997.
        Essentially the 1st edition of Barnes' SPARK Ada book.

User Guide and Reference Manual of SPARK Ada

- Documentation for SPARK Ada (user guide, reference guide and more) can be found at
  http://www.adacore.com/developers/documentation

- A pdf version of the user guide is part of the distribution, located in the Linux lab at
    /usr/gnat/share/doc/spark/pdf/spark2014_ug.pdf
  If you chose to install SPARK Ada into /opt/spark2014/, then it is located at
    /opt/spark2014/share/doc/spark/pdf/spark2014_ug.pdf

- A pdf version of the reference guide is part of the distribution, located in the Linux lab at
    /usr/gnat/share/doc/spark/pdf/spark2014_rm.pdf
  If you chose to install SPARK Ada into /opt/spark2014/, then it is located at
    /opt/spark2014/share/doc/spark/pdf/spark2014_rm.pdf

Books on Ada

- Ada, the underlying language of SPARK Ada.

[Ba12b] Barnes, John: Programming in Ada 2012. Cambridge University Press, 2014.
        Covers Ada 2012, which is the basis for SPARK Ada 2014.

[Na95] Naiditch, David J.: Rendezvous with Ada 95. John Wiley, 2nd edition, 1995.
       Really good Ada book, but covers only Ada 95.
