Tools and Techniques for Higher Reliability Software
FOSDEM 2013 – Ada Developer Room
Philippe Waroquiers, Eurocontrol/DNM
[email protected]
3 February 2013



Eurocontrol

European Organisation for the Safety of Air Navigation
International organisation, 39 member states
Multiple activities/directorates/…
  Participates in/supports big European projects
  Central Route Charge Office
  Directorate Network Management
  …

More info: www.eurocontrol.int


Directorate Network Management

Air Traffic Management “Network Management” = services of general interest for European aviation
  European air route design
  Flight plan processing
  Flow management
  Scarce resources management
    Radio frequencies
    SSR codes
  Crisis management (remember the 2010 volcanic ash crisis)


Flight plan processing, Flow management, …

Flight plan processing over the whole of Europe (IFPS)
  Aircraft Operators send flight plans to IFPS
  Flight plans are verified, corrected if needed, and redistributed to airspace control centres, airports and Aircraft Operators
Flow Management (ETFMS): balancing demand and capacity
  Safety: avoid Air Traffic Control overload
  Efficiency: best use of ATC capacity, minimise delays
ENV: “airspace data management system”
Data warehouse
User interfaces for external users
  Web Portal
…


2D Trajectory, alternate routes


Vertical Trajectory


Differences radar plots <> Plan


Recomputed with radar plots


Macroscopic view of Europe


ETFMS & IFPS

Sophisticated systems: around 2 million SLOC of Ada
Reliability requirements
  If IFPS is down: no flight plan processing in Europe!
  If ETFMS is down: passengers will sleep in aerodromes!
  Duplicated hardware, duplicated sites, contingency systems, …
Performance requirements
  ETFMS handles 3 million messages per day
  Some messages imply complex processing (e.g. recomputing a flight route)
Safety requirements
  Various obligations about people, procedures and systems
    E.g. Software Assurance Level (SWAL)
  Safety audits


Better no critical bugs in critical systems …

Use of uninitialised data
Memory leaks
Dangling pointers
Buffer overflows
Race conditions
Performance issues
Memory use problems
…


But how to avoid/find/eliminate such bugs ?

People (qualifications, training, …)
Procedures (code review, coding standards, …)
Testing (unit testing, integration testing, user acceptance tests, shadow operations, security audits, …)
But also TOOLS
  The Ada language is a major asset to avoid many such bugs
    Thanks to early detection at compilation time
    Thanks to run-time checks showing bugs during early testing
  Valgrind is a major asset to find and eliminate the remaining bugs


What is Valgrind ?

Valgrind = framework to build runtime analysis tools + a set of tools
  Framework = about 400 KSLOC
  Tools: between 3 KSLOC and 22 KSLOC
Tools:
  Memcheck
  Callgrind
  Helgrind
  Drd
  Massif
  Exp-sgcheck
  …


Use of uninitialised data (1)

Memcheck --undef-value-errors=yes (default value)
  Reports an error when the use of an undefined value would change the program's behaviour
Ada language: pragma Normalize_Scalars
  All scalars not explicitly assigned are automatically given an (invalid if possible) value
  Run-time checks will detect the use of an invalid value
GNAT pragma Initialize_Scalars
  More flexible version of Normalize_Scalars
  Initial scalar value can be controlled
  Flexibility about which/when run-time checks are done
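
As a minimal sketch of the Memcheck side of this slide, the wrapper below runs a program under Memcheck with undefined-value reporting spelled out. The function name `memcheck_undef` and the program path are hypothetical. `--track-origins=yes` is not mentioned on the slide; it is added here on the assumption that knowing where an undefined value was created is worth the extra cost.

```shell
# Hypothetical wrapper: run any command line under Memcheck.
# --undef-value-errors=yes is the default and is spelled out only for clarity.
# --track-origins=yes (an addition, not from the slide) also reports where an
# undefined value was created, at the cost of more CPU and memory.
memcheck_undef() {
    valgrind --tool=memcheck --undef-value-errors=yes --track-origins=yes "$@"
}

# Usage (./my_program is a placeholder):
#   memcheck_undef ./my_program arg1 arg2
```

Because Valgrind works on the unmodified executable, the same binary that is tested natively can be passed to the wrapper.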


Use of uninitialised data (2)

Memcheck detects a bug even if there is no invalid value
Initialize_Scalars
  Detects a bug only if the type has an invalid value (a bit pattern outside its range)
  Otherwise, runs with different initial values can expose use of uninitialised data
Initialize_Scalars is faster than memcheck
  -O0 + all checks on + Initialize_Scalars is only 2x slower than -O2 + standard Ada Reference Manual checks
  (these checks detect the most horrible/random behaviour)
At Eurocontrol:
  Day-to-day development done with Initialize_Scalars
  Some “shadow operational” testing periods with Initialize_Scalars
  Week-end builds validated with memcheck
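
A day-to-day build with Initialize_Scalars could look roughly like the sketch below. It assumes a GNAT toolchain with `gnatmake` on the PATH. The function name, the file name `init_scalars.adc` and the switch set (`-gnatVa` for all validity checks, `-gnato` for overflow checks) are illustrative choices for "all checks on", not the flags actually used at Eurocontrol.

```shell
# Hypothetical helper: build an Ada main unit with Initialize_Scalars enabled.
build_with_initialize_scalars() {
    # Initialize_Scalars is a configuration pragma; it is written to its own
    # file here so that an existing gnat.adc is left untouched.
    printf 'pragma Initialize_Scalars;\n' > init_scalars.adc
    # -gnatec= names the configuration pragma file,
    # -gnatVa enables all validity checks, -gnato enables overflow checks.
    gnatmake -O0 -gnatVa -gnato -gnatec=init_scalars.adc "$@"
}

# Usage (main.adb is a placeholder):
#   build_with_initialize_scalars main.adb
```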


Memory leaks

Avoid by using Ada constructs
  Often, Ada constructs make it possible to avoid using the heap
    E.g. record discriminants, OO types without heap, arrays, …
  Otherwise, manage the heap in a “safe” way: controlled types, storage pools
    Not always possible (CPU, memory)
Detect with gcc/gnat debug pools (GNAT.Debug_Pools)
  “pre-processing” + recompile
Detect with memcheck --leak-check=full
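
The Memcheck leak search named above can be sketched as a small wrapper. The function name and program path are hypothetical.

```shell
# Hypothetical wrapper: run a program under Memcheck with a full leak search
# at exit, so each leaked block is reported with its allocation stack trace.
memcheck_leaks() {
    valgrind --tool=memcheck --leak-check=full "$@"
}

# Usage (./my_program is a placeholder):
#   memcheck_leaks ./my_program
```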


Dangling pointers

Avoid by using Ada constructs: same as for avoiding memory leaks
Detect with gcc/gnat debug pools
Detect with memcheck
Detect with gcc “address sanitizer” option
  New functionality, will be in gcc 4.8
  Need to recompile
  Not (yet) tried at Eurocontrol


Buffer overflows (1)

Ada arrays are first class citizens
  'Range, 'First, 'Last, … avoid buffer overflows
  Arrays always carry their bounds
Detect with Ada: the standard mandates array index verification
  All array overflows are detected before damage
  Buffer overflow results in a run-time exception => no “random behaviour”
  Very small overhead. Measured on a representative program (compiled with optimisation): less than 2% for all standard Ada Reference Manual checks (the buffer overflow checks are a part of these Ada RM checks).


Buffer overflows (2)

Detect (not needed with Ada) with Memcheck
  Detects (most) buffer overflows in heap allocated blocks
  No detection in global or stack or “inside” a struct
Detect (not needed with Ada) with Exp-sgcheck
  Experimental tool detecting stack and global overrun
  No detection “inside” a struct
Detect (not needed with Ada) with gcc “address sanitizer” option
  Will be in gcc 4.8, not (yet) tried at Eurocontrol
  No detection “inside” a struct
Only the Ada run-time checks detect all buffer overflows
  E.g. “inside” record (struct) components


Race conditions

Avoid by using Ada constructs
  Ada tasks (threads) are first class citizens
  Many constructs help to avoid race conditions
    Rendez-vous, protected objects, …
  Ada multi-tasking constructs are easy: higher abstraction level (or at least easier to use than pthreads)
    E.g. protected objects
Detect by using helgrind (or drd)
  Helgrind used very successfully at Eurocontrol
Detect by using gcc “thread sanitizer” option
  New functionality, will be in gcc 4.8
  Need to recompile
  Not tried (yet) at Eurocontrol
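
Running a multi-tasking program under Helgrind or DRD could be sketched as follows. The function name and the `RACE_TOOL` environment variable are hypothetical conveniences; only the two tool names come from Valgrind itself.

```shell
# Hypothetical wrapper: run a program under a Valgrind race detector.
# Helgrind is used by default; set RACE_TOOL=drd to use DRD instead.
check_races() {
    valgrind --tool="${RACE_TOOL:-helgrind}" "$@"
}

# Usage (./my_program is a placeholder):
#   check_races ./my_program
#   RACE_TOOL=drd check_races ./my_program
```

Keeping the tool selectable makes it cheap to cross-check a report from one detector with the other.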


Performance issues

Callgrind: where is my CPU spent?
  It can measure a lot more
    E.g. memory cache misses using a cache simulator
  Callgrind is the main tool used at Eurocontrol to tune performance
Kcachegrind: amazing visualisation tool for callgrind output
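
A profiling session along these lines might be scripted as in the sketch below. The function names and the fixed output file name `callgrind.out` are hypothetical choices; by default Callgrind appends the process id to the file name.

```shell
# Hypothetical wrapper: profile a program with Callgrind, writing the
# profile to a fixed file name so it is easy to find afterwards.
profile_cpu() {
    valgrind --tool=callgrind --callgrind-out-file=callgrind.out "$@"
}

# Same, with the cache simulator enabled to count cache misses as well.
profile_cache() {
    valgrind --tool=callgrind --cache-sim=yes \
        --callgrind-out-file=callgrind.out "$@"
}

# Usage (./my_program is a placeholder):
#   profile_cpu ./my_program
#   kcachegrind callgrind.out
```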


Kcachegrind



Memory use analysis

Memcheck
  Reports “delta memory” usage between two memory scans
  Reports can be triggered from the program or from the shell
Massif
  Shows the evolution of memory use over time
  Produces reports at regular intervals or on request
Exp-dhat
  Shows if heap allocated memory is “accessed” a lot
  E.g. can report memory allocated and then not used anymore
Memcheck and Massif used at Eurocontrol
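
For the Massif part, a heap-profiling run could be sketched like this. The function name and the fixed output file name `massif.out` are hypothetical; `ms_print` is the text report tool shipped with Valgrind.

```shell
# Hypothetical wrapper: record heap usage over time with Massif, writing
# the snapshots to a fixed file name instead of massif.out.<pid>.
profile_heap() {
    valgrind --tool=massif --massif-out-file=massif.out "$@"
}

# Usage (./my_program is a placeholder):
#   profile_heap ./my_program
#   ms_print massif.out
```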


Feedback from Valgrind use at Eurocontrol

Very easy to use
  No re-compilation, no re-linking, works with closed source libs, …
  Many powerful/advanced functionalities
But, depending on the tool
  3 .. 100+ times slower
  2 .. xxx+ times more memory
Eurocontrol applications are big/heavy
  Encountered very high memory and CPU use by Valgrind
  => several optimisations/additional functionalities added to Valgrind


Valgrind NEWS

One or two new major releases per year
  New platforms, support for new instructions, …
  New functionalities, new tools, …
  Optimisation in CPU or memory, …
  Bug fixes
Easy to get and compile new versions
  Get the last released version from http://www.valgrind.org
  Next (unreleased) version:
    svn co svn://svn.valgrind.org/valgrind/trunk valgrind
    cd valgrind
    ./autogen.sh
    ./configure --prefix=...
    make
    make install


Valgrind NEWS

Current release: 3.8.1
Next release under development: 3.9.0
We will discuss NEWS from recent releases and from the next release
Not yet released functionality in orange (will be in 3.9.0)


Valgrind NEWS: platforms

Started on linux/x86
Now available on
  Linux/x86,amd64,ppc32,ppc64,arm,s390,mips32
  Android/arm,x86
  MacOS/x86,amd64
Support for new instructions
  E.g. SSE, AVX, AES
  E.g. ppc Decimal Floating Point instructions
Support for new distributions and glibc versions


Valgrind NEWS: improved leak functionality

3.8.1
  memcheck leak suppression suppresses all leak kinds
    E.g. an entry aimed at suppressing “possible leak” also suppresses “definite leak”
  Dangling pointer errors only report the “freed at” stack trace
3.9.0:
  A suppression optionally indicates the kind of leaks to suppress
  Command line arguments to control output and/or exit code
    --show-leak-kinds=kind1,kind2,…
    --errors-for-leak-kinds=kind1,kind2,…
  --keep-stacktraces=alloc|free|alloc-and-free|alloc-then-free|none
    Can report more stack traces in a dangling pointer error
    Or can optimise memory by recording fewer or no stack traces
      E.g. if not interested in some error kinds
  --merge-recursive-frames=<number>
    Useful to limit the number of recorded stack traces by merging recursive calls
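
One way the leak-kind options could be combined in an automated build is sketched below. It assumes Valgrind 3.9.0 or later. The function name and the choice of kinds are hypothetical; `--error-exitcode` is an existing Valgrind option not listed on the slide.

```shell
# Hypothetical wrapper for automated runs (needs Valgrind 3.9.0 or later):
# show definite and possible leaks, but count only definite leaks as errors,
# and return exit code 1 when any error was reported.
ci_leak_check() {
    valgrind --tool=memcheck --leak-check=full \
        --show-leak-kinds=definite,possible \
        --errors-for-leak-kinds=definite \
        --error-exitcode=1 "$@"
}

# Usage (./my_program is a placeholder):
#   ci_leak_check ./my_program || echo "definite leaks or other errors found"
```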


Valgrind NEWS : gdb server (1)

GDB server allows a program running under Valgrind to be fully debugged
  Connect with GDB to the Valgrind gdb server
  GDB can then
    Insert breakpoints, (unlimited) watchpoints, …
    Examine the list of threads/tasks
    Examine the value of variables
    Continue/interrupt execution
    …
Valgrind gdb server provides “monitor commands”
  They allow Valgrind functionalities to be triggered from GDB (or from the shell command line)
  E.g. for memcheck: leak search, checking definedness, …
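
A debugging session with the gdb server could be started roughly as in the following sketch. The function name and program path are hypothetical; `--vgdb-error=0` is an existing option that makes Valgrind wait for GDB before the program starts.

```shell
# Hypothetical wrapper: start a program under Valgrind with the gdb server
# enabled, waiting for GDB to connect before the first instruction runs.
run_under_vgdb() {
    valgrind --vgdb=yes --vgdb-error=0 "$@"
}

# Usage, in a first terminal (./my_program is a placeholder):
#   run_under_vgdb ./my_program
# In a second terminal, from GDB:
#   gdb ./my_program
#   (gdb) target remote | vgdb
#   (gdb) monitor leak_check full reachable any
# A monitor command can also be sent from the shell to a program
# running under Valgrind:
#   vgdb leak_check full reachable any
```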


Valgrind NEWS : gdb server (2): memcheck monitor commands

get_vbits <addr> [<len>]
  returns validity bits for <len> (or 1) bytes at <addr>
  bit values 0 = valid, 1 = invalid, __ = unaddressable byte
  Example: get_vbits 0x8049c78 10

make_memory [noaccess|undefined|defined|Definedifaddressable] <addr> [<len>]
  marks <len> (or 1) bytes at <addr> with the given accessibility

check_memory [addressable|defined] <addr> [<len>]
  checks that <len> (or 1) bytes at <addr> have the given accessibility
  and outputs a description of <addr>


Valgrind NEWS : gdb server (3): memcheck monitor commands

leak_check [full*|summary]
           [kinds kind1,kind2,...|reachable|possibleleak*|definiteleak]
           [increased*|changed|any]
           [unlimited*|limited <max_loss_records_output>]
  * = defaults
  where kind is one of definite indirect possible reachable all none
  Examples:
    leak_check
    leak_check summary any
    leak_check full kinds indirect,possible
    leak_check full reachable any limited 100

block_list <loss_record_nr>
  after a leak search, shows the list of blocks of <loss_record_nr>

who_points_at <addr> [<len>]
  shows places pointing inside <len> (default 1) bytes at <addr>
  (with len 1, only shows "start pointers" pointing exactly to <addr>,
  with len > 1, will also show "interior pointers")


Valgrind NEWS: tune red zones size

Red zone = protection zone before/after malloc-ed block
  Allows detection of buffer over/under-flow
  If too small: less chance to detect a bug
  If too big: uses too much memory
Command line options to increase/decrease size
  --redzone-size=<number>
    Size for client (application) malloc’ed blocks
  --core-redzone-size=<number>
    Size for Valgrind internal malloc’ed blocks
No buffer overflows with Ada => use minimal red zone
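
Choosing the client red zone size per run could be sketched as follows. The function name, argument order and the size used in the usage line are hypothetical; the accepted range of sizes should be checked against the installed Valgrind version.

```shell
# Hypothetical wrapper: run a program under Memcheck with an explicit
# red zone size (in bytes) around each client heap block.
# First argument: red zone size; remaining arguments: the command to run.
memcheck_redzone() {
    size=$1
    shift
    valgrind --tool=memcheck --redzone-size="$size" "$@"
}

# Usage (./my_program and the size 8 are placeholders):
#   memcheck_redzone 8 ./my_program
```

A small value trades overrun detection distance for memory, which matches the reasoning on the slide for programs whose bounds are already checked by the Ada run time.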


Valgrind NEWS: support for other malloc libs

Command line option --soname-synonyms=… supports non-libc malloc libraries or statically linked libs
  --soname-synonyms=somalloc=*tcmalloc*
    Support for all variants of tcmalloc shared libraries
  --soname-synonyms=somalloc=NONE
    Support for a statically linked malloc library
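
When these options are typed in a shell, the `*` in the pattern needs quoting so that the shell does not expand it against file names. A minimal sketch, with hypothetical function names:

```shell
# Hypothetical wrapper: Memcheck for a program using a tcmalloc shared
# library. The pattern is quoted so the shell passes the * to Valgrind.
memcheck_tcmalloc() {
    valgrind --tool=memcheck --soname-synonyms=somalloc='*tcmalloc*' "$@"
}

# Hypothetical wrapper: Memcheck for a program with a statically linked
# malloc library.
memcheck_static_malloc() {
    valgrind --tool=memcheck --soname-synonyms=somalloc=NONE "$@"
}

# Usage (./my_program is a placeholder):
#   memcheck_tcmalloc ./my_program
```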


Valgrind bad NEWS: failure to develop, help needed

Valgrind serialises thread execution
  In other words, on a multi-core machine, Valgrind can only use one core
Trial done to make a “really” multi-threaded Valgrind
  Many race conditions found (with Valgrind on Valgrind)
    Some have been fixed
  The “none” tool makes reasonable use of multiple cores
  Biggest (unsolved) blocking problem:
    Memcheck “VA bits” data structure is used for each memory access
    Using locks to protect it is way too slow
    Even using one atomic instruction is too slow
    => ????

Ideas/help welcome …


Reliable Software : other tools/approaches/…

AdaControl: Ada coding rule checker
  Developed initially for Eurocontrol. Open source
  Routinely used at Eurocontrol
Static code analysers
  CodePeer (AdaCore)
Program provers
  SPARK: annotated subset of Ada
  Ada 2012 contracts


Conclusion : Reliable Software

Reliable software is obtained using a combination of various techniques and tools
  Use a safe language, i.e. Ada
  Complement this with tools
    Valgrind is a main tool used at Eurocontrol
    Use it, you will like it


Questions ?