
Case Studies of Validation Procedures Performed at PTB

Norbert Greif and Dieter Richter

Physikalisch-Technische Bundesanstalt, Abbestr. 2-12, 10587 Berlin, [email protected], [email protected]

This paper presents two case studies of validation procedures performed at PTB. The first is the validation of a software product that claims to conform to the GUM [GUM95, GSR06]. The second is the validation of a software product used for the calibration of parallel gauge blocks [GSR99]. Selected aspects of the validation procedures, and the question of which validation methods are appropriate within an integrated validation concept, are discussed.

According to the basic procedure of software validation, software requirements and validation methods are the main objects of interest within the validation process. Defining the requirements and refining them into testable criteria are the fundamental working steps of the validation procedure. After appropriate methods have been selected and properly executed, documenting the requirements, refinements and test results is as important as executing the steps themselves. Results of later steps may feed back into earlier ones.

For a successful validation, it is essential that testable requirements have been derived from the initial, often general, requirements. The resulting refined requirements must cover both metrological and software-related aspects, so the refinement process is highly interdisciplinary. Furthermore, refined requirements serve as an interface between software developers and software testers: the developer must treat them as target criteria during development, and the tester uses them as quality criteria.

1 Validation for a GUM-supporting software product

It has been shown in practice that, although validation is desired, requirements are usually not completely defined and refined into testable criteria. The requirements and the validation objective are often vague, for example "the software shall be of high quality" or "the software shall function correctly". For a validation, such "requirements" are too imprecise. In the following, the example of a GUM-supporting software product is used to show how an appropriate refinement can be developed. Furthermore, appropriate validation methods are selected depending on the types of requirements derived.

1.1 Refinement example

The target of validation is a specific instance of a GUM-supporting software product that calculates measurement uncertainties as prescribed by the GUM. The following refinement process is performed:

Overall validation objective:

The objective is to provide sufficient confidence that the software conforms to the "Guide to the Expression of Uncertainty in Measurement (GUM)" [GUM95].

Refinement step 1:

The general validation objective is refined into two main metrological requirements and further requirements that are not described here:

• correct implementation of the algorithms prescribed by the GUM,
• correctness of the results calculated for the examples in the GUM appendices,
• other requirements.

In the description of all subsequent steps, we restrict ourselves to the refinement with respect to the first requirement.

Refinement step 2:

To validate the correct implementation of the GUM algorithms, the following requirements have to be met:

• correct evaluation of the standard measurement uncertainty for input data according to type A,
• correct evaluation of the standard measurement uncertainty for input data according to type B,
• correct differentiation of model formulas,
• other requirements.

Refinement step 3:

The correct evaluation of the standard measurement uncertainty according to type A is broken down into

• correct calculation of arithmetic mean,
• correct calculation of standard deviation,
• correct calculation of degrees of freedom,
• other requirements.
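The leaf items of step 3 are well-defined quantities of the GUM Type A evaluation (GUM 4.2): the arithmetic mean of the n observations, their experimental standard deviation s, the standard uncertainty of the mean u = s/√n, and the degrees of freedom ν = n − 1. A minimal Python sketch, assuming a hypothetical function name `type_a_evaluation`, shows these together with simple plausibility checks that turn undefined cases into a defined error instead of an abnormal end:

```python
import math
from statistics import fmean, stdev

def type_a_evaluation(values: list[float]) -> dict[str, float]:
    """Type A evaluation of n repeated observations (after GUM 4.2).

    Returns the arithmetic mean, the experimental standard deviation s,
    the standard uncertainty of the mean u = s / sqrt(n) and the degrees
    of freedom nu = n - 1.
    """
    n = len(values)
    # Plausibility checks: reject input for which the evaluation is undefined,
    # so that the program reacts predictably instead of ending abnormally.
    if n < 2:
        raise ValueError("type A evaluation needs at least two observations")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("observations must be finite numbers")
    mean = fmean(values)
    s = stdev(values, xbar=mean)
    return {"mean": mean, "s": s, "u": s / math.sqrt(n), "nu": n - 1}
```

Each returned entry corresponds to one refined requirement, so each can be compared separately against an independently computed expected value.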


Refinement step 4 (Final refinement):

The correct calculation of the arithmetic mean requires

a) compliance of calculated and expected results,
b) protection against abnormal ends of the program,
c) independence of calculated results from interface variables.

At step 4, the level of detail is sufficient to allow concrete validation methods to be assigned to each of the derived requirements.

1.2 Assignment of validation methods

Even after the requirements have been defined and refined, it is often difficult for a metrologist to select appropriate validation methods from the large variety available. The difficulty is twofold: there is a lack of knowledge about the capabilities of (some of) the methods, and it is difficult to differentiate between methods with respect to their evidential power for a particular requirement. In a few cases, support is available in guidance documents, for example the WELMEC Software Guide 7.2 [WEL05] for measuring instruments in legal metrology that are subject to the European Measuring Instruments Directive. If no guidance document is available, metrologists are recommended to discuss the selection of methods with software engineers. In our example, the following validation methods are assigned to the requirements refined above:

Refined requirement | Validation method
Compliance of calculated and expected results | Dynamic program test using test data
Protection against abnormal ends | Checking for appropriate clauses in the source code by static program analysis
Independence of calculated results from interface variables | Inspection of the relevant pieces of source code

The overall refinement process and the assignment of appropriate validation methods are illustrated in Figure 1.
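The refinement steps and the method assignment form a tree whose leaves are the testable requirements, each carrying one validation method. As a sketch only (Python; the class `Requirement` and the helpers `leaves` and `unassigned` are hypothetical), such a tree makes it easy to check that no final refinement is left without a method:

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass
class Requirement:
    text: str
    children: list["Requirement"] = field(default_factory=list)
    method: Optional[str] = None  # validation method, assigned to leaves only

def leaves(req: Requirement) -> Iterator[Requirement]:
    """The final (testable) refinements are the leaves of the tree."""
    if not req.children:
        yield req
        return
    for child in req.children:
        yield from leaves(child)

def unassigned(req: Requirement) -> list[str]:
    """Leaf requirements that still lack a validation method."""
    return [leaf.text for leaf in leaves(req) if leaf.method is None]

# Excerpt of the example refinement (steps 3 and 4 for type A).
mean = Requirement("correct calculation of arithmetic mean", [
    Requirement("compliance of calculated and expected results",
                method="dynamic program test using test data"),
    Requirement("protection against abnormal ends of the program",
                method="static program analysis"),
    Requirement("independence of calculated results from interface variables",
                method="inspection of the relevant source code"),
])
type_a = Requirement("correct evaluation of the standard uncertainty (type A)", [
    mean,
    Requirement("correct calculation of standard deviation"),
    Requirement("correct calculation of degrees of freedom"),
])
```

Here `unassigned(type_a)` lists the two leaves of step 3 whose refinement is not shown in the text, which is exactly the kind of gap the documentation of requirements and refinements should expose.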

1.3 Execution of the particular validation

The particular validation of the GUM-supporting software had the following aims:

A-1 to show that the program calculates correct output values and works according to the GUM rules and equations,

A-2 to show that the program reproduces the output values of the examples in the GUM appendices, and

A-3 to get an impression of the reliability of the program. The program should behave predictably even in the case of unusual, implausible, or incalculable input values.

As the GUM software has two interfaces, a graphical user interface and a programming interface, its functionality and reliability had to be demonstrated for both. As shown before, the aims mentioned above were refined into a set of testable requirements, and a set of appropriate validation methods was selected and assigned. The validation methods used were dynamic (black-box) program tests, static program analysis, code inspections, and others such as document inspection.

Fig. 1. Example of requirements refinement and assignment of validation methods

Correct computation, aims A-1 and A-2

Program testing started with black-box tests using the interactive graphical user interface. Several hundred test cases were designed, each consisting of input values, expected output values, and the expected program behaviour (e.g. warnings to the user). To obtain a systematic and complete set of test cases, the main functionality of the software was subdivided into the following (sub-)functions:

F-1 processing input data according to type A (alternatively type B),
F-2 calculation of the correlation among input data,
F-3 processing of the measurement model and calculation of the sensitivity coefficients,
F-4 computation of the estimated value of the output quantities,
F-5 computation of the standard measurement uncertainty of the output quantities,
F-6 determination of the degrees of freedom,
F-7 calculation of the expanded uncertainty.
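Functions F-3 to F-7 form a chain in which each output feeds the next. The following is a minimal Python sketch under stated assumptions, not the validated product's implementation: all names are hypothetical, the inputs are taken as uncorrelated (so F-2 is omitted), the sensitivity coefficients are approximated by central differences (a swapped-in technique; the product may differentiate the model formula symbolically), and a fixed coverage factor k = 2 replaces a t-distribution quantile. It follows the GUM law of propagation of uncertainty for uncorrelated inputs and the Welch-Satterthwaite formula for the effective degrees of freedom:

```python
import math
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]

def sensitivity_coefficients(f: Model, x: Sequence[float],
                             rel_step: float = 1e-6) -> list[float]:
    """F-3: c_i = df/dx_i at the estimates x, by central differences."""
    coeffs = []
    for i, xi in enumerate(x):
        h = rel_step * max(abs(xi), 1.0)
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        coeffs.append((f(up) - f(down)) / (2.0 * h))
    return coeffs

def evaluate_model(f: Model, x: Sequence[float], u: Sequence[float],
                   nu: Sequence[float], k: float = 2.0) -> dict[str, float]:
    """F-4 to F-7 for uncorrelated input quantities."""
    c = sensitivity_coefficients(f, x)
    y = f(x)                                                     # F-4: estimate
    uc = math.sqrt(sum((ci * ui) ** 2 for ci, ui in zip(c, u)))  # F-5: u_c(y)
    # F-6: Welch-Satterthwaite; nu_i = math.inf marks an exactly known u_i.
    denom = sum((ci * ui) ** 4 / nui for ci, ui, nui in zip(c, u, nu))
    nu_eff = math.inf if denom == 0.0 else uc ** 4 / denom
    return {"y": y, "uc": uc, "nu_eff": nu_eff, "U": k * uc}     # F-7: U = k*u_c

# Example model Y = X1 * X2 with x = (2, 3), u = (0.1, 0.2), nu = (9, 4).
result = evaluate_model(lambda v: v[0] * v[1], [2.0, 3.0], [0.1, 0.2], [9, 4])
```

In this example the sensitivity coefficients are 3 and 2, so u_c = √(0.3² + 0.4²) = 0.5. Keeping each step as its own function mirrors the subdivision used for testing, since every intermediate result can be checked on its own.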

Each of these functions has its own input and output values, and the output values of one function usually serve as input values for other functions. For testing purposes, the functions were assumed to be roughly independent of each other. Whether this assumption was valid was clarified later.

During the tests of one function, the conditions for the other functions were kept as constant as possible. Function F-1, for instance, accepts a series of measured values as input and computes their mean value and variance. Its behaviour is influenced by, e.g., the number of measured values, their absolute numeric values, and their distribution. These influencing factors were varied during the dynamic tests of function F-1, whereas the influencing factors of other functions, such as the model equation, were kept at default values.
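Varying only the influencing factors of F-1 amounts to a full factorial design over those factors. A minimal Python sketch, in which the factor levels, the seed and the names `make_series` and `f1_test_cases` are hypothetical (the levels actually used are not stated), generates reproducible input series together with an exact reference mean computed in rational arithmetic:

```python
import itertools
import random
from fractions import Fraction

# Hypothetical levels of the influencing factors of F-1 (type A).
SAMPLE_SIZES = (2, 10, 1000)
OFFSETS = (0.0, 1.0e6)            # small vs. large absolute numeric values
DISTRIBUTIONS = ("normal", "uniform")

def make_series(n: int, offset: float, dist: str, rng: random.Random) -> list[float]:
    if dist == "normal":
        return [offset + rng.gauss(0.0, 1.0) for _ in range(n)]
    return [offset + rng.uniform(-1.0, 1.0) for _ in range(n)]

def f1_test_cases(seed: int = 1):
    """Vary only the F-1 factors; all other functions stay at their defaults."""
    rng = random.Random(seed)     # fixed seed: reproducible test data
    for n, offset, dist in itertools.product(SAMPLE_SIZES, OFFSETS, DISTRIBUTIONS):
        data = make_series(n, offset, dist, rng)
        # Exact reference mean: Fraction(v) represents each float exactly.
        expected_mean = sum(Fraction(v) for v in data) / n
        yield {"n": n, "offset": offset, "distribution": dist,
               "data": data, "expected_mean": expected_mean}
```

The large offset deliberately stresses the numerics of the mean calculation, while the fixed seed keeps every test run repeatable.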

The following influences were considered:

Function | Influences
F-1 (type A) | number of measurement values, absolute numeric values, distribution
F-1 (type B) | distribution, mean value, variance
F-2 | number of series, correlation between them
other functions | ...

The expected output values were calculated in three different ways: manually, with the help of a pocket calculator, or with the help of a computer algebra program that operates with user-defined or unlimited precision. For each actual output value, the following items were checked: consistency with the expected value, the correct number of significant digits, and the correct application of rounding rules.
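The check on significant digits and rounding can be made mechanical. GUM 7.2.6 states that it usually suffices to quote uncertainties with at most two significant digits; the sketch below assumes exactly two digits, rounding half up, and an estimate rounded to the same decimal place as its uncertainty. Python and the names `round_result` and `check_output` are our hypothetical choices:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_result(value: str, uncertainty: str, sig_digits: int = 2) -> tuple[str, str]:
    """Round u to `sig_digits` significant digits, y to the same decimal place."""
    u = Decimal(uncertainty)
    exponent = u.adjusted() - (sig_digits - 1)    # position of last kept digit
    quantum = Decimal(1).scaleb(exponent)
    u_rounded = u.quantize(quantum, rounding=ROUND_HALF_UP)
    y_rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    return str(y_rounded), str(u_rounded)

def check_output(displayed: tuple[str, str], exact_value: str, exact_u: str) -> bool:
    """Compare the program's displayed (value, uncertainty) pair with the
    expected pair derived from high-precision reference values."""
    return displayed == round_result(exact_value, exact_u)
```

For example, the reference pair ("10.057623", "0.0123456") becomes ("10.058", "0.012"), and a display of "10.06" would be flagged as wrongly rounded. Decimal arithmetic is used so that the expected strings are not themselves distorted by binary representation.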

The tests were performed using a script-based dynamic test tool.

Reliability, aim A-3

After the general algorithmic and numerical correctness of the program had been demonstrated, several other properties were checked: the reaction to implausible or syntactically incorrect user input, the reaction to models and data for which the GUM is not applicable, and the program's internal security measures. Internal security measures such as plausibility checks of parameters and return values, checks of pointers, and checks of denominators are important for a stable and predictable program reaction. These checks were performed by static program analysis using an automated checker tool.

1.4 Specific problems

Two questions remained open after all dynamic and static program checks:

Q-1 whether the program reacts the same way when called through the programming interface as when called through the graphical user interface, and

Q-2 the justification of the assumption that all functions are independent of each other.

Through the programming interface, the software can serve as a subprogram or service in other systems. The dynamic program testing was performed exclusively through the graphical user interface, so the question arises whether the results are also valid for the other case. Instead of repeating the tests, parts of the source code were inspected, namely those pieces that connect the program kernel to the two interfaces. This code inspection made it possible to transfer the validation result from one case to the other. This is illustrated in Figure 2.

Fig. 2. Example of software with two interfaces

The other question concerns the independence of the computational (sub-)functions: Is it feasible to neglect test cases in which the influencing conditions are varied for two or more functions at the same time? Is the software well structured according to the computational functions, i.e., does it contain separate modules for each function? Figure 3 illustrates this issue. Both questions were answered by a manual inspection of the relevant pieces of source code.
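For question Q-1, the inspection has to establish a structural property: both interfaces are thin adapters around one shared kernel, differing only in parsing and formatting. A minimal Python sketch of that structure, with hypothetical names and a mean calculation standing in for the real kernel:

```python
from statistics import fmean

def kernel_mean(values: list[float]) -> float:
    """Program kernel: the only place where the computation happens."""
    return fmean(values)

def api_mean(values: list[float]) -> float:
    """Programming interface: passes numbers straight to the kernel."""
    return kernel_mean(list(values))

def gui_mean(text: str) -> str:
    """GUI path: parses user text, calls the same kernel, formats the result."""
    values = [float(token) for token in text.replace(";", " ").split()]
    return f"{kernel_mean(values):.6g}"
```

If inspection confirms this shape, results obtained through `gui_mean` carry over to `api_mean`, because only the parsing and formatting code differs, and that code is small enough to inspect completely.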

In general, manual code inspection is a very time-consuming and highly demanding method of software testing. However, as the examples above show, in certain cases it can be an effective, efficient, and complementary way to validate metrological software.


Fig. 3. Example of a test-motivated separation of software

2 Validation for length measurement calibration

Parallel gauge blocks are important material artefacts in length measurement. They represent a certain length by the distance between two parallel boundary planes. The difference measurement (a comparison between a gauge block under test and a reference standard, see Figure 4) is performed repeatedly at the center and near the four corners. The requirements for the measurement conditions are very high, in particular concerning the temperature, the standard deviation of the length differences, and the care with which the blocks must be handled. At PTB, a specific guide to the validation of software for length measurement calibration was developed.

2.1 Risks caused by software

Concerning the expected software risks, a distinction between the following five groups of problems was proposed:

• functionality, adequacy and completeness of software,
• correctness of the code,
• data handling,
• proper use of software,
• management problems.

Functionality, adequacy and completeness of software

This group comprises all problems concerning the adequate and correct coverage of the expected functions by the software from the point of view of content. In particular, questions arising from the application of mathematical algorithms, including the mathematical background and numerical aspects, play an essential role. In gauge block calibration, however, no actual mathematical problems are encountered. The functions supported by software are the control of gauge block comparisons and some aspects of measurement data processing.

Fig. 4. Principle of the difference measurement

The controlling part has to ensure that the calibration process is performed in accordance with the prescribed steps. These steps can be altered depending on the calibration task and on intermediate measurement results. Possible alterations concern

• the surface positions used for the difference measurement,
• single or repeated measurements, and the conditions for repetitions,
• whether or not a flatness correction is applied,
• whether or not a temperature correction is applied.

Another sensitive aspect of the controlling software is its reliability under extreme conditions. This aspect is, of course, a general one. Here, two points are relevant:

• correct execution of measurements, also under extreme but permissible conditions,

• rejection of execution, or at least a warning, in the case of inadmissible conditions or input data.
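These two points amount to a three-way decision before a measurement is executed. The sketch below is illustrative only: Python, the names, and in particular all numeric limits are hypothetical, since real limits depend on the laboratory and the gauge block grade. Conditions inside the permissible range, including its edges, lead to execution; moderately outside leads to a warning; far outside or implausible data leads to rejection:

```python
import math
from enum import Enum

class Verdict(Enum):
    EXECUTE = "execute"
    WARN = "warn"
    REJECT = "reject"

TEMP_PERMISSIBLE_C = (19.9, 20.1)   # hypothetical permissible range
TEMP_WARN_BAND_C = 0.4              # hypothetical band beyond that range
STDDEV_LIMIT_NM = 5.0               # hypothetical limit for length differences

def check_conditions(temperature_c: float, stddev_nm: float) -> Verdict:
    """Decide whether a measurement may run under the given conditions."""
    if not (math.isfinite(temperature_c) and math.isfinite(stddev_nm)) or stddev_nm < 0:
        return Verdict.REJECT                 # implausible input data
    lo, hi = TEMP_PERMISSIBLE_C
    if lo <= temperature_c <= hi and stddev_nm <= STDDEV_LIMIT_NM:
        return Verdict.EXECUTE                # includes extreme but permissible edges
    if (lo - TEMP_WARN_BAND_C <= temperature_c <= hi + TEMP_WARN_BAND_C
            and stddev_nm <= 2 * STDDEV_LIMIT_NM):
        return Verdict.WARN
    return Verdict.REJECT
```

Testing such a function at the exact range boundaries is what "correct execution under extreme but permissible conditions" means in practice.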

With respect to the completeness of the software functions, risks exist when the following functions are not sufficiently supported:

• input, use and archiving of data sets relating to reference standards,
• input of parameters for the gauge block under test,
• control of the execution of measurements,
• printing of test protocols, results and calibration certificates,
• archiving of measurement results including a list of contents.

In the measurement data processing part of the software, sources of risk are related to the following functions:

• determination of length differences between the reference standard and the gauge block under test,
• determination of length differences between measurements at the center and at the corners,
• averaging of the length differences determined,
• flatness correction (addition of a corrective value),
• elimination of temperature influences (according to a given formula).
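The processing functions listed above can be chained in a few lines. This is a sketch under explicit assumptions, not the PTB procedure. Python and all names are hypothetical. Each reading is taken as (test block minus reference) in nanometres. The text does not give the temperature formula, so a common linear-expansion correction to 20 °C is assumed, δ_T = L_nom (α_ref − α_test)(t − 20 °C). Everything is expressed as deviations from the nominal length, which keeps the numbers small:

```python
from statistics import fmean

def evaluate_gauge_block(ref_deviation_nm: float,
                         center_diffs_nm: list[float],
                         corner_diffs_nm: list[list[float]],
                         nominal_mm: float,
                         alpha_ref: float, alpha_test: float,
                         temperature_c: float,
                         flatness_corr_nm: float = 0.0) -> dict:
    """Central deviation of the test block from its nominal length (nm)."""
    d_center = fmean(center_diffs_nm)                    # averaging of readings
    # Differences between corner and center measurements.
    corner_minus_center = [fmean(c) - d_center for c in corner_diffs_nm]
    # Assumed linear-expansion correction to the reference temperature 20 degC.
    temp_corr_nm = nominal_mm * 1e6 * (alpha_ref - alpha_test) * (temperature_c - 20.0)
    deviation_nm = ref_deviation_nm + d_center + flatness_corr_nm + temp_corr_nm
    return {"deviation_nm": deviation_nm,
            "corner_minus_center_nm": corner_minus_center}
```

For a 100 mm block with α_ref − α_test = 1·10⁻⁶ K⁻¹ at 20.1 °C, the assumed correction is about 10 nm, which is comparable to the differences being measured.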

The only critical algorithmic point is the loss of significant digits ("wipe-out of digits") when differences are calculated inappropriately. Another crucial requirement for metrology software in general is the repeatability of measurements: under the same conditions, the software must produce the same result. With gauge block calibrations, we have not observed any special problems concerning this point.
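The wipe-out of digits occurs when two nearly equal lengths are stored with limited precision and then subtracted, so the leading digits cancel and the rounding errors dominate. The demonstration below assumes an 8-significant-digit decimal context as a stand-in for a limited-precision representation. Python's `decimal` module is used, the values are illustrative, and the function names are hypothetical:

```python
from decimal import Context, Decimal

# Assumption: 8 significant digits stand in for a limited-precision format.
LIMITED = Context(prec=8)

def difference_from_lengths(l_test_mm: str, l_ref_mm: str) -> Decimal:
    """Store the absolute lengths first, then subtract: leading digits cancel."""
    a = LIMITED.create_decimal(l_test_mm)   # rounded to 8 significant digits
    b = LIMITED.create_decimal(l_ref_mm)
    return LIMITED.subtract(a, b)

def difference_from_deviations(dev_test_mm: str, dev_ref_mm: str) -> Decimal:
    """Subtract the small deviations from the common nominal length instead."""
    a = LIMITED.create_decimal(dev_test_mm)
    b = LIMITED.create_decimal(dev_ref_mm)
    return LIMITED.subtract(a, b)

# Test block 100.0000456 mm, reference 100.0000123 mm: true difference 33.3 nm.
print(difference_from_lengths("100.0000456", "100.0000123"))   # 0.00004 (40 nm)
print(difference_from_deviations("0.0000456", "0.0000123"))    # 0.0000333 (33.3 nm)
```

With the same precision, working with deviations from the nominal value keeps all significant digits of the difference, which is why that formulation is preferable.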

Correctness of the code

The source code of software is written by humans. For this reason, absolute correctness of the code cannot in general be guaranteed; shortcomings or even systematic errors cannot be excluded. Many advanced tools are available to support software development and testing, in particular to detect bugs. Provided they are used properly, these tools help greatly to reduce coding errors to a very low rate. However, a residual error risk remains. No particular risks are known for the gauge block calibration software as regards code correctness; what is generally true also applies in this case.

Problems of data handling

Data handling has also proved to be a critical factor with respect to the accuracy of measurement results. Small changes to data caused by transformations can have a negative impact on accuracy, for instance in connection with the wipe-out of digits. In addition to perturbations caused by numerical calculations, data can be perturbed by transfer and transformation on their path between different media, systems and people.

A gauge block calibration system has three interfaces and one general component that must be carefully considered. The first interface is that between the software system and the user. It handles the input of data by the user, for example the deviation of the reference gauge block from its nominal value. Sources of errors are

• mistakes or oversights of the user,
• transformation from the decimal to the binary representation, truncation or rounding errors.
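The decimal-to-binary transformation is easy to overlook because the displayed value usually looks right. The short Python demonstration below uses a hypothetical helper name, `representation_error`. It computes the exact difference between the stored binary double and the decimal value typed by the user:

```python
from decimal import Decimal

def representation_error(typed: str) -> Decimal:
    """Exact stored binary double minus the decimal value that was typed."""
    stored = Decimal(float(typed))   # Decimal(float) is exact
    return stored - Decimal(typed)   # evaluated in the default 28-digit context

print(representation_error("0.5"))       # 0.0: exactly representable in binary
print(float("0.1"))                      # 0.1: the shortest repr hides the error
print(representation_error("0.1") != 0)  # True: 0.1 is not stored exactly
print(sum([0.1] * 10) == 1.0)            # False: such errors can accumulate
```

Each single error is tiny, but test procedures should not assume that a value typed in decimal is stored and summed exactly.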

The second interface is the transfer of data from the measurement hardware to the software. Sources of errors are

• the hardware interface itself,
• transformation of data between different forms of representation.

The third interface that may cause errors is the external presentation of results on screens or in printed form. Here again, the problem is

• transformation between different forms of representation.

A component of general importance for data correctness is the data storage system. Evidently, any fault in this part of the system can lead to failures.

Proper use of software

Improper use or misuse of software is another crucial cause of system failure. Frequently, users do not even become aware of it when it happens. There are two typical reasons for the improper use of software:

• incorrect or insufficient documentation of the software delivered, or incorrect interpretation or incomplete study of the documentation by the user,

• unstable behaviour of the software, as already mentioned above, which leads users to attempt changes at their own risk.

These points are generally true and also apply to gauge block calibration.

Management problems

Software is a "living" product. It is changed from time to time for several reasons, for example to improve performance or to correct defects. Changes of software involve the risk of organizational mistakes. Because software is handled by different parties, problems such as incompatibility between pieces of software, invalid versions, or incorrect documentation can arise. Moreover, as software can easily be changed, there is no natural barrier to prevent such problems.

As a consequence, strict management rules should be established to prevent organizational mistakes. Important features are a complete identification of all parts of the software that are subject to individual handling, the application of unique version management, and assured correspondence between versions and their documentation.

If necessary, supervising appropriate protection of software and data against manipulation is another task for the management.
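A common technical means for identifying software parts, linking versions to their documentation, and detecting manipulation is a cryptographic checksum. The sketch assumes SHA-256 over all separately handled parts. Python's standard `hashlib` module is used, and the function names and component names are hypothetical. The recorded digest would be stored with the version documentation and recomputed during audits:

```python
import hashlib
from pathlib import Path

def fingerprint(components: dict[str, bytes]) -> str:
    """SHA-256 over all separately handled software parts, in sorted order."""
    h = hashlib.sha256()
    for name in sorted(components):
        name_bytes = name.encode("utf-8")
        data = components[name]
        # Length prefixes keep the boundaries between parts unambiguous.
        h.update(len(name_bytes).to_bytes(8, "big"))
        h.update(name_bytes)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()

def fingerprint_files(paths: list[Path]) -> str:
    """Fingerprint files on disk, keyed by file name."""
    return fingerprint({p.name: p.read_bytes() for p in paths})
```

Sorting by name makes the digest independent of the order in which parts are listed, so the same installed version always yields the same identifier.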


To minimise risks caused by software, technical recommendations have been derived both from the state of the art in software engineering and from the technology of gauge block calibration [GSR99].

2.2 A practical approach to software validation

After the risks and the recommended technical measures have been identified, the next important step is to provide a practical approach for integrating the recommendations into the daily work of calibration services. We proposed to use a checklist consisting of a set of essential software requirements and a corresponding testing procedure. Checklists are an established means of quality assurance and basically require no additional technical equipment. However, they must cover the sensitive aspects of the software.

For gauge block calibration, working with a checklist that represents a set of minimum mandatory requirements for the software has turned out to be an appropriate means of enhancing software reliability. Consequently, the checklist should be applied in calibration laboratories both for accreditation (when a calibration service is certified by external experts) and for internal audits (when a self-check is made).

Prerequisites for the work with a checklist are:

• that an appropriate list of minimum mandatory requirements is drawn up, and

• that a set of testing instructions is formulated describing how to prove compliance with the requirements.

Mandatory software requirements

The set of minimum mandatory requirements was elaborated on the basis of the risk analysis outlined in Section 2.1 and of the technical recommendations developed [GSR99]. So far, it comprises 29 requirements arranged in the following 7 groups:

G1: Documentation of the software
G2: Identification of the software
G3: Minimal extent of functions
G4: Reliability and usability of the software
G5: Correctness of data used
G6: Protection of data
G7: Traceability of valid measurements

For illustration purposes, we list the 5 requirements of the group G6:

• Standard data sets stored by the program must be protected against modification. Measures such as read-only status in DOS, simple encoding, or storage in database records are sufficient.

• Standard data sets may be used by the program only when they have been entered a second time, or when they have been confirmed after being presented as a whole to the user.

• Input and modification of standard data sets must not be possible for every user; this function (or the program as a whole) must be protected by a password.

• Once the calibration of a set of gauge blocks is finished and confirmed by the user, the corresponding measurement values must be stored in such a way that they are protected against modification, or that modifications are at least recorded. After the calibration has been finished, the measurements must not be repeated.

• If measurement values are taken over directly from the measuring device, they must not be modifiable. Complete manual input of the measurement values of a whole calibration is allowed.
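One way to meet the requirement that modifications of confirmed results are "at least recorded" is a hash-chained modification log. This technique is our choice and is not stated in the text. In the Python sketch below, the class name `AuditedResults` and its methods are hypothetical. Every change is appended with a digest that covers the previous entry, so later edits of the log or direct edits of the values are detected on verification. In a real system, the initial state, or its digest, would also have to be stored in a protected place:

```python
import hashlib
import json

def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

class AuditedResults:
    """Confirmed calibration results; changes only via modify(), each logged."""

    FIELDS = ("key", "old", "new", "user", "reason")

    def __init__(self, values: dict[str, float]):
        self._initial = dict(values)
        self._values = dict(values)
        self._log: list[dict] = []

    def _genesis(self) -> str:
        return _sha256(json.dumps(self._initial, sort_keys=True))

    def modify(self, key: str, new_value: float, user: str, reason: str) -> None:
        prev = self._log[-1]["hash"] if self._log else self._genesis()
        entry = {"key": key, "old": self._values[key], "new": new_value,
                 "user": user, "reason": reason}
        # The digest covers the entry and, through prev, the whole history.
        entry["hash"] = _sha256(prev + json.dumps(entry, sort_keys=True))
        self._log.append(entry)
        self._values[key] = new_value

    def verify(self) -> bool:
        """Recompute the chain and replay it from the initial values."""
        prev = self._genesis()
        replay = dict(self._initial)
        for entry in self._log:
            payload = {f: entry[f] for f in self.FIELDS}
            if entry["hash"] != _sha256(prev + json.dumps(payload, sort_keys=True)):
                return False                  # log entry was altered
            if replay[entry["key"]] != entry["old"]:
                return False                  # inconsistent history
            replay[entry["key"]] = entry["new"]
            prev = entry["hash"]
        return replay == self._values         # values match the logged history
```

This sketch covers the "record modifications" branch of the requirement. The stricter branch, which forbids modification entirely, would simply omit `modify`.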

Testing procedure

To determine whether or not a calibration program complies with the requirements, concrete testing instructions summarized in a testing guideline are necessary. For practical reasons, these instructions must be understandable and applicable by someone who is not a software specialist. So far, the testing guideline consists of 93 detailed steps that, taken together, cover the derived set of requirements. However, there is no one-to-one correspondence between the instructions and the requirements, and the calibration program must be run several times to cover all instructions. For more details, we refer to [GSR99].
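Because instructions and requirements are related many-to-many, two checks matter for the guideline. First, every requirement must be covered by at least one instruction. Second, a requirement should count as met only if all instructions that test it have passed. In the Python sketch below, the requirement IDs and the mapping are a hypothetical excerpt for group G6, and the function names are also hypothetical. The real mapping is documented in [GSR99]:

```python
# Hypothetical excerpt: requirement IDs of group G6 and the instructions testing them.
REQUIREMENTS = {"G6.1", "G6.2", "G6.3", "G6.4", "G6.5"}
INSTRUCTIONS = {
    1: {"G6.1", "G6.3"},
    2: {"G6.2"},
    3: {"G6.4"},
    4: {"G6.4", "G6.5"},
}

def uncovered(requirements: set[str], instructions: dict[int, set[str]]) -> set[str]:
    """Requirements that no testing instruction addresses."""
    covered = set().union(*instructions.values())
    return requirements - covered

def compliant_requirements(requirements: set[str],
                           instructions: dict[int, set[str]],
                           passed: set[int]) -> set[str]:
    """A requirement is met only if it is tested and all its instructions passed."""
    return {r for r in requirements
            if any(r in reqs for reqs in instructions.values())
            and all(step in passed
                    for step, reqs in instructions.items() if r in reqs)}
```

For example, if instructions 1 to 3 pass and instruction 4 fails, G6.4 and G6.5 are not met, although G6.4 also passed instruction 3.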

References

[GSR99] Greif, N., Schrepf, H., Richter, D.: Software Evaluation in Calibration Services: Requirements and Testing Procedure. In: D. Richter, V. Granovski (eds.), Methodological Aspects of Data Processing and Information Systems in Metrology, PTB Report PTB-IT-7, pp. 60-72, ISBN 3-89701-379-7, Braunschweig and Berlin, June 1999.

[GSR06] Greif, N., Schrepf, H., Richter, D.: Software Validation in Metrology: A Case Study for a GUM-Supporting Software. Measurement 39 (2006) 849-855.

[GUM95] Guide to the Expression of Uncertainty in Measurement (GUM), ISO/BIPM Guide, ISBN 92-67-10188-9, 1995.

[WEL05] WELMEC Guide 7.2: Software Guide (Measuring Instruments Directive 2004/22/EC), http://www.welmec.org, 2005.
