RESEARCH REPORT 454

HSE Health & Safety

Executive

Probability of Detection (PoD) curves

Derivation, applications and limitations

Prepared by Jacobi Consulting Limited for the Health and Safety Executive 2006

George A Georgiou Jacobi Consulting Limited

57 Ockendon Road London N1 3NL

There is a large amount of ‘Probability of Detection’ (PoD) data available (e.g. National NDT Centre (UK), NORDTEST (Norway), NIL (Netherlands) and in particular NTIAC (USA)). However, it is believed that PoD curves produced from PoD data are not very well understood by many who use and apply them. For example, in producing PoD curves, a certain material and thickness may have been used and yet one can find the same PoD quoted for a range of thicknesses. In other cases, PoD curves may have been developed for pipes, but they have been applied to plates or other geometries. Similarly, PoD curves for one type of weld (e.g. single sided) have been used for other welds (e.g. double sided). PoD data are also highly dependent on the Non-Destructive Testing (NDT) methods used to produce them and these data can be significantly different, even when applied to the same flaws and flaw specimens. It is often assumed that the smallest flaw detected is a good measure of PoD, but there is usually a large gap between the smallest flaw detected and the largest flaw missed. Similarly, it is often assumed that human reliability is a very important factor in NDT procedures, and yet it is usually found not to be as important as other operational and physical parameters.

It is important to question the validity of how PoD curves are applied as well as their limitations. This report aims to answer such questions and in particular their relevance to fitness for service issues involving PoD.

The overall goal of this project is to provide clear, concise, understandable and practical information on PoD curves, which will be particularly useful for Health and Safety Inspectors when discussing safety cases involving PoD curves.

This report and the work it describes were funded by the Health and Safety Executive (HSE). Its contents, including any opinions and/or conclusions expressed, are those of the author alone and do not necessarily reflect HSE policy.

HSE BOOKS

© Crown copyright 2006

First published 2006

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise) without the prior written permission of the copyright owner.

Applications for reproduction should be made in writing to: Licensing Division, Her Majesty's Stationery Office, St Clements House, 2-16 Colegate, Norwich NR3 1BQ or by e-mail to [email protected]


TABLE OF CONTENTS

TABLE CAPTIONS AND FIGURE CAPTIONS
EXECUTIVE SUMMARY
Background
Objectives
Work Carried Out
Conclusions
Recommendations

1. INTRODUCTION

2. OBJECTIVES

3. DERIVATION OF POD CURVES
3.1. A HISTORICAL BACKGROUND AND DEVELOPMENT OF NDT RELIABILITY METHODS
3.2. EXPERIMENTAL REQUIREMENTS TO PRODUCE POD CURVES
3.3. THE AVAILABLE PROBABILITY METHODS TO PRODUCE POD CURVES
3.3.1. PoD Curves for Hit/Miss Data
3.3.2. PoD Curves for Signal Response Data
3.3.3. Sample Sizes
3.4. CONFIDENCE LIMITS (OR CONFIDENCE INTERVALS)
3.5. PUBLISHED WORK ON THE MODELLING OF POD
3.5.1. An Overview
3.5.2. The PoD-generator (The Netherlands)
3.5.3. Iowa State University (USA)
3.5.4. National NDT Centre (UK)

4. THE PRACTICAL APPLICATION OF POD CURVES
4.1. HOW POD CURVES ARE USED IN INDUSTRY
4.2. PUBLISHED WORK ON POD CURVES IN DIFFERENT INDUSTRIES
4.2.1. Aerospace (NASA)
4.2.2. Aircraft Structures, Inclusions in Titanium Castings
4.2.3. NORDTEST Trials
4.2.4. Nuclear Components (The PISC Trials)
4.2.5. Offshore Tubular Joints
4.2.6. Dutch Welding Institute (NIL)
4.2.7. Railways (National NDT Centre (UK))
4.2.8. LPG Storage Vessels

5. THE LIMITATIONS OF APPLYING POD CURVES
5.1. COMMENTS ON HIT/MISS DATA AND SIGNAL RESPONSE DATA
5.2. IMPORTANT OPERATING AND PHYSICAL PARAMETERS
5.2.1. NDT Method
5.2.2. Fluorescent Penetrant NDT
5.2.3. Material Properties
5.2.4. Specimen Weld Geometry
5.2.5. Flaw Characteristics
5.2.6. Human Reliability

6. DISCUSSION
6.1. INTRODUCTION
6.2. AIMS AND OBJECTIVES
6.3. HISTORICAL DEVELOPMENT
6.4. FLAW SAMPLE SIZES FOR ‘HIT/MISS’ DATA AND ‘SIGNAL RESPONSE’ DATA
6.4.1. Model for Hit/Miss Data
6.4.2. Model for Signal Response Data
6.4.3. To Compute PoD parameters
6.4.4. To Achieve the Desired PoD/Confidence Limit Combination
6.5. POD MODELLING
6.6. PRACTICAL APPLICATIONS OF POD
6.6.1. Aircraft Structures, Inclusions in Titanium Castings
6.6.2. NORDTEST Trials
6.6.3. Nuclear Components (The PISC Trials)
6.6.4. Offshore Tubular Joints
6.6.5. Dutch Welding Institute (NIL)
6.6.6. Railways
6.6.7. LPG Storage Vessels
6.7. DEPENDENCE OF POD ON OPERATIONAL AND PHYSICAL PARAMETERS
6.7.1. Important Operational and Physical Parameters
6.7.2. NDT Method
6.7.3. Fluorescent Penetrant NDT
6.7.4. Material Properties
6.7.5. Specimen Weld Geometry
6.7.6. Flaw Characteristics
6.7.7. Human Reliability

7. INDEPENDENT VERIFICATION

8. CONCLUSIONS

9. RECOMMENDATIONS

10. ACKNOWLEDGEMENTS

11. REFERENCES

12. VERIFICATION STATEMENT

TABLES 1
FIGURES 1 - 13
APPENDIX A GLOSSARY OF TERMS, STATISTICAL TERMINOLOGY AND OTHER RELEVANT INFORMATION
APPENDIX B AN AUDIT TOOL FOR THE PRODUCTION AND APPLICATION OF POD CURVES
APPENDIX C THE VALIDITY OF THE JCL ‘INDEX OF DETECTION’ MODEL


TABLE CAPTIONS AND FIGURE CAPTIONS

TABLE CAPTIONS

Table 1 Maximum Probability Tables

FIGURE CAPTIONS

Figure 1 Example of detection percentages for a handheld Eddy-Current inspection and a ‘log-odds’ distribution fit to the data.

Figure 2 Ultrasonic NDT hit/miss data illustrating the relatively large gap between the smallest flaw detected and the largest flaw missed.

Figure 3 The linear relationship between the log-odds and log flaw size.

Figure 4 Schematic of the PoD for flaws of fixed dimension for ‘hit/miss’ data.

Figure 5 Schematic of the PoD for flaws of fixed dimension for ‘signal response’ data.

Figure 6 A comparison between the log-odds and cumulative log-normal distribution functions for the same parameters μ = 0 and σ = 1.0.

Figure 7 An example of when the log-odds model was not applicable to the data collected.

Figure 8 PoD (a) log-odds model results for different NDT methods applied to the same flaw specimen.

Figure 9 PoD (a) log-odds model results for fluorescent penetrant: no developer and developer applied to the same flaw specimen.

Figure 10 PoD (a) log-odds model results for manual eddy currents: different materials but nominally the same flaws.

Figure 11 PoD (a) log-odds model results for X-ray radiography: different weld conditions but nominally the same flaws.

Figure 12 PoD (a) log-odds model results for fluorescent penetrant: different flaws but nominally the same specimens.

Figure 13 PoD (a) log-odds model results for Ultrasound (Immersion): different operators but inspecting the same flaw specimen.


EXECUTIVE SUMMARY

Background

There is a large amount of ‘Probability of Detection’ (PoD) data available (e.g. National NDT Centre (UK), NORDTEST (Norway), NIL (Netherlands) and in particular NTIAC (USA)). However, it is believed that PoD curves produced from PoD data are not very well understood by many who use and apply them. For example, in producing PoD curves, a certain material and thickness may have been used and yet one can find the same PoD quoted for a range of thicknesses. In other cases, PoD curves may have been developed for pipes, but they have been applied to plates or other geometries. Similarly, PoD curves for one type of weld (e.g. single sided) have been used for other welds (e.g. double sided). PoD data are also highly dependent on the Non-Destructive Testing (NDT) methods used to produce them and these data can be significantly different, even when applied to the same flaws and flaw specimens. It is often assumed that the smallest flaw detected is a good measure of PoD, but there is usually a large gap between the smallest flaw detected and the largest flaw missed. Similarly, it is often assumed that human reliability is a very important factor in NDT procedures, and yet it is usually found not to be as important as other operational and physical parameters.

It is important to question the validity of how PoD curves are applied as well as their limitations. This report aims to answer such questions and in particular their relevance to fitness for service issues involving PoD.

The overall goal of this project is to provide clear, concise, understandable and practical information on PoD curves, which will be particularly useful for Health and Safety Inspectors when discussing safety cases involving PoD curves.

Objectives

♦ To provide a clear and understandable description of how PoD curves are derived.

♦ To provide practical applications of how PoD curves are used and their relevance to fitness for service issues.

♦ To quantify the limitations of PoD curves.

Work Carried Out

A historical overview is provided for PoD in section 3 and a description of how the techniques used to produce PoD curves have evolved during the last three decades, paying special attention to the fundamental PoD functions for ‘hit/miss’ data and ‘signal response’ data. In this respect, Appendix A provides additional help for non-statisticians on the basic elements and mathematics of PoD functions, and Appendix B provides an audit tool for those interested in producing or assessing PoD curves. Part of section 3 is also devoted to published work on the modelling of PoD over the same period. A range of different industrial applications of PoD curves are discussed in section 4, and the opportunity was taken to update the results of an earlier application of PoD curves to Liquid Petroleum Gas (LPG) spheres. The details of the new work, which can be regarded as having wider applications, are included in Appendices C and D (i.e. the ‘Probability of Inclusion’ and the ‘Guidelines’ on inspecting welds respectively). Section 5 has been devoted to the limitations of applying PoD curves, as well as the main operational and physical parameters they are dependent on.

In order to illustrate and explain many of the important issues discussed, and which are particularly relevant to PoD, a number of experimental and theoretical examples are provided throughout the report.

Conclusions

• The ‘log-odds’ distribution is found to be one of the best fits for hit/miss NDT data.

• The log-normal distribution is found to be one of the best fits for signal response NDT data, and in particular for flaw length and flaw depth data as determined by ultrasonic NDT.

• In some cases, the ‘log-odds’ and cumulative log-normal distributions are very similar, but there are many cases where they are significantly different.

• There are NDT data when neither the ‘log-odds’ nor the log-normal distributions are appropriate and other distributions need to be considered.

• There is often a large gap between the smallest flaw detected and the largest flaw missed.

• Very small or very large flaws do not contribute much to the PoD analysis of hit/miss data.

• To achieve a valid ‘log-odds’ model solution for hit/miss data, a good overlap between the smallest flaw detected and the largest flaw missed is necessary.

• To achieve a valid log-normal model solution for signal response data, there is less reliance on flaw size range overlap, but more on the linear relationship between ln(â) and ln(a).

• When the PoD (a) function decreases with increasing flaw size, it is usually an indication that the NDT procedures are poorly designed.

• When the lower confidence limit decreases with increasing flaw size, notwithstanding an acceptable PoD (a) function, it is usually associated with extreme or unreasonable values of the mean and standard deviation.

• The effect on PoD results for particular operational and physical parameters can be significant for datasets selected from the NTIAC data book of PoD curves.

• The PoD data in the NTIAC data book were collected some 30 years ago and may not necessarily reflect current capabilities with modern digital instrumentation. However, the results are still believed to be relevant to best practice NDT.

• The PoD data illustrated in each of the figures 7 – 13 are valid for the particular datasets in question. It would be wrong to draw too many general conclusions about the particular PoD values (e.g. ultrasound is better than X-ray).

• Figures 7 - 13 serve to illustrate the possible effects that the physical and operational parameters can have on the PoD and an awareness of these effects is important when quoting PoD results.

• NDT methods, equipment ‘calibration’, fluorescent penetrant developers, material, surface condition, flaws and human factors are all important operational and physical parameters, which can have a significant effect on PoD results.

• Whilst human factors are important variables in NDT procedures, they are often found not to be as important as other operational and physical variables.

• The ‘Log-odds’ distribution was found to be the most appropriate distribution to use with the JCL ‘Probability of Inclusion’ model.

• The earlier JCL ‘Probability of Inclusion’ model has been validated against an independently developed ‘Probability of Inclusion’ model by MBEL.

Recommendations

• Publish a signal response data book of PoD results.

• Publish a more up to date data book from different PoD studies and collate them in a way which best serves more general industrial and modelling applications.

• Set up a European style project or Joint Industry Project to realise the above recommendations.

1. INTRODUCTION

There is a large amount of ‘Probability of Detection’ (PoD) information available, starting with the pioneering work in the late 1960s to early 1970s for the aerospace industry, through to more recent and more general industrial applications (e.g. National NDT Centre (UK), NORDTEST (Norway), NIL (Netherlands), PISC (Europe) and NTIAC (USA)). However, it is believed that PoD curves are not very well understood by many who use and apply them.

PoD curves have been produced for a range of Non-Destructive Testing (NDT) methods (e.g. ultrasound, radiography, eddy currents, magnetic particle inspection, liquid penetrants, visual and others). Whilst it is reasonable to assume that each NDT method will produce different PoD curves (even when applied to the same flaws), it is believed that many who use PoD curves do not fully appreciate how significant the differences can be.

PoD curves are also dependent on a number of physical parameters (e.g. material, thickness of component, flaw type, geometry, etc.) and this too is not always appreciated. In some applications only one thickness may have been used and yet the same PoD is quoted for a range of thicknesses. In other cases, PoD curves may have been derived for pipes, but they have been used for plates or other geometries. Similarly, PoD curves for one type of weld (e.g. single sided or J-prep welds) have been used for other welds (e.g. double sided or double V-prep welds) without any justification.

It is important to have an understanding of how PoD curves are derived and to question the validity of how PoD curves are applied, as well as to appreciate their limitations. This report aims to provide this information as well as considering their relevance to health and safety issues.

Section 3 provides an overview of the historical background to PoD, the experimental requirements for PoD curves in practice (e.g. minimum sample size and confidence limits), the various approaches to produce PoD curves and the development of theoretical modelling of PoD curves. Section 4 provides information on how curves are used in industry and there are a number of examples provided in various industrial applications and publications. There are examples of both practical and impractical applications of PoD curves. Section 5 discusses the dependence of PoD curves on a range of operational and physical parameters and the limitations of applying PoD curves.

Section 6 is used to bring together all the salient features of the report and is much more than an executive summary. This is done (i) so that Health and Safety Inspectors can get an informed overview of the report without having to read each section in detail and (ii) to provide the basis of an externally published paper. The conclusions and recommendations are provided in Section 8 and Section 9 respectively.

Where the statistics are believed to be important to the discussion, some explanation is provided in the main body of the text, but in most cases more detailed explanations are provided in Appendix A. The mathematical content has been kept to a minimum and at a level which is suitable for scientists and engineers, who may not necessarily have a background in statistics. Appendix A also includes a glossary of terms and symbols, statistical definitions and different terminologies used in this report.

In order to assist organisations interested in producing PoD curves, perhaps for the first time, and in particular Health and Safety Inspectors dealing with PoD issues in industry, an ‘audit tool’ (or check list) has been provided in Appendix B.

There is a self contained report in Appendix C that deals specifically with validating an earlier probabilistic model developed for the Health and Safety Executive (HSE) (i.e. the ultrasonic NDT of LPG storage vessels). The model, which makes use of PoD curves and is hence relevant to this study, has been updated along with its companion report, the HSE guidelines on how to use the model (Appendix D). Both the updated model and updated companion guidelines are now considered as having wider applications than just the ultrasonic NDT of LPG storage vessels.

The whole report has been read by a qualified statistician to verify and check the calculations and to assess that the conclusions and recommendations are based on sound scientific reasoning. Additional verifications have been carried out by others; the full details are discussed in section 7 and a formal verification statement is made in Section 12.

The overall goal of this project is to provide clear, concise and understandable information on PoD curves, which will be particularly useful for Health and Safety Inspectors in discussing safety cases involving PoD curves.

2. OBJECTIVES

♦ To provide a clear and understandable description of how PoD curves are derived.

♦ To provide practical applications of how PoD curves are used and their relevance to fitness for service issues.

♦ To quantify the limitations of PoD curves.

3. DERIVATION OF POD CURVES

3.1. A HISTORICAL BACKGROUND AND DEVELOPMENT OF NDT RELIABILITY METHODS

Non-destructive Testing (NDT) reliability may be defined as 'the probability of detecting a crack in a given size group under the inspection conditions and procedures specified' (1). There are of course other similar definitions, but the underlying statistical parameter is the PoD, which has become the accepted formal measure of quantifying NDT reliability. The PoD is usually expressed as a function of flaw size (i.e. length or depth), although in reality it is a function of many other physical and operational parameters, such as the material, the geometry, the flaw type, the NDT method, the testing conditions and the NDT personnel (e.g. their certification, education and experience).

Repeat inspections of the same flaw size or the same flaw type will not necessarily result in consistent hit or miss indications. Hence there is a spread of detection results for each flaw size and flaw type and this is precisely why the detection capability is expressed in statistical terms such as the PoD. An early example of this is illustrated in the paper by Lewis et al (2), who had 60 air force inspectors use the same surface eddy-current technique to inspect 41 known cracks around countersunk fastener holes in a 1.5m length of a wing box. The results are illustrated in Figure 1 in terms of a detection percentage (i.e. the number of times a crack was detected relative to the number of detection attempts). The chance of detecting a crack increases with crack size, as one might expect, but none of the cracks were detected 100% of the time and different cracks with the same size have quite different detection percentages. Figure 1 also shows that the ‘log-odds’ distribution is a reasonable fit to these data and illustrates why PoD is considered an appropriate measure of detection capability.

PoD functions for describing the reliability of an NDT method or technique have been the subject of many studies and have undergone considerable development since the late 1960s and early 1970s, when most of the pioneering work was carried out in the aerospace industry (3,4). In order to ensure the structural integrity of critical components it was becoming more evident that instead of asking the question ‘…what is the smallest flaw that can be detected by an NDT method?’ it was more appropriate, from a fracture mechanics point of view, to ask ‘…what is the largest flaw that can be missed?’

To elaborate on this point here, ultrasonic inspection data has been re-plotted from the ‘Non-destructive Testing Information Analysis Centre’ (NTIAC) capabilities data book (5). Figure 2 illustrates the detection capabilities of an ultrasonic surface wave inspection of two flat aluminium plates (thicknesses 1.5mm and 5.6mm), containing a total of 311 simulated fatigue cracks with varying depths. The flaws are recorded as detected (or hit) with PoD=1, or missed with PoD=0. Figure 2 shows three distinct regions separated by the lines a_smallest (i.e. the smallest flaw detected) and a_largest (i.e. the largest flaw missed). The region between a_smallest and a_largest shows that there are flaws of the same size which are sometimes detected and sometimes not detected. It is also clear that a_largest is significantly larger than a_smallest.

In 1969, a program was initiated by the National Aeronautics and Space Administration (NASA) to determine the largest flaw that could be missed for the various NDT methods that were to be used in the design and production of the space shuttle. The methodology by NASA was soon adopted by the US Air Force as well as the US commercial aircraft industry. In the last two decades many more industries have adopted similar NDT reliability methods based on PoD. Some of these will be discussed in more detail in section 4 below.

In the mid-1970s, a constant PoD for all flaw types of a given size was proposed and Binomial distribution methods were used to estimate this probability, along with an associated error or ‘lower confidence limit’ as it is often called (1). Whilst good PoD estimates could be obtained for a single flaw size, very large sample sizes were required to obtain good estimates of the ‘lower confidence limit’ (see section 3.4 below for more details on the confidence limit). It is clear from Figure 1 that this early assumption about a constant PoD for flaws of a given size, whilst making the probability calculations easier, was too simplistic, as different detection percentages were being recorded for the same flaw size. In cases where there was an absence of large sample sizes, various grouping schemes were introduced to analyse the data, but in these cases estimates for the lower confidence limit were no longer valid.

In the early to mid-1980s, the approach was to assume a more general model for the PoD vs. flaw size ‘a’. Various analyses of data from reliability experiments on NDT methods indicated that the PoD (a) function could be modelled closely by either the cumulative 'log-normal' distribution or the 'log-logistic' (or ‘log-odds’) distribution (6). Both of these models will be discussed in more detail below. The statistical parameters (e.g. mean, median and standard deviation) associated with the PoD (a) functions can be estimated using standard statistical methods like 'maximum likelihood methods' (6) (see also Appendix A, section 2).

3.2. EXPERIMENTAL REQUIREMENTS TO PRODUCE POD CURVES

The ‘Recommended Practice’ (1), which was originally prepared for the aircraft industry, provides comprehensive information on the experimental sequence of events for generating data to produce PoD curves and to ‘certify’ (i.e. validate) an NDT method or procedure. The sequence of events can be broadly summarised as follows (see also (3)):

• Manufacture or procure flaw specimens with the required large number of relevant flaw sizes and flaw types

• Inspect the flaw specimens with the appropriate NDT method

• Record the results as a function of flaw size

• Plot the PoD curve as a function of flaw size

However, before the manufacture or procurement of flaw specimens, it is necessary to make the following crucial decisions:

• What flaw parameter size will be used (e.g. flaw length or flaw depth)?

• What overall flaw size range is to be investigated (e.g. 1mm to 9mm)?

• How many intervals are required within the flaw size range to be investigated (e.g. if 6 intervals are selected for a 1mm to 9mm flaw size range, this implies a flaw width interval of 1.5mm)?

The recommended practice (1) also provides critical information on the necessary flaw sample size for each flaw width interval in order to demonstrate that the desired PoD, along with an appropriate lower confidence limit, has been achieved. Usually it is not known beforehand how large a flaw has to be before the desired PoD is satisfied, and this can present problems in knowing the most appropriate flaw size range to select. Following the above experimental approach should lead to the largest flaw that can be detected with the desired PoD and confidence limit.

It is important to appreciate that in selecting the sample size there are two distinct issues that have to be addressed. First, there is the issue of the sample size being large enough to achieve the desired PoD and confidence limit combination. Second, the sample size has to be large enough to be able to compute the statistical parameters associated with the PoD curve that best fits the data. It is believed that this distinction is not always made clear in the open literature. It may be, of course, that the sample size required to achieve the desired PoD/confidence limit combination is always sufficiently large to compute the statistical parameters for the PoD curve accurately enough (this is considered in more detail in section 3.3.3).

3.3. THE AVAILABLE PROBABILITY METHODS TO PRODUCE POD CURVES

In NDT reliability methods, there are two related probabilistic methods for analysing reliability data and producing PoD curves as functions of the flaw size a. Originally, NDT results were only recorded in terms of whether the flaw was detected or not (cf. Figure 2). This type of data is called 'hit/miss' data and it is discrete data. This way of recording data is still appropriate for some NDT methods (e.g. penetrant testing or magnetic particle testing). However, in many NDT systems there is more information in the NDT response (e.g. peak voltage in eddy current NDT, the signal amplitude in ultrasonic NDT, the light intensity in fluorescent penetrant NDT). Since the NDT signal response can be interpreted as the perceived flaw size, the data is sometimes called â data (i.e. ‘a hat data’) or 'signal response' data and it is continuous data.

Each type of data (i.e. hit/miss or signal response) is usually analysed using a different probabilistic model to produce the PoD (a) function. The details of the complete theoretical analysis are quite involved and beyond the scope of this report, but some details will be provided here and more information can be found in well referenced publications (4, 6).

3.3.1. PoD Curves for Hit/Miss Data

For hit/miss data a number of different statistical distributions were originally considered for the best fit (7). It was found that the log-logistic distribution was the most acceptable and the PoD (a) function can be written as:

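$$\mathrm{PoD}(a) = \frac{\exp\left\{\frac{\pi}{\sqrt{3}}\left(\frac{\ln a - m}{\sigma}\right)\right\}}{1 + \exp\left\{\frac{\pi}{\sqrt{3}}\left(\frac{\ln a - m}{\sigma}\right)\right\}} \qquad (1)$$

where a is the flaw size and m and σ are the median and standard deviation respectively. Another convenient form of equation (1) can be written as: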

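$$\mathrm{PoD}(a) = \frac{e^{\alpha + \beta \ln(a)}}{1 + e^{\alpha + \beta \ln(a)}} \qquad (2)$$

and it is straightforward (see Appendix A, Section 3.2) to show that the parameters α and β are related to m and σ by:

$$m = -\frac{\alpha}{\beta} \qquad (3)$$

$$\sigma = \frac{\pi}{\beta\sqrt{3}} \qquad (4)$$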
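As a minimal sketch of equations (1) to (4), the short Python example below evaluates the log-odds PoD in both parameterisations and converts (α, β) into (m, σ). The function names and the flaw sizes are hypothetical and chosen only for illustration; the values α = -2.9 and β = 1.69 are those quoted in this report for Figure 3, and the flaw size units are left unspecified.

```python
import math


def pod_log_odds(a, alpha, beta):
    """Equation (2): log-odds PoD for a flaw of size a > 0."""
    z = alpha + beta * math.log(a)
    # e^z / (1 + e^z) written in the equivalent form 1 / (1 + e^-z)
    return 1.0 / (1.0 + math.exp(-z))


def pod_m_sigma(a, m, sigma):
    """Equation (1): the same curve written in terms of m and sigma."""
    z = (math.pi / math.sqrt(3.0)) * (math.log(a) - m) / sigma
    return 1.0 / (1.0 + math.exp(-z))


def to_m_sigma(alpha, beta):
    """Equations (3) and (4): m = -alpha/beta, sigma = pi/(beta*sqrt(3))."""
    return -alpha / beta, math.pi / (beta * math.sqrt(3.0))


alpha, beta = -2.9, 1.69          # values quoted in the text for Figure 3
m, sigma = to_m_sigma(alpha, beta)
print(f"m = {m:.3f}, sigma = {sigma:.3f}")

for a in (1.0, 2.0, 5.0, 10.0, 20.0):   # illustrative flaw sizes only
    p2 = pod_log_odds(a, alpha, beta)
    p1 = pod_m_sigma(a, m, sigma)
    print(f"a = {a:5.1f}  PoD (eq. 2) = {p2:.4f}  PoD (eq. 1) = {p1:.4f}")
```

Both forms should give the same PoD for every flaw size, and the PoD should equal 0.5 at ln(a) = m, which is a quick check that a pair of fitted parameters has been converted correctly.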

From equation (2), it is straightforward to show that (see Appendix A, section 3.2):

$$\ln\left(\frac{\mathrm{PoD}(a)}{1 - \mathrm{PoD}(a)}\right) = \alpha + \beta \ln(a) \qquad (5)$$

The term on the left hand side is called the logarithm of the ‘odds’ (i.e. odds = probability of success/probability of failure) and equation (5) demonstrates that:

$$\ln(\mathrm{odds}) \propto \ln(a) \qquad (6)$$

hence the name ‘the log-odds model’ when applied to the hit/miss data. In Figure 1, it is evident that the log-odds PoD (a) function fits the particular hit/miss eddy current data well. Further evidence is given in Figure 3 where the linear relationship shown above in equation (6) is demonstrated (see also reference 6). The particular parameters α and β in Figure 3 (i.e. α = -2.9 and β = 1.69) were computed using maximum likelihood methods (6). The statistical parameters m and σ can be calculated from equations (3) and (4).

Recall the discussion above in section 3.1 regarding the detection probabilities of repeat inspections of the same flaw, as well as of different flaw types with the same size. The different detection probabilities result in a distribution of probabilities for some fixed flaw length (or flaw depth). The standard way of defining the distribution of these probabilities is through a 'probability density function' (see Appendix A, section 3.4). In the case of 'hit/miss' data the PoD (a) function is the mean of the probability density function for each flaw length or depth (Figure 4).

3.3.2. PoD Curves for Signal Response Data

For signal response data, much more information is supplied in the signal for analysis than is in the hit/miss data. In fact, as will be shown below, the PoD (a) function is derived from the correlation of â vs. a data. For signal response data it has been observed in a number of studies (6, 8) that an approximate linear relationship exists between ln(â) and ln(a). The relationship is usually expressed by:

$$\ln(\hat{a}) = \alpha_1 + \beta_1 \ln(a) + \gamma \qquad (7)$$

where γ is an error term and is normally distributed with zero mean and constant standard deviation σ_γ. The term α₁ + β₁ ln(a) in equation (7) is the mean μ(a) of the probability density function of ln(â). In signal response data, a flaw is regarded as ‘detected’ if â exceeds some pre-defined threshold â_th. Equation (7) is really expressing the fact that ln(â) is normally distributed with mean μ(a) = α₁ + β₁ ln(a) and constant standard deviation σ_γ (i.e. N(μ(a), σ_γ²)). The PoD (a) function for signal response data (i.e. ln(â)) can be expressed as:

$$\mathrm{PoD}(a) = \mathrm{Probability}\left(\ln(\hat{a}) > \ln(\hat{a}_{th})\right) \qquad (8)$$

In other words, it is the area under the probability density function of ln(â) that lies above the flaw evaluation threshold ln(â_th) (see Figure 5). Using standard statistical notation (9), equation (8) can be written as:

$$\mathrm{PoD}(a) = 1 - F\left\{\frac{\ln(\hat{a}_{th}) - \left(\alpha_1 + \beta_1 \ln(a)\right)}{\sigma_\gamma}\right\} \qquad (9)$$

where F is the continuous cumulative distribution function (see Appendix A, Section 3). It is fairly straightforward to show that, using the symmetry of the Normal distribution, equation (9) can be written as (see Appendix A, Section 3):

$$\text{PoD}(a) = F\left(\frac{\ln(a) - (\ln(\hat{a}_{th}) - \alpha_1)/\beta_1}{\sigma_\delta/\beta_1}\right) \qquad (10)$$

which is the cumulative log-normal distribution with:

$$\text{mean} = \mu = \frac{\ln(\hat{a}_{th}) - \alpha_1}{\beta_1} \qquad (11)$$

and

$$\text{standard deviation} = \sigma = \frac{\sigma_\delta}{\beta_1} \qquad (12)$$
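As an illustration of equations (9) to (12), the following is a minimal sketch in Python that evaluates the signal response PoD (a) function both ways and checks that the two forms agree. The function names and the parameter values (alpha1 = 0.5, beta1 = 1.2, sigma_delta = 0.4 and a threshold of 3.0) are hypothetical and chosen only for illustration; they are not taken from any dataset in this report. The sketch assumes beta1 > 0, so that the signal grows with flaw size.

```python
from math import exp, log
from statistics import NormalDist

# F in equations (9) and (10): the standard Normal cumulative distribution.
STD_NORMAL = NormalDist()


def pod_eq9(a, alpha1, beta1, sigma_delta, a_hat_th):
    """PoD(a) from equation (9): probability that ln(a_hat) exceeds ln(a_hat_th)."""
    z = (log(a_hat_th) - (alpha1 + beta1 * log(a))) / sigma_delta
    return 1.0 - STD_NORMAL.cdf(z)


def lognormal_params(alpha1, beta1, sigma_delta, a_hat_th):
    """Mean and standard deviation of ln(a) from equations (11) and (12)."""
    mu = (log(a_hat_th) - alpha1) / beta1
    sigma = sigma_delta / beta1
    return mu, sigma


def pod_eq10(a, alpha1, beta1, sigma_delta, a_hat_th):
    """PoD(a) from equation (10): the cumulative log-normal form."""
    mu, sigma = lognormal_params(alpha1, beta1, sigma_delta, a_hat_th)
    return STD_NORMAL.cdf((log(a) - mu) / sigma)


def flaw_size_at_pod(p, alpha1, beta1, sigma_delta, a_hat_th):
    """Flaw size at which PoD(a) = p, obtained by inverting equation (10)."""
    mu, sigma = lognormal_params(alpha1, beta1, sigma_delta, a_hat_th)
    return exp(mu + sigma * STD_NORMAL.inv_cdf(p))


# Hypothetical parameter values, for illustration only.
PARAMS = dict(alpha1=0.5, beta1=1.2, sigma_delta=0.4, a_hat_th=3.0)

if __name__ == "__main__":
    for size in (0.5, 1.0, 2.0, 4.0):
        print(size, round(pod_eq9(size, **PARAMS), 4), round(pod_eq10(size, **PARAMS), 4))
    print("a90 =", round(flaw_size_at_pod(0.90, **PARAMS), 3))
```

In this form the flaw size with a 50% PoD is simply exp(mean), and the standard deviation in equation (12) controls how steeply the PoD curve rises around it.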

The estimates for $\alpha_1$, $\beta_1$ and $\sigma_\delta$ are computed from the PoD data using the maximum likelihood method (6).

3.3.3. Sample Sizes

(a) To compute PoD parameters


For the hit/miss data, it has been shown in Figure 2 that there is a flaw size range (i.e. $a_{smallest}$, $a_{largest}$) in which there is a definite uncertainty whether the inspection system will detect the flaw or not. On the other hand, if the flaw size $a < a_{smallest}$ the inspection system would be expected to miss the flaw. Similarly if $a > a_{largest}$ the inspection system would be expected to detect the flaw. So having a large number of very small or very large flaws will not provide much information on the PoD (a) function that will fit the data. To maximise the information required for estimating the PoD (a) function (i.e. the parameters) it is recommended that the flaw sizes be uniformly distributed between the minimum and maximum flaw size of interest. A minimum of 60 flaws is recommended for hit/miss data (6). For signal response data, a direct consequence of the additional information in the signal is that the range of flaw sizes is not as critical. The recommendation is a minimum of 30 flaws in the sample size (6). However, increasing the sample size will also increase the accuracy of the PoD (a) function estimate.

(b) To achieve the desired PoD/Confidence limit combination

In practice, a PoD and lower confidence limit combination that is often quoted is 90% and 95% respectively (sometimes written 90-95). For the hit/miss NDT data discussed in the recommended practice (1), it is necessary to have a minimum sample of 29 flaws in each flaw width interval. This could be interpreted as 29 flaw specimens with one flaw in each specimen. This means that with 6 flaw width intervals, a minimum of 174 flaw specimens would be necessary (i.e. 174 flaws spread across the overall flaw range). So when published articles on this theme often refer to the ‘considerable’ cost associated with producing PoD curves experimentally, it is often understated. In addition, it is necessary to have the same number of ‘control’ specimens (specimens having no flaws) as flaw specimens, which are randomly mixed in with the flaw specimens before all the specimens are inspected. With such a large number of flaws, the requirement to compute the PoD (a) function parameters, as discussed above in (a), is easily satisfied. In order to achieve the 90% PoD with a 95% lower confidence limit for any flaw width interval, it is necessary to detect all the 29 flaws in that flaw width interval. For each flaw that is not detected in any particular flaw width interval, the recommended practice (1) provides tables of how many flaws in total need to be detected to achieve certification. There are also ‘maximum probability’ tables which indicate the probability of achieving certification after failing to achieve it at the first attempt or the second attempt and so on. Following any failure to certify, the decision as to whether it is economically viable to continue has to be considered very carefully and the maximum probability tables in the recommended practice are provided as assistance. 
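The figure of 29 flaws is consistent with a simple binomial argument: if the true PoD were only 90%, the chance of detecting all n flaws would be 0.9 to the power n, and n = 29 is the smallest sample for which this chance falls to 5% or below. The Python sketch below applies the same argument and extends it to a small number of missed flaws. It is an illustration under that binomial assumption only; the function names are hypothetical, and the tables in the recommended practice (1) may be constructed differently.

```python
from math import comb


def prob_at_most_misses(n, misses, pod):
    """Probability of missing no more than `misses` flaws out of n,
    if every flaw were detected independently with probability `pod`."""
    return sum(
        comb(n, j) * (1.0 - pod) ** j * pod ** (n - j)
        for j in range(misses + 1)
    )


def min_sample_size(misses=0, pod=0.90, confidence=0.95):
    """Smallest n for which a result with at most `misses` missed flaws
    would be this unlikely (probability <= 1 - confidence) at the stated PoD."""
    n = misses + 1
    while prob_at_most_misses(n, misses, pod) > 1.0 - confidence:
        n += 1
    return n


def lower_bound_all_detected(n, confidence=0.95):
    """One-sided lower confidence limit on PoD when all n flaws are detected."""
    return (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    for missed in range(3):
        print(missed, "missed ->", min_sample_size(missed), "flaws needed")
    # Six flaw width intervals, each demonstrated with no misses.
    print("specimens for 6 intervals:", 6 * min_sample_size(0))
```

Under this assumption, each missed flaw raises the required sample size for that interval considerably, which is one reason the decision to continue after a failed attempt needs careful consideration.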
A selection of maximum probability values, based on a 90-95 PoD and confidence limit combination, are given in Table 1 of this report (see Appendix B of reference 1 for a more complete set of maximum probability tables). The experimental procedure for achieving the desired PoD/Confidence limit is equally applicable to the signal response data.

3.4. CONFIDENCE LIMITS (OR CONFIDENCE INTERVALS)

To obtain a better understanding of confidence limits in statistics, consider first an example using numerical integration (10). When we want to calculate the area under a curve for a function that is too complicated (or impossible) to integrate exactly, we need to use numerical integration. There are ‘error formulae’ in numerical integration from which the maximum possible deviation or error can be calculated. Hence:


If we have an unknown exact value ‘e’ for the area and a known approximate value ‘A’ for the area, we will be able to calculate a maximum possible error, or deviation ‘±d’ from the error formulae. Hence we can say that:

$$A - d \le e \le A + d$$

That is, ‘e’ lies between A - d and A + d with 100% certainty. In statistics however, a similar problem of estimating the true parameter ‘p’ of a population (e.g. the PoD), would require us to determine two numerical values ‘p1’ and ‘p2’, that depend on a particular random sample set and include ‘p’ with 100% certainty. However, from a sample set we cannot draw conclusions about the population with 100% certainty. We need to modify our approach since the numerical quantities p1 and p2 depend on the sample set and will be different for each random set. The interval with end points p1 and p2 is called a ‘confidence interval’. The concept of the confidence interval is usually expressed in the following way:

$$P(p_1 \le p \le p_2) = C \qquad (13)$$

where C is called ‘the confidence level’. The point p1 is called ‘the lower confidence limit’ and the point p2 is called ‘the upper confidence limit’ (9, 10). For example, if we assign C to be 95%, what is the meaning of a ‘95% confidence interval’ for the population parameter p? To illustrate the point, let p be the mean of the population. Equation (13) is often wrongly interpreted as ‘there is a 95% probability that the confidence interval contains the population mean p’. However, any particular confidence interval will either contain the population mean or it won’t. The confidence level C does have this probability value associated with it, but it is not a probability in the normal usage, since p1 and p2 in equation (13) are not unique and are different for each random sample selected. The correct interpretation of equation (13) is based on repeated sampling. If samples of the same size are drawn repeatedly from a population and a confidence interval is calculated from each sample, then we can expect 95% of these different intervals to contain the true population mean. Formal definitions of terms associated with the confidence interval are provided in Appendix A, section 2, together with an example of how the confidence interval is calculated for the population mean with a known standard deviation.

3.5. PUBLISHED WORK ON THE MODELLING OF POD

During the last two decades, the modelling of NDT capability has increased and improved substantially. The models are now being used as part of PoD studies to simulate the results of inspecting components with quite complex geometries.

3.5.1. An Overview

The savings from modelling PoD, as opposed to determining PoD experimentally, have been a strong motivation in the development of such models. The historical development of computational NDT and PoD models is discussed in some detail in a relatively recent NTIAC publication (11), covering the period from 1977-2001. The development of modelling PoD has focussed on NDT methods such as ultrasound, eddy currents and X-ray radiography, and numerous publications are cited in reference 11. Whilst the models have been used to produce PoD results for particular NDT methods and flaws, their other main contribution has been to optimise and validate the NDT procedures. During the 1990’s there were major research efforts in modelling NDT reliability and PoD from Iowa State University (USA) and the National NDT Centre, Harwell (UK). Two notable publications in the 1990’s were Thompson (12), which contained an updated review of the PoD methodology developed for the NDT of titanium components, and Wall (13), which focussed on the PC-based models at Harwell and included corrections to PoD models due to human and environmental factors. Both the above publications are worthy of consideration for anyone wishing to start modelling PoD or to get a very good overview of the capabilities and usefulness of modelling PoD. A number of models discussed in reference 11 also consider the probability of false calls (or probability of false alarms (PFA)) and there is a good description of PFA in references (5) and (11). PFA will not be reported here as it is outside the scope of the project.

3.5.2. The PoD-generator (The Netherlands)

A recent NDT reliability model that is worth mentioning is the ‘PoD-generator’. This particular model was developed in the Netherlands as part of a joint industry project and presented at the 16th World Conference on NDT in 2004 (14). The model allows the assessment and optimisation of an inspection program for in-service components. The PoD-generator is really 3 models in one: the ‘degradation model’, which predicts the initiation and growth of flaws, the ‘inspection model’, which simulates the performance of the NDT method (i.e. currently it can deal with ultrasound or radiography) and the ‘integrity model’, which predicts the probability of failure. The degradation model passes information about the flaws to the inspection model, which in turn passes information about the inspection performance to the integrity model. A simple example of ultrasonic pulse-echo measurements to illustrate the concept of the PoD-generator is provided in reference (14).

3.5.3. Iowa State University (USA)

The main centre of excellence in the USA for PoD studies is almost certainly Iowa State University. In the field of modelling they have developed physically detailed models for predicting PoD. Some of their main collaborations in the USA have been, understandably, with the aerospace industry and the air force research laboratories. In fact, in the September 2005 NTIAC Newsletter, there was an interesting article on developments at Iowa State regarding PoD. The article reported that the Model Assisted PoD (MAPOD) Working Group has been established with the joint support of some major aerospace and air force research laboratories. The MAPOD approach is based on using modelling to determine PoD results in a way that reduces the need for the empirical approach, which can incur substantial costs and is usually slow to deliver results. More detailed information on MAPOD can be found in the September 2005 NTIAC Newsletter or by visiting the NTIAC website at www.ntiac.com.

3.5.4. National NDT Centre (UK)

The National NDT Centre (NNDTC) in the UK, which is now part of ESR Technology Ltd, holds a similar position in the UK and Europe on PoD as Iowa State holds in the USA. One of the major contributions of the NNDTC has been in the development of computer models for predicting PoD. However, they have also contributed to a number of national and international trials on PoD (e.g. USA ageing aircraft programme) as well as some high profile industrial applications of PoD (see Section 4).


On the modelling side, there is the PoD model for ultrasonic corrosion mapping (15), which predicts the PoD theoretically as well as by a simulation approach. Simulated images are brought up on the screen and the inspector can mark where flaws are seen, like a ‘spot the ball’ approach. The data are then analysed in terms of PoD and false calls. There are also PoD models originally developed for the European Space Agency (ESA), which deal with ultrasonic C-scanning and radiography of composite materials. The ESA work was reported at WCNDT 2000 (16). More recent work on modelling PoD includes the Magnetic Flux Leakage method in floor scanners and Eddy Currents for fastener inspection in airframe structures. There are also a number of other model applications, notably in the offshore industry, and more specific information on these can be found on the NNDTC website at www.nndtc.com.

4. THE PRACTICAL APPLICATION OF POD CURVES

4.1. HOW POD CURVES ARE USED IN INDUSTRY

PoD curves provide reference to results that have been obtained for particular flaws using specific NDT procedures. However, when particular PoD curves are used for different applications, it is important that some validation of the NDT procedures is carried out. The PoD curves provide important results for quantifying the performance capability of NDT procedures as well as the operators and could be used as a basis for:

• Establishing design acceptance requirements
• NDT procedure qualification and acceptance
• Qualification of personnel performance
• Comparing the performance capabilities of NDT procedures
• Selecting an applicable NDT procedure
• Quantifying improvements in NDT procedures
• Developing repeatable NDT data for fracture mechanics

The examples provided below of PoD applications to different industries link in well with the above uses of PoD curves (3).

4.2. PUBLISHED WORK ON POD CURVES IN DIFFERENT INDUSTRIES

The methodology of PoD reliability studies, developed in the late 60’s and early 70’s for the aerospace industry, has been adopted by a number of other industries and some of these will be discussed here.

4.2.1. Aerospace (NASA)

The first general requirements to quantify the capabilities of NDT methods came with the design and production of the NASA space shuttle system. In the past, the capability and reliability of routinely applied NDT procedures was assumed, but no one had produced any factual evidence. For example, knowing the smallest flaw detected by an NDT method was not much use, as there were many flaws larger than this smallest flaw that were missed. The flaw size which was more relevant was the largest flaw that could be missed (c.f. Figure 2). NASA initiated a research program in 1969 to determine the largest flaw that could be missed for the materials and NDT methods that were to be used in relation to the design and production of the space shuttle (1, 3).

4.2.2. Aircraft Structures, Inclusions in Titanium Castings

Childs et al (17) assessed X-ray radiography for the detection of ceramic inclusions in thick Titanium (Ti) castings used in aircraft structures. The castings were manufactured using the ‘Hot Isostatic Pressure’ (HIP) process. During the HIP process, the ceramic face coat can break into splinters (or ‘spall’) and become embedded in the casting as ceramic inclusions called ‘shells’. The X-ray radiography results were analysed in terms of PoD as a function of shell diameter for different face coat formulations from different suppliers. The PoD results were used to improve the face coat formulations and hence the detectability of the inclusions.

4.2.3. NORDTEST Trials

The NORDTEST programme (18) set out to compare manual ultrasonic NDT with X-ray radiography when applied to carbon manganese steel butt welds ≤ 25mm thick. The study was used to establish ‘acceptance curves’ as opposed to PoD curves. The acceptance curves defined acceptance probabilities vs. flaw height, where the acceptance probabilities were really 1 – PoD. The results of the NORDTEST trials demonstrated that there was an approximate relationship between certain ultrasonic NDT and radiographic NDT acceptance criteria (see also reference (19)).

4.2.4. Nuclear Components (The PISC Trials)

The Programme for the Inspection of Steel Components (PISC), carried out in the mid to late seventies (20), was concerned with the flaw detection capabilities of ultrasonic NDT on thick walled nuclear pressure vessel components (i.e. ~ 250mm). The ultrasonic NDT procedures used in the trials were applied too rigidly and did not allow the signal responses from large planar flaws to be evaluated properly. Hence, relatively low PoDs were obtained for quite large flaws. This is a good example of poorly designed NDT procedures leading to unexpected and low PoD results (see also the discussion below in section 5.1). In the above PISC-I trials some of the inspectors were allowed to use their own preferred NDT procedures. This approach proved more effective and the PoD results were much higher for the same large flaws. In the PISC-II trials (21), the approach of using more flexible ultrasonic NDT procedures showed that the flaw characteristics (e.g. flaw shape, flaw geometry, orientation) had a relatively larger influence on the final PoD results compared to other parameters of the NDT procedures.

4.2.5. Offshore Tubular Joints

The underwater PoD trials at University College London in the early 1990’s considered the detection of fatigue cracks in offshore tubular joints (22). The results of the trials were used to compare the flaw detection capabilities of Magnetic Particle Inspection (MPI) with a number of eddy current NDT techniques as well as ultrasonic NDT techniques using creeping waves. For the techniques considered, the 90-95 PoD/Confidence limit combination was being achieved for cracks with typical lengths ≥ 100 mm.

4.2.6. Dutch Welding Institute (NIL)

The Dutch Welding Institute (Nederlands Instituut voor Lastechniek (NIL)) acts as a moderator of NDT in the Netherlands, but does not have its own experts in NDT. During the mid-1980’s to the mid-90’s NIL produced four reports based on four major joint industry projects (JIP), which were funded and carried out by Dutch industry. One of the JIPs (23) involved assessing the reliability of mechanised ultrasonic NDT, in comparison with standard film radiography and manual ultrasonic NDT, for detecting flaws in thin steel welded plates (i.e. 6mm to 15mm).


There were 244 simulated, but realistic, flaw types such as lack of penetration, lack of fusion, slag and gas inclusions and cracks, which spanned 21 flat welded test plates. Some of the main conclusions were:

• Mechanised ultrasonic NDT (i.e. mechanised pulse-echo and time of flight diffraction (TOFD)) performed better than manual ultrasonic NDT with respect to flaw detection capability.
• Mechanised ultrasonic NDT was better at flaw sizing than manual ultrasonic NDT.
• Double exposure weld bevel radiography performed better than 0° film radiography.
• The detection performance did not depend on the wall thickness in the range 6mm to 12mm.

Some of the PoD values associated with this particular flaw population, and the 6mm to 12mm plates, are as follows:

NDT Methods                                 PoD Values (%)
Mechanised ultrasonic and TOFD              60-80
Manual ultrasonic NDT                       50
0° film radiography                         65
Double exposure weld bevel radiography      95
False calls                                 10-20

It is always important in these kinds of studies not to draw too many general conclusions but simply accept the results for this particular set of flaw specimens. The results of this particular NIL study on PoD, along with the NORDTEST (18), PISC (20, 21) and the underwater trials at UCL (22), are reviewed in more detail in an HSE report with a focus on offshore technology (24).

4.2.7. Railways (National NDT Centre (UK))

During the last 5 years the NNDTC has worked with the UK rail industry’s main line and London Underground to improve and quantify the reliability of inspection. This has included looking at the reliability of ultrasonic near-end and far-end scan methods used on axles as well as other work on bogie frames, wheel sets and train structures. There has also been work on the rail infrastructure, including rail inspection, edge-corner cracking issues and electromagnetic modelling. PoD is commonly used in the rail industry to quantify reliability and to optimise the inspection periodicity using probabilistic methods. NNDTC has produced improved estimates of PoD for ultrasonic axle inspection. PoD estimates have also been produced for the improved NDE methods and for new designs utilising hollow axles. More recently, the NNDTC has developed a simulation model utilising real A-Scan data and data from real flaws to produce PoD curves for far-end and near-end axle inspection. This enables specific PoD curves to be produced for individual axle designs and geometries. The location and sizes of the cracks can be altered and the effect of geometric features on detectability evaluated. There has been a lot of interest in the industry in improved methods for NDT measurement of bogie frames and NNDTC has been heavily involved in this, particularly for inspecting less accessible parts of the bogie. This work included PoD trials on manual ultrasonic inspection of welds in bogie frames (25).


4.2.8. LPG Storage Vessels

The extent of non-invasive inspection of Liquefied Petroleum Gas (LPG) storage vessels has been considered previously by Georgiou; a probabilistic model was devised for optimising flaw detection and a number of reports and papers were published (26-29). A guidelines document (28) was written to assist companies and HSE inspectors to assess how much NDT was required in order to achieve a desired probability of detecting a flaw and was based on a concept called the ‘index of detection’ (IoD). The IoD was related to ‘Probability of Inclusion’ (PoI) curves and to a particular PoD (a) curve (i.e. for ultrasonic NDT), which was kindly provided by NNDTC from a particular PoD study (30). Since the work by Georgiou, some HSE inspectors have considered the PoI curves as well as the IoD results in the guidelines document (28). It was considered timely to assess their comments as well as pull together all the statistical models considered so far, validate them against real data using appropriate statistical techniques and select the best available model. The additional work on the PoI curves and the IoD has been completed alongside this PoD work and is provided as two self-contained reports in Appendices C and D respectively. The updated work is now considered to have wider applications than just the ultrasonic NDT of LPG storage vessels.

5. THE LIMITATIONS OF APPLYING POD CURVES

5.1. COMMENTS ON HIT/MISS DATA AND SIGNAL RESPONSE DATA

Whilst the approaches to determine the PoD (a) function for hit/miss data and the signal response data are quite different, the log-odds and cumulative log-normal distribution functions are very similar for the same statistical parameters. Figure 6 shows a comparison between the log-odds and cumulative log-normal for $\mu$ = 0 and $\sigma$ = 1 (6). On occasions the behaviour of the PoD data may appear illogical and the PoD (a) function selected (e.g. log-odds or cumulative log-normal) may not fit the data. It may be of course that other modelling approaches need to be considered (31, 32). However, it is useful to carry out some quick checks to see if there is something specific about the data in order to decide what action to take.

In the case of hit/miss data it has been observed that the PoD (a) function can sometimes decrease with flaw size (i.e. large flaws are missed more than small ones). This is usually because the NDT experiment was poorly designed and it would require a repeat of some of the trials with better designed NDT procedures. There was a good example of this in the PISC-I trials discussed above in section 4.2.4. There are also a number of examples of this in the NTIAC data book (5) and a particular one is provided in Figure 7, simply to illustrate how the PoD curve can behave for such a case. Regions of flaw hits and flaw misses should not be distinct; there has to be a good overlap (c.f. Figure 2), otherwise the analysis that fits the log-odds model will not produce a valid solution (6). This usually means more data are required in the region between $a_{smallest}$ and $a_{largest}$ (c.f. Figure 2). It is also possible to produce what appears to be an acceptable PoD (a) function that fits the data well, but the confidence limit decreases with increasing flaw size. This is usually evidence that the log-odds model is not a good fit. This behaviour is usually associated with extreme values of the two fitted parameters (e.g. one large and the other small).
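The similarity between the log-odds and cumulative log-normal functions noted at the start of this section can be checked numerically. The following Python sketch compares the two curves for a mean of 0 and a standard deviation of 1 in ln(a). It assumes the usual parameterisation of the log-odds function, in which the mean and standard deviation are those of a logistic distribution in ln(a), so that the exponent is scaled by π/(σ√3); the function names and the grid of flaw sizes are hypothetical choices for illustration.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist


def pod_log_odds(a, mu=0.0, sigma=1.0):
    """Log-odds (log-logistic) PoD(a); mu and sigma are the mean and
    standard deviation of the logistic distribution of ln(a) (assumed form)."""
    z = pi * (log(a) - mu) / (sigma * sqrt(3.0))
    return 1.0 / (1.0 + exp(-z))


def pod_log_normal(a, mu=0.0, sigma=1.0):
    """Cumulative log-normal PoD(a) with the same mean and standard deviation."""
    return NormalDist(mu, sigma).cdf(log(a))


# Flaw sizes spanning ln(a) from -4 to +4 in steps of 0.01 (arbitrary units).
SIZES = [exp(-4.0 + 0.01 * i) for i in range(801)]

# Largest vertical gap between the two PoD curves over the grid.
MAX_GAP = max(abs(pod_log_odds(a) - pod_log_normal(a)) for a in SIZES)

if __name__ == "__main__":
    for size in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(size, round(pod_log_odds(size), 4), round(pod_log_normal(size), 4))
    print("largest difference in PoD:", round(MAX_GAP, 4))
```

Under these assumptions the two curves coincide at a PoD of 50% and differ by only a few percentage points elsewhere, so the choice between them is driven more by the type of data (hit/miss or signal response) than by the shape of the curve.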
In signal response data, there is less reliance on the overlap of flaw size range and more emphasis on the linear relationship between ln(â) and ln(a). When the relationship is not linear, the cumulative log-normal will not fit the data. This is usually associated with unreasonable values for the fitted parameters, and the lower confidence limit will eventually decrease with increasing flaw size (similar to that observed with the hit/miss data). When these situations occur, it is worth checking that the NDT experiment was designed and executed properly. Failing that, it is likely that a different model needs to be investigated (32).

5.2. IMPORTANT OPERATING AND PHYSICAL PARAMETERS

In the Recommended Practice (1), operating parameters for each of five NDT methods are provided (i.e. Ultrasound, Eddy Currents, Penetrants, Magnetic Particle and Radiography). Each method has a very detailed list of both operator controlled parameters, relating to the NDT method, and physical parameters associated with the specimen and flaws. The parameters for each NDT method are too numerous to repeat here, but different NDT methods will be considered to assess the differences in their respective PoD curves. In addition, the effects on the PoD curves from material properties, the specimen geometry and the flaw characteristics will also be considered. The NTIAC data book (5) provides information on 423 PoD curves covering eight NDT methods (i.e. the five mentioned in the Recommended Practice (1) above as well as visual testing and two so called emerging NDT methods, which are Holographic Interferometry and ‘Edge of Light’ inspection). In this report, the NTIAC data book was used as the prime source of raw PoD data and these data have been used to assess how the PoD curves are affected by the various operational and physical parameters in the sub-headings below. It is important to note that the NTIAC data book contains only hit/miss data and in each of the 423 PoD curves it is the log-odds model that has been used to fit the data (using a 95% confidence limit). In all cases the actual flaw dimensions have been verified by destructive analysis and measurement. In order to assess the effects of the operational and physical parameters, it was necessary to find PoD data where only one of the operational or physical parameters changed while the other parameters remained the same. This was not always completely clear, as there were always some uncertainties, notwithstanding the data sheets indicating which parameters were nominally the same and which were different. 
The examples selected cover a range of NDT methods and help to illustrate the kind of differences that can exist between PoD results, but without any deliberate attempt to maximise these differences. Before observing the effects of certain operational and physical parameters on the PoD results, it is important to note the following points:

• The PoD data in the NTIAC data book (5) were collected about 30 years ago and may not necessarily reflect current capabilities with modern digital instrumentation.

• The PoD data illustrated in each of the figures 7 – 13 are valid for the particular datasets in question. It would be wrong to draw general conclusions about PoD values (e.g. ultrasound is better than X-ray). The figures merely serve to illustrate the possible effects that the physical and operational parameters can have on the PoD and that we should be aware of these effects when quoting PoD results.

• Equipment ‘calibration’ is also one of the important variables in the application of an NDT procedure. It is believed that no attempt was made to resolve calibration issues in collecting the inspection data.

• The designated operators A, B and C recorded in the datasets are not necessarily the same 3 persons each time.

Notwithstanding the above points, the PoD datasets in the NTIAC data book (5) are considered a rich, comprehensive and valid set of data, which would almost certainly be prohibitively expensive to repeat by any one organisation using more modern digital technology. Such data do not appear to exist elsewhere in such an easily accessible and consistent format in which to illustrate the comparisons below.

5.2.1. NDT Method

To illustrate the differences in the PoD curves that can occur for different NDT methods, the NTIAC data book was used to identify the NDT carried out on the same flaw specimen and by the same designated operator. Whether the designated operator (e.g. operator C) is precisely the same person in each case is not absolutely clear. However, it is believed that the cases selected offer a reasonable independent measure of the differences. Two Titanium flat plates (i.e. thicknesses 1.7mm and 5.7mm) with a total of 135 cracks were inspected by the same designated operator using manual eddy currents, manual ultrasound (surface waves) and X-ray radiography. The PoD curves for each method are plotted in Figure 8 and show the differences in the PoD curves for a particular dataset. In particular, the flaw size corresponding to the 90% PoD varies significantly between the NDT methods (i.e. 3.4mm, 14.8mm and 18.5mm for ultrasound, eddy currents and X-ray radiography respectively).

5.2.2. Fluorescent Penetrant NDT

In the case of fluorescent penetrant NDT, two datasets were considered which quantify the differences in PoD between the cases of no developer and developer being used to reveal surface flaws (Figure 9). Best practice cleaning procedures were followed between inspections. Whilst surface lengths were measured, the depths were predicted from validated crack growth procedures. The PoD differences in Figure 9 for this particular flaw specimen (i.e. Haynes 188 alloy (AMS 5608A), with 125 RMS and with dimensions 3.5in x 16in x 0.19in) are quite large, with the 90% PoD not being achieved without the developer.

5.2.3. Material Properties

To assess the differences in PoD for different materials being inspected with the same NDT method, the same size flaw specimen with the same nominal flaws had to be found. Clearly, this was not going to be 100% possible with different material specimens. However, by a close examination of the datasheets, which accompany each dataset, it was possible to find flaw specimens that were produced in the same way with the intention of producing the same flaw types. The three PoD results illustrated in Figure 10 are for three different materials (i.e. aluminium, titanium and steel). The datasheets for these three datasets suggest that the only physical difference is the material, although the width of the steel plate is different from the other two. The thickness of all three is the same and they are in the same ‘as machined’ state. The flaws were all initiated using the same mechanism and cover a similar length range, but clearly the flaws in each different material specimen will not be identical. It is worth mentioning that these particular steel PoD results improved dramatically once the specimen went beyond the ‘as machined’ state (e.g. etching and proof loading).

5.2.4. Specimen Weld Geometry

The aim here was to look for PoD data that had different weld conditions. Since the NTIAC data book has considered mainly flat panel specimens and bolt holes, there was not a V-butt weld to compare with a J-prep weld, for example. The closest was a particular comparison for the same flaw specimen, but with a different condition of the weld. The PoD results for aluminium welds with crowns were compared to PoD results with the welds ground flush. The NDT method used was X-ray radiography and the results are illustrated in Figure 11. The differences in PoD are relatively small and the 90% PoD was not achieved in either case, although for the ‘welds ground flush’ case the PoD came very close to 90% at 0.75in (~19mm).

5.2.5. Flaw Characteristics

To illustrate the possible effect on the PoD from inspecting different flaws, fluorescent penetrant NDT was considered for longitudinal cracks and transverse cracks covering the same flaw length range. The datasets for two specimens were found that were physically the same apart from the flaws. The PoD results for this comparison are illustrated in Figure 12. The transverse flaws are associated with much lower PoD values for relatively smaller crack lengths (i.e. below about 0.15in (~4mm)), but the PoD results are more similar for larger crack lengths (i.e. above about 0.25in (~6mm)), beyond which the PoD results both converge to unity.

5.2.6. Human Reliability

This is an area which has been researched extensively in the UK at the NNDTC (c.f. reference 13). It would be very easy to show some quite startling differences in detection capability based on human reliability studies. In manual ultrasonic NDT, for example, an often quoted anecdote is ‘you can only believe a manual ultrasonic NDT result 50% of the time’. Perhaps this originated from typical differences observed in the past between two operators in various detection trials (c.f. the NIL study (23) discussed in section 4.2.6). The ‘50% anecdote’ is believed to be too simplistic for many situations and more information needs to be considered. The NTIAC data book (5) does contain a great deal of data where the physical and operational parameters are the same, for a particular NDT method, and the only difference is the operator. However, the data book did not set out to study human reliability. Nevertheless, it does appear that for nearly every datasheet there are 3 PoD results (i.e. according to operators A, B, C). There are a number of cases where the differences in the PoD results are significant and others where the PoD results are very similar. The cases selected for illustrating operator variability here are shown in Figure 13 for ultrasonic immersion NDT, inspecting titanium plates with low cycle fatigue cracks. Whilst the results appear similar, and the same order of flaws is missed by each operator, the flaw size at 90% PoD varies by at least a factor of 2 (c.f. operator A with operators B and C).

6. DISCUSSION

This section brings together all the salient features of this study. It is much more than an Executive Summary and is aimed particularly at HSE inspectors who want a reasonably quick understanding of PoD without having to read the whole report.

6.1. INTRODUCTION

There is a large amount of ‘Probability of Detection’ (PoD) information available; starting with the pioneering work in the late 1960’s to early 1970’s for the aerospace industry, to more recent and more general industrial applications.

PoD curves have been produced for a range of Non-Destructive Testing (NDT) methods (e.g. ultrasound, radiography, eddy currents, fluorescent penetrants). Whilst it is reasonable to assume that each NDT method will produce different PoD curves (even when applied to the same flaws), it is believed that many who use PoD curves do not fully appreciate how significant the differences can be. PoD curves are also dependent on a number of physical and operational parameters. It is important to understand how PoD curves are derived and to question the validity of their application, as well as to appreciate their limitations. This discussion aims to provide this information as well as considering the relevance of PoD to fitness for service issues.

6.2. AIMS AND OBJECTIVES

The overall goal is to provide concise and understandable information on PoD curves, which Health and Safety Inspectors should find useful when discussing safety cases involving PoD curves. The specific objectives are:

• To provide a clear and understandable description of how PoD curves are derived.

• To provide practical applications of PoD curves, particularly to fitness for service issues.

• To quantify the limitations of PoD curves.

6.3. HISTORICAL DEVELOPMENT

An early definition of NDT reliability is, 'the probability of detecting a crack in a given size group under the inspection conditions and procedures specified' (1). The PoD is usually expressed as a function of flaw size (i.e. length or depth), although it is a function of many physical and operational parameters. Repeat inspections of the same flaw size or the same flaw type do not necessarily result in consistent hit or miss indications. There is a spread of detection results, which is why the detection capability is expressed in terms of the PoD. An early example is illustrated by Lewis et al (2), where 60 air force inspectors, using the same surface eddy-current technique, inspected 41 cracks around fastener holes. The results in Figure 1 show that the chances of detecting the cracks increase with crack size. However, none of the cracks were detected 100% of the time and different cracks of the same size have different detection percentages. Figure 1 also shows that the ‘log-odds’ distribution is a good fit to the data and illustrates why PoD is an appropriate measure of detection capability. PoD functions have been the subject of many studies since the late 1960’s and early 1970's, where most of the work was carried out in the aerospace industry (3, 4). It was becoming clear that to ensure the structural integrity of critical components, the question ‘…what is the smallest flaw that can be detected by an NDT method?’ was less appropriate than the question‘…what is the largest flaw that can be missed?’ To elaborate on this, some real ultrasonic inspection data has been considered from the ‘Non-destructive Testing Information Analysis Centre’ (NTIAC) capabilities data book (5). Figure 2 illustrates the detection capabilities of an ultrasonic surface wave inspection of two flat aluminium plates, containing a total of 311 simulated fatigue cracks with varying depths. 
The flaws were recorded as detected (or hit) with PoD=1, or missed with PoD=0. In Figure 2, there are three distinct regions separated by the lines a_smallest and a_largest. The region between a_smallest and a_largest shows flaws of the same size, which are hit and missed, and a_largest is much larger than a_smallest.

In 1969, a program was initiated by the National Aeronautics and Space Administration (NASA) to determine the largest flaw that could be missed, for various NDT methods to be used in the design and production of the space shuttle. The methodology by NASA was soon adopted by the US Air Force as well as the US commercial aircraft industry. In the last two decades many more industries have adopted similar NDT reliability methods based on PoD. Some of these will be discussed below.

Early on in the mid-1970’s, a constant PoD for all flaw types of a given size was proposed and Binomial distribution methods were used to estimate this probability, along with an associated error or ‘lower confidence limit’ (1). It is clear from Figure 1 that this early assumption about a constant PoD for flaws of a given size was too simplistic. In the early to the mid-1980s, the approach was to assume a more general model for the PoD vs. flaw size ‘a’. Various analyses of data from reliability experiments on NDT methods indicated that the PoD (a) function could be modelled closely by either the 'log-logistic' (or ‘log-odds’) distribution or the cumulative 'log-normal' distribution (6).

6.4. FLAW SAMPLE SIZES FOR ‘HIT/MISS’ DATA AND ‘SIGNAL RESPONSE’ DATA
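As an illustration of the binomial approach mentioned above, the sketch below computes a one-sided lower confidence limit on a constant PoD from the number of hits recorded in a flaw size group. The Clopper-Pearson construction, the bisection solver and the function names are assumptions made for illustration; the Recommended Practice (1) may tabulate or compute the limit differently.

```python
import math


def binomial_tail(n, x, p):
    """Return P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(x, n + 1))


def lower_confidence_limit(hits, n, confidence=0.95):
    """One-sided lower limit on PoD given `hits` detections out of `n` flaws.

    Assumed construction (Clopper-Pearson): the limit is the value of p
    at which observing `hits` or more detections has probability
    1 - confidence. It is found here by bisection.
    """
    if hits == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # The tail probability increases with p, so a tail below alpha
        # means the trial value is still below the limit.
        if binomial_tail(n, hits, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical size group: 27 flaws detected out of 30 inspected.
print(lower_confidence_limit(27, 30))
```

The observed hit fraction (here 27/30) is the point estimate of the PoD for the group, and the lower limit shows how far below that estimate the true PoD could plausibly lie for the given sample size.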

The ‘Recommended Practice’ (1), originally prepared for the aircraft industry, provides comprehensive information on the experimental sequence of events for generating data to produce PoD curves and to ‘certify’ (i.e. validate) an NDT method or procedure. The sequence of events can be broadly summarised as follows (see also (3)):

• Manufacture or procure flaw specimens with a large number of flaw sizes and flaw types

• Inspect the flaw specimens with the appropriate NDT method

• Record the results as a function of flaw size

• Plot the PoD curve as a function of flaw size

However, before the manufacture or procurement of flaw specimens, it is necessary to ask:

• What flaw parameter size will be used (e.g. flaw length or flaw depth)?

• What overall flaw size range is to be investigated (e.g. 1mm to 9mm)?

• How many intervals are required within the flaw size range?
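The planning questions above can be made concrete with a short sketch. It divides a flaw size range into equal intervals, counts the flaw specimens needed, and tallies hit/miss results per interval. The function names and the example records are hypothetical; the default of 29 flaws per interval is the minimum quoted in section 6.4.4 for the 90%/95% combination.

```python
def interval_edges(a_min, a_max, n_intervals):
    """Return n_intervals + 1 equally spaced edges from a_min to a_max."""
    width = (a_max - a_min) / n_intervals
    return [a_min + i * width for i in range(n_intervals + 1)]


def specimens_required(n_intervals, flaws_per_interval=29):
    """Total flaws needed, assuming one flaw per specimen."""
    return n_intervals * flaws_per_interval


def tally_by_interval(records, edges):
    """Count hits and totals in each flaw size interval.

    records: list of (flaw_size, hit) pairs, where hit is True or False.
    Returns a list of (lower_edge, upper_edge, hits, total) tuples.
    """
    width = edges[1] - edges[0]
    counts = [[0, 0] for _ in range(len(edges) - 1)]
    for size, hit in records:
        if size < edges[0] or size > edges[-1]:
            continue  # flaw lies outside the investigated range
        # A flaw exactly on the top edge is placed in the last interval.
        idx = min(int((size - edges[0]) / width), len(counts) - 1)
        counts[idx][0] += int(hit)
        counts[idx][1] += 1
    return [(edges[i], edges[i + 1], h, n) for i, (h, n) in enumerate(counts)]


def detection_gap(records):
    """Return (smallest flaw detected, largest flaw missed)."""
    hits = [size for size, hit in records if hit]
    misses = [size for size, hit in records if not hit]
    return min(hits, default=None), max(misses, default=None)


# Hypothetical hit/miss records: (flaw size in mm, detected?)
records = [(1.5, False), (2.5, True), (3.2, True), (4.8, False),
           (6.0, True), (8.9, True), (9.0, True)]
edges = interval_edges(1.0, 9.0, 4)
for lower, upper, hits, total in tally_by_interval(records, edges):
    print(f"{lower:.1f}-{upper:.1f} mm: {hits} hit(s) out of {total}")
print("specimens for 6 intervals:", specimens_required(6))
print("smallest detected, largest missed:", detection_gap(records))
```

Equal-width intervals are only one possible choice; the tally simply makes visible how many flaws each interval contains before any PoD or confidence limit is estimated from it.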

The recommended practice (1) also provides critical information on the flaw sample size, for each flaw width interval, in order to achieve the desired PoD and the appropriate lower confidence limit. It is important to appreciate that in selecting the sample size there are two distinct issues to address. First, the sample size has to be large enough to achieve the desired PoD and confidence limit combination. Second, the sample size has to be large enough to determine the statistical parameters associated with the PoD curve that best fits the data. Originally, NDT results were always recorded in terms of ‘hit/miss’ data (c.f. Figure 2), which is discrete data. This way of recording data is still appropriate for some NDT methods (e.g. magnetic particle testing). However, in many inspections there is more information in the NDT response (e.g. the light intensity in fluorescent NDT). Since the NDT signal response can be interpreted as the perceived flaw size, the data is often called â, that is, ‘a hat’ or ‘signal response’ data, which is continuous data.

6.4.1. Model for Hit/Miss Data

For hit/miss data a number of different statistical distributions have been considered (7). It was found that the log-logistic distribution was the most acceptable and the PoD (a) function can be written as:

$$\mathrm{PoD}(a) = \frac{\exp\left\{\frac{\pi}{\sqrt{3}}\left(\frac{\ln a - m}{\sigma}\right)\right\}}{1 + \exp\left\{\frac{\pi}{\sqrt{3}}\left(\frac{\ln a - m}{\sigma}\right)\right\}}$$

where a is the flaw size and $m$ and $\sigma$ are the median and standard deviation respectively. Another convenient form of the above equation can be written as:

$$\mathrm{PoD}(a) = \frac{e^{\alpha + \beta \ln(a)}}{1 + e^{\alpha + \beta \ln(a)}}$$

and it is straightforward to show that:

$$\ln\left(\frac{\mathrm{PoD}(a)}{1 - \mathrm{PoD}(a)}\right) = \alpha + \beta \ln(a)$$

where $m = -\alpha/\beta$ and $\sigma = \pi/(\beta\sqrt{3})$.

Hence the name ‘log-odds’ (i.e. odds = probability of success/probability of failure, c.f. Figure 1) and

$$\ln(\mathrm{odds}) \propto \ln(a)$$

6.4.2. Model for Signal Response Data
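The log-odds model of section 6.4.1 can be checked numerically with a short sketch. It evaluates PoD(a) in both the (m, σ) and the (α, β) forms, converts between them using the relations m = -α/β and σ = π/(β√3), and inverts the model to find the flaw size at a chosen PoD. The function names and the example values of m and σ are hypothetical and chosen only for illustration.

```python
import math

K = math.pi / math.sqrt(3.0)  # scale factor in the log-odds model


def pod_log_odds(a, m, sigma):
    """PoD(a) in the (m, sigma) form; m and sigma are in ln(size) units."""
    z = K * (math.log(a) - m) / sigma
    return 1.0 / (1.0 + math.exp(-z))  # same as e^z / (1 + e^z)


def to_alpha_beta(m, sigma):
    """Convert (m, sigma) to (alpha, beta) so that ln(odds) = alpha + beta*ln(a)."""
    beta = K / sigma
    alpha = -m * beta
    return alpha, beta


def pod_alpha_beta(a, alpha, beta):
    """PoD(a) in the (alpha, beta) form."""
    z = alpha + beta * math.log(a)
    return 1.0 / (1.0 + math.exp(-z))


def flaw_size_at_pod(p, m, sigma):
    """Flaw size at which the model gives PoD = p (e.g. p = 0.9)."""
    return math.exp(m + sigma * math.log(p / (1.0 - p)) / K)


# Hypothetical parameters: 50% PoD at a = 2 mm, spread of 0.5 in ln(a).
m, sigma = math.log(2.0), 0.5
alpha, beta = to_alpha_beta(m, sigma)
a90 = flaw_size_at_pod(0.9, m, sigma)
print("PoD at 2 mm:", pod_log_odds(2.0, m, sigma))
print("flaw size at 90% PoD (mm):", a90)
```

Because the model is linear in ln(a) on the log-odds scale, the flaw size at any target PoD follows directly from the two parameters, which is how a 90% PoD flaw size can be read off a fitted curve.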

For signal response data it has been observed (6, 8) that an approximate linear relationship exists between ln(â) and ln(a), where a is the flaw size. The relationship is often expressed by:

$$\ln(\hat{a}) = \alpha_1 + \beta_1 \ln(a) + \gamma$$

where $\gamma$ is an error term and is normally distributed with zero mean and constant standard deviation $\sigma_\gamma$. The above relationship expresses the fact that ln(â) is normally distributed with mean $\mu(a) = \alpha_1 + \beta_1 \ln(a)$ and constant standard deviation $\sigma_\gamma$ (i.e. $N(\mu(a), \sigma_\gamma^2)$). The PoD (a) function for signal response data (i.e. ln(â)) can be expressed as:

$$\mathrm{PoD}(a) = \mathrm{Probability}\left(\ln(\hat{a}) > \ln(\hat{a}_{th})\right)$$

where $\ln(\hat{a}_{th})$ is the flaw evaluation threshold. Using standard statistical notation (9), the PoD for signal response data can be expressed as:

$$\mathrm{PoD}(a) = 1 - F\left(\frac{\ln(\hat{a}_{th}) - (\alpha_1 + \beta_1 \ln(a))}{\sigma_\gamma}\right)$$

where F is the continuous cumulative distribution function. It is straightforward to show using the symmetric properties of the Normal distribution (9) that:

$$\mathrm{PoD}(a) = F\left(\frac{\ln(a) - \mu}{\sigma}\right)$$

which is the cumulative log-normal distribution, where the mean is $\mu = \frac{\ln(\hat{a}_{th}) - \alpha_1}{\beta_1}$ and the standard deviation is $\sigma = \frac{\sigma_\gamma}{\beta_1}$.

The estimates for $\alpha_1$, $\beta_1$ and $\sigma_\gamma$ are computed using ‘maximum likelihood’ methods (6).

6.4.3. To Compute PoD parameters
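For uncensored signal response data with normally distributed errors, the maximum likelihood estimates of α1 and β1 coincide with an ordinary least-squares fit of ln(â) against ln(a), so the parameter computation for the signal response model can be sketched as below. This is a simplified illustration rather than the procedure of reference 6, which may also need to handle signals that are censored (e.g. below the recording threshold). The function names, the synthetic data and the threshold value are assumptions.

```python
import math
import random


def normal_cdf(z):
    """Cumulative distribution function of the standard Normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_signal_response(sizes, responses):
    """Fit ln(a_hat) = alpha1 + beta1*ln(a) + gamma by least squares.

    Returns (alpha1, beta1, sigma_gamma); sigma_gamma uses the maximum
    likelihood form (residual sum of squares divided by n).
    """
    x = [math.log(a) for a in sizes]
    y = [math.log(r) for r in responses]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    alpha1 = y_bar - beta1 * x_bar
    rss = sum((yi - alpha1 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    return alpha1, beta1, math.sqrt(rss / n)


def pod_parameters(alpha1, beta1, sigma_gamma, a_hat_th):
    """Return (mu, sigma) of the cumulative log-normal PoD(a) function."""
    mu = (math.log(a_hat_th) - alpha1) / beta1
    sigma = sigma_gamma / beta1
    return mu, sigma


def pod_signal_response(a, mu, sigma):
    """PoD(a) = F((ln(a) - mu) / sigma)."""
    return normal_cdf((math.log(a) - mu) / sigma)


# Synthetic data (assumed values): 200 flaws between 1 mm and 9 mm.
rng = random.Random(0)
true_alpha1, true_beta1, true_sigma_gamma = 0.5, 1.2, 0.3
sizes = [1.0 + 8.0 * i / 199 for i in range(200)]
responses = [math.exp(true_alpha1 + true_beta1 * math.log(a)
                      + rng.gauss(0.0, true_sigma_gamma)) for a in sizes]

a_hat_th = 5.0  # hypothetical evaluation threshold, in signal units
alpha1, beta1, sigma_gamma = fit_signal_response(sizes, responses)
mu, sigma = pod_parameters(alpha1, beta1, sigma_gamma, a_hat_th)
print("estimated alpha1, beta1, sigma_gamma:", alpha1, beta1, sigma_gamma)
print("PoD at 4 mm:", pod_signal_response(4.0, mu, sigma))
```

Changing the evaluation threshold shifts μ but leaves σ unchanged, so the same fitted relationship can be reused to see how the PoD curve moves with the threshold.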

In order to determine the parameters associated with the PoD (a) function for hit/miss data, it is recommended that the flaw sizes be uniformly distributed between the minimum and maximum flaw size of interest, with a minimum of 60 flaws (6). For signal response data, the additional information in each measurement means that the range of flaw sizes is not as critical. The recommendation is a minimum of 30 flaws in the sample (6).

6.4.4. To Achieve the Desired PoD/Confidence Limit Combination

In practice, a PoD and lower confidence limit combination often quoted is 90% and 95% respectively. For hit/miss and signal response NDT data (1), it is necessary to have a minimum sample of 29 flaws in each flaw width interval. The figure of 29 follows from the binomial distribution: if all 29 flaws in an interval are detected, a true PoD of 90% or less would produce that result with a probability of at most 0.9^29 ≈ 0.047, which is below the 5% allowed by a 95% confidence level. This could be interpreted as 29 flaw specimens with one flaw in each specimen. This means that if 6 flaw width intervals were used, a minimum of 174 flaw specimens would be necessary, a considerable cost to produce PoD curves experimentally. With such a large number of flaws, the requirement to compute the PoD (a) function parameters, as discussed in section 6.4.3, is easily satisfied.

6.5. POD MODELLING

During the last two decades, the modelling of NDT capability has increased and improved substantially. The savings in carrying out modelling of PoD, as opposed to the experimental determination of PoD, has been a strong motivation in developing models. The historical development of computational NDT and PoD models is discussed in some detail in a relatively recent NTIAC publication (11), covering the period from 1977-2001. The development of modelling PoD has focussed on NDT methods such as ultrasound, eddy currents, X-ray radiography and numerous publications are cited in reference 11. During the 1990’s there were major research efforts in modelling NDT reliability and PoD from Iowa State University (USA) and the National NDT Centre, Harwell (UK). Two notable publications in the 1990’s were Thompson (12), which contained an updated review of the PoD methodology developed for the NDT of titanium components and Wall (13), which focussed on the PC-based models at Harwell and included corrections to PoD models due to human and environmental factors. Both the above publications are worthy of consideration for anyone wishing to start modelling PoD or to get a very good overview of the capabilities and usefulness of modelling PoD.

In recent years both Iowa State and the National NDT Centre (NNDTC) have continued developing models to determine PoD results. Iowa State has established the Model Assisted PoD (MAPOD) working group, with the joint support of some major aerospace and airframe research laboratories. The NNDTC, which is now part of ESR Technology Ltd, has continued to develop some interesting PoD models for composites (16), magnetic flux leakage in floor scanners and other PoD applications in the offshore industry (see www.nndtc.com). A recent interesting NDT reliability model is the ‘PoD-generator’ (14). The model allows the assessment and optimisation of an inspection program for in-service components using ultrasound and radiography.

6.6. PRACTICAL APPLICATIONS OF POD

The methodology of PoD reliability studies, developed in the 60s and 70s for the aerospace industry, has been adopted by a number of other industries and some of these are discussed below.

6.6.1. Aircraft Structures, Inclusions in Titanium Castings

Childs et al (17) assessed X-ray radiography for the detection of ceramic inclusions in thick Titanium (Ti) castings for aircraft structures. The castings were manufactured using the ‘Hot Isostatic Pressure’ (HIP) process. During this process, the ceramic face coat can break into splinters and become embedded in the casting as inclusions (i.e. ‘shells’). The X-ray radiography results were analysed in terms of PoD of the shell diameter, for different face coat formulations, and the results were used to improve the face coat formulations and also improve detectability.

6.6.2. NORDTEST Trials

The NORDTEST trials (18) set out to compare manual ultrasonic NDT with X-ray radiography when applied to carbon manganese steel butt welds ≤ 25mm thick. The trials were used to establish ‘acceptance curves’, which defined acceptance probabilities (i.e. 1-PoD) against flaw height. The results of the NORDTEST trials demonstrated that there was an approximate relationship between certain ultrasonic NDT and radiographic NDT acceptance criteria (see also reference (19)).

6.6.3. Nuclear Components (The PISC Trials)

The Programme for the Inspection of Steel Components (PISC), carried out in the mid to late seventies (20), considered flaw detection capabilities of ultrasonic NDT on thick nuclear pressure vessel components (i.e. ~ 250mm). The ultrasonic NDT procedures were applied too rigidly and signal responses from large planar flaws were not evaluated properly, resulting in low PoD values. However, some of the inspectors were also allowed to use their own preferred NDT procedures, which proved more effective and the PoD results were much higher for the same large flaws. In the PISC-II trials (21), the approach of using more flexible ultrasonic NDT procedures showed that the flaw characteristics (e.g. flaw shape, flaw geometry, orientation) had a relatively larger influence on the final PoD results compared to other physical parameters.

6.6.4. Offshore Tubular Joints

The underwater PoD trials at University College London in the early 1990’s considered the detection of fatigue cracks in offshore tubular joints (22). The results of the trials were used to compare the flaw detection capabilities of Magnetic Particle Inspection (MPI) with a number of eddy current NDT techniques as well as ultrasonic creeping wave NDT techniques. The 90-95 PoD/Confidence limit combination was being achieved for cracks with typical lengths ≥ 100 mm.

6.6.5. Dutch Welding Institute (NIL)

During the mid-90s, the Dutch Welding Institute (Nederlands Instituut voor Lastechniek (NIL)) produced a report (23) involving, amongst other NDT methods, the reliability of mechanised ultrasonic NDT for detecting flaws in thin steel welded plates (i.e. 6mm to 15mm). Some of the main conclusions for ultrasonic NDT were:

• Mechanised ultrasonic NDT and time of flight diffraction (TOFD), had a higher flaw detection capability than manual ultrasonic NDT (i.e. PoD of 60%-80% compared to 50% respectively).

• Mechanised ultrasonic NDT was better at flaw sizing than manual ultrasonic NDT.

An HSE report, focussing on offshore technology (24), has carried out a detailed review of the NORDTEST trials, the PISC trials, the underwater trials at UCL and the NIL PoD study.

6.6.6. Railways

During the last 5 years the NNDTC has worked with the UK rail industry’s main line and London Underground to improve and quantify the reliability of inspection. PoD methods are commonly used in the rail industry to quantify reliability and to optimise inspection periodicity. The NNDTC has developed a simulation model utilising real A-Scan data and data from real flaws to produce PoD curves for far-end and near-end axle inspection. NNDTC has also been heavily involved in PoD trials on manual ultrasonic inspection of welds in bogie frames (25).

6.6.7. LPG Storage Vessels

The NDT of LPG storage vessels was considered by Georgiou and a probabilistic model for optimising flaw detection was developed. The reports and papers published (26-29) included a guidelines document (28), written to assist companies and HSE inspectors to assess the amount of NDT required in order to achieve a desired PoD, which used a concept called the ‘index of detection’ (IoD). In the meantime, some HSE inspectors have considered the IoD work. It was timely to assess their comments as well as pull together the statistical models considered so far, validate them against different data using statistical techniques and select the best available model. This additional work has been completed alongside this PoD study and is now considered to have wider applications than just the ultrasonic NDT of LPG storage vessels (c.f. Appendices C and D).

6.7. DEPENDENCE OF POD ON OPERATIONAL AND PHYSICAL PARAMETERS

6.7.1. Important Operational and Physical Parameters

The NTIAC data book (5) has been the prime source of raw PoD data (i.e. 423 PoD curves) for assessing the effects on PoD curves by the operational and physical parameters. The NTIAC data book contains only hit/miss data and in each of the 423 PoD curves it is the log-odds model that has been used to fit the data, along with a 95% confidence limit. To assess the effects of the parameters, PoD data was considered where only one of the parameters changed while the other parameters remained the same. The examples selected cover a range of NDT methods and help to illustrate the kind of differences that can exist between PoD results, but without any deliberate attempt to maximise these differences. Before observing the effects of certain parameters on the PoD curves, it is important to note the following points:

• The PoD data in the NTIAC data book was collected about 30 years ago and may not necessarily reflect current capabilities with modern digital instrumentation.

• The PoD data illustrated in the figures to follow are valid for the particular datasets in question. It would be wrong to draw general conclusions about PoD values. The figures merely serve to illustrate the possible effects that the parameters can have on the PoD and that we should be aware of these effects when quoting PoD results.

• Equipment ‘calibration’ is an important variable in the application of an NDT procedure. It is believed that no attempt was made to resolve this issue in collecting the inspection data.

• The designated operators A, B and C recorded in the datasets are not necessarily the same 3 people each time.

Notwithstanding the above points, the PoD datasets in the NTIAC data book are a rich and comprehensive set of data, which would almost certainly be prohibitively expensive for any one organisation to repeat using more modern digital technology. Such data does not appear to exist elsewhere in such an easily accessible and consistent format to illustrate the comparisons below.

6.7.2. NDT Method

Two Titanium flat plates (i.e. thicknesses 1.7mm and 5.7mm) with a total of 135 cracks were inspected by the same designated operator using manual eddy currents, manual ultrasound (surface waves) and X-ray radiography. The PoD curves for each method are plotted in Figure 8 and show the differences in the PoD curves for a particular dataset.

6.7.3. Fluorescent Penetrant NDT

Two datasets were considered which quantify the differences in PoD between the cases of no developer and developer being used to reveal surface flaws (Figure 9). Whilst surface lengths were measured, the depths were predicted from validated crack growth procedures.

6.7.4. Material Properties

The three PoD results illustrated in Figure 10 are for aluminium, titanium and steel. The datasheets for these datasets suggest the only physical difference is the material, although the width of the steel plate is different from the other two. The thickness of all three is the same and they are in the same ‘as machined’ state. The flaw types were all initiated using the same mechanism. However, the flaws in each different material specimen are clearly not identical. It is worth noting that the particular PoD results for steel improve dramatically once the specimen goes beyond the ‘as machined’ state (e.g. etching and proof loading).

6.7.5. Specimen Weld Geometry

The NTIAC data book contains data on flat panel specimens and bolt holes, with no V-butt welds to compare with J-prep welds, for example. However, there are particular PoD results for aluminium welds with crowns and PoD results with the same aluminium welds ground flush. The NDT method used was X-ray radiography and the results are illustrated in Figure 11.

6.7.6. Flaw Characteristics

Fluorescent penetrant NDT was considered for inspecting longitudinal cracks and transverse cracks covering the same flaw length range. The PoD results for this comparison are illustrated in Figure 12. The transverse flaws are associated with much lower PoD values for relatively smaller crack lengths (i.e. below about 4mm), but are more similar for larger crack lengths (i.e. above about 6mm).

6.7.7. Human Reliability

It would be very easy to show significant differences in detection capability based on human reliability studies. In manual ultrasonic NDT an often quoted anecdote is, ‘you can only believe a manual ultrasonic NDT result 50% of the time’. Perhaps this originated from typical differences observed in the past (c.f. the NIL study (23)). The ‘50% anecdote’ is believed to be too simplistic for many situations and more information is required. The NTIAC data book does contain a great deal of data where the only difference is the operator (i.e. operators A, B, C). The cases selected for illustrating operator variability here are shown in Figure 13 for ultrasonic immersion NDT, inspecting titanium plates with low cycle fatigue cracks.

7. INDEPENDENT VERIFICATION

The independent verification was carried out by George A Georgiou (GAG), Emilie Beye (EB), and Melody Drewry (MD). Whilst GAG is the author of this report, there were specific aspects of the statistical theory and calculations in Appendix C that he did not carry out and particular checks were carried out in relation to the formal statistical tests and the datasets used in those tests. Similarly, EB and MD, who are co-authors of Appendix C, were not involved at all in the main study and numerous checks were carried out by them in the following areas:

• Derivation of the mathematical equations

• The statistical definitions and terminologies in Appendix A

• The figures and the corresponding datasets (i.e. Figure 2 and Figures 7 – 13)

• The comparison of the Probability of Inclusion curves in Appendix C

• The updated Index of Detection model in Appendix D

• The Conclusions and Recommendations

A formal verification statement is made in section 12.

8. CONCLUSIONS

• The ‘log-odds’ distribution is found to be one of the best fits for hit/miss NDT data.

• The log-normal distribution is found to be one of the best fits for signal response NDT data, and in particular for flaw length and flaw depth data as determined by ultrasonic NDT.

• In some cases, the ‘log-odds’ and cumulative log-normal distributions are very similar, but there are many cases where they are significantly different.

• There are NDT data when neither the ‘log-odds’ nor the log-normal distributions are appropriate and other distributions need to be considered.

• There is often a large gap between the smallest flaw detected and the largest flaw missed.

• Very small or very large flaws do not contribute much to the PoD analysis of hit/miss data.

• To achieve a valid ‘log-odds’ model solution for hit/miss data, a good overlap between the smallest flaw detected and the largest flaw missed is necessary.

• To achieve a valid log-normal model solution for signal response data, there is less reliance on flaw size range overlap, but more on the linear relationship between ln(â) and ln(a).

• When the PoD (a) function decreases with increasing flaw size, it is usually an indication that the NDT procedures are poorly designed.

• When the lower confidence limit decreases with increasing flaw size, notwithstanding an acceptable PoD (a) function, it is usually associated with extreme or unreasonable values of the mean and standard deviation.

• The effect on PoD results for particular operational and physical parameters can be significant for datasets selected from the NTIAC data book of PoD curves.

• The PoD data in the NTIAC data book were collected some 30 years ago and may not necessarily reflect current capabilities with modern digital instrumentation. However, the results are still believed to be relevant to best practice NDT.

• The PoD data illustrated in each of the figures 7 – 13 are valid for the particular datasets in question. It would be wrong to draw too many general conclusions about the particular PoD values (e.g. ultrasound is better than X-ray).

• Figures 7 - 13 serve to illustrate the possible effects that the physical and operational parameters can have on the PoD and an awareness of these effects is important when quoting PoD results.

• NDT methods, equipment ‘calibration’, fluorescent penetrant developers, material, surface condition, flaws and human factors are all important operational and physical parameters, which can have a significant effect on PoD results.

• Whilst human factors are important variables in NDT procedures, they are often found not to be as important as other operational and physical variables.

• The ‘log-odds’ distribution was found to be the most appropriate distribution to use with the JCL ‘Probability of Inclusion’ model.


9. RECOMMENDATIONS

• Publish a signal response data book of PoD results.

• Publish a more up to date data book from different PoD studies and collate them in a way which best serves more general industrial and modelling applications.

• Set up a European style project or Joint Industry Project to realise the above recommendations.

10. ACKNOWLEDGEMENTS

The author would like to acknowledge the organisations ASM and NTIAC for giving permission to reprint and re-plot various figures in this study. Special acknowledgement goes to Ward Rummel, the author of the NTIAC data book, for his advice during discussions on various PoD datasets in the data book. The author would also like to acknowledge Martin Wall (ESR Technology Ltd) for highlighting the PoD research work that has been carried out at the NNDTC over the last two decades. Lastly, the author would like to thank the HSE for funding this work and in particular Graeme Hughes for his guidance and useful discussion throughout this study.

11. REFERENCES

1. Rummel W D: ‘Recommended practice for a demonstration of non-destructive evaluation (NDE) reliability on aircraft production parts’. Materials Evaluation Vol. 40 August 1982.

2. Lewis W H, Sproat W H, Dodd B D and Hamilton J M: ‘Reliability of non-destructive inspection – Final Report’. SA-ALC/MME 76-6-38-1, San Antonio Air Logistics Centre, Kelly Air Force Base, Texas, 1978.

3. Rummel W D: ‘Probability of detection as a quantitative measure of non-destructive testing end-to-End process capabilities’, 1998. www.asnt.org/publications/materialseval/basics/Jan98 .

4. AGARD Lecture Series 190: ‘A recommended methodology to quantify NDE/NDI based on aircraft engine experience’, April 1993, ISBN 92-835-0707-X

5. NTIAC Non-destructive Evaluation (NDE) capabilities data book, 3rd ed., November 1997, NTIAC DB-97-02, Non-destructive Testing Information Analysis Centre.

6. Berens A P: ‘NDE reliability data analysis’, Non-destructive evaluation and quality control: qualitative non-destructive evaluation. ASM Metals Data book, Volume 17, Fifth printing, December 1997, ISBN 0-87170-007-7 (v.1).

7. Berens A P and Hovey P W: ‘Evaluation of NDE reliability characterisation’, AFWAL-TR-81-4160, Vol. 1, Air Force Wright-Aeronautical Laboratories, Wright-Patterson Air Force Base, December 1981.

8. Sturges D J: ‘Approaches to measuring probability of detection for subsurface flaws’, Proc. 3rd Ann. Res. Symp., ASNT 1994 Spring Conference, New Orleans, 1994, pp229-231.

9. Crawshaw J and Chambers J: ‘A concise course in A-level statistics’, 1984, Stanley Thornes (Publishers) Ltd. ISBN 0-7487-0455-8

10. Kreyszig E: ‘Advanced engineering mathematics’, John Wiley & Sons, Inc. 1983 (5th Edition, p947).

11. Matzknin G A and Yolken HT: ‘Probability of detection (PoD) for non-destructive evaluation (NDE)’, NTIAC-TA-00-01, August 2001.

12. Thompson R Bruce: ‘Overview of the ETC PoD methodology’, Review of Progress in Quantitative Non-destructive Evaluation, Vol. 18b, Plenum Press, New York, July 19-24, 1998, pp2295-2304.

13. Wall M: ‘Modelling of NDT reliability and applying corrections for human factors’, European American Workshop, Determination of Reliability and Validation Methods of NDE, Berlin, June 18-20, 1997, pp87-98.

14. Voker A W F, Dijkstra F H, Terpstra S, Herrings H A M and Lont M A: ‘Modelling of NDE reliability: Development of a PoD-Generator’, Proceedings of the 16th WCNDT, Montreal, Canada, August 30-September 3, 2004.

15. Burch S F, Stow B A and Wall M: ‘Computer modelling for the prediction of probability of detection of ultrasonic corrosion mapping’, Insight, Vol. 47 No 12 Dec 2005. (This can also be downloaded from the BINDT website www.bindt.org).

16. Wall M and Burch S: ‘Worth of modelling for assessing the intrinsic capability of NDT’, 15th World Conference on NDT, WCNDT15 Rome, October, 2000. (This can also be downloaded from the following website www.ndt.net/article/wcndt00/papers/idn735/idm735.htm).

17. Childs F R, Phillips D H, Liese L W and Rummel W D: ‘Quantitative assessment of the detectability of ceramic Inclusions in structural titanium castings by X-ray radiography’. Review of Progress in QNDE Vol. 18B, 1999, pp2311-2317, Editors Thompson D O and Chimenti D E.

18. NORDTEST Report: 'Guidelines for NDE reliability determination and description'. NT TECHN Report 394, 1998.

19. Kenzie B W, Mudge P J and Pisarski H G: ‘A methodology for dealing with uncertainties in NDE data when using inputs to fracture mechanics analyses’, Proc. 13th International Conference on NDE in the nuclear and pressure vessel industries, Kyoto, ASM International, 1995.

20. Commission of the European Communities, PISC-I, Report EUR 6371 EN Volumes 1 to VI, Brussels, Luxembourg (1979).

21. Commission of the European Communities, PISC-II, Report Nos. 1 to 5, Joint Research Centre, Ispra Establishment, Varese, Italy (1986).

22. Dover W D and Rudlin J R: ‘Results of probability of detection trials’, Proc IOCE 92, Aberdeen, 13-16 October 1992.

23. ‘NDT of thin plates – evaluation of results’, NIL Report, NDP 93-38 Rev. 1, 1995 (In Dutch).

24. Visser W: ‘PoD/PoS curves for non-destructive examination’. HSE Offshore Technology Report 2000/018, ISBN 07176 2297 5, 2002.

25. Warder, P Lilley J and Wall M: ‘Improved integrity management of bogie frame transom welded joints’, AEAT Engineering Solutions and John Reddyhof HSBC Rail Conference, Engineering integrity of railway systems, the Arup Campus, Solihull, 21-22 October 2003.

26. Georgiou G A: ‘The Extent of Ultrasonic non-Invasive Inspection of LPG Storage vessels’. HSE Project, JCL Report No. 2/8/99, (September 1999, Revision 1).


27. Georgiou G A: ‘Probabilistic models for optimising defect detection in LPG storage vessels’. HSE Project, JCL Report No. 3/3/00 (June 2000).

28. Georgiou G A: ‘Proposed Guidelines for estimating the extent of manual ultrasonic NDT for LPG storage vessels’. HSE Project, JCL Report No. 4/3/00 (July 2000)

29. Georgiou G A: ‘Probabilistic models for optimising defect detection in LPG welds’. Proceedings of BINDT, September 2000.

30. AEA Technology Report, AEAT-4389 HOIS (98) P8 Issue 2 (DRAFT). Data for POD curve supplied with kind permission by Dr. Martin Wall, AEA Technology.

31. Schneider C R A and Georgiou G A: ‘Radiography of thin section welds, Part 2: Modelling’, Insight, Vol. 45, No. 2, pp 119-121, February 2003.

32. Schneider C R A and Rudlin J R: ‘Review of statistical methods used in quantifying NDT reliability’, Insight, Vol. 46, No. 2, pp 77-79, February 2004.

12. VERIFICATION STATEMENT

The following persons were involved in verification work relating to this report as outlined below.

Print name: GEORGE A GEORGIOU (GAG)
Position: Director of Jacobi Consulting Ltd
Qualifications: BSc PhD (C.Eng., FIMA, FInstNDT)
Address: 57 Ockendon Road London N1 3NL
Signature: ……………………………………………………………

Print name: MELODY DREWRY (MD)
Position: Research Scientist (Jacobi Consulting Ltd)
Qualifications: BSc MSc (GInstNDT)
Address: Flat 1, 54 Penywern Road London SW5 9SX
Signature: ……………………………………………………………

Print name: EMILIE BEYE (EB)
Position: Model Analyst (Abbey)
Qualifications: BSc MSc (Statistics)
Address: 58 Orchard Way Bicester Oxfordshire OX26 2EJ
Signature: ……………………………………………………………

GAG carried out independent checks on the calculations used to produce Figures 5 to 11 in Appendix C, as well as a check on the data used in the calculations (c.f. Appendix II of Appendix C). The calculations were all found to be correct.

MD carried out random independent checks on the datasets used to produce Figure 2 and Figures 7 to 13 (i.e. from the NTIAC data book). MD also carried out checks on the depth data analysis in Appendix C. Figures 9-11 were all found to be a correct representation of the nominated datasets. The depths used in Appendix C were the ones listed in Appendix II of Appendix C.

EB read the whole report and carried out independent checks on all the mathematical equations and derivations. In addition, EB found the statistical definitions and terminologies used in the report, and in particular those in Appendix A, to be correct and appropriate for engineers and scientists who may not have a statistical background. EB found the conclusions and recommendations of the report to be based on sound scientific reasoning and to follow on logically from the evidence presented.

Table 1 Maximum Probability Tables (Based on 90% PoD and 95% Lower Confidence Limit)

Number of trials        Number of successes     Maximum number of       Maximum probability of
/flaw width interval    /flaw width interval    trials needed for       achieving certification
                                                certification           (%)

29                      29                      29                      100.0
29                      28                      46                      96.6
29                      27                      61                      68.0
29                      26                      75                      25.8
29                      25                      89                      4.9
29                      24                      103                     0.5

(The results are reprinted with permission of NTIAC. All rights reserved. See references (1) and (5).)
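The first row of Table 1 (29 successes in 29 trials) is consistent with the standard binomial argument for demonstrating 90% PoD at a 95% lower confidence limit. The following is a minimal sketch of that argument only, assuming independent trials with no misses; the function names are illustrative and it does not reproduce the other rows of the table.

```python
def lower_bound_all_hits(n, confidence=0.95):
    """One-sided lower confidence bound on PoD when all n trials are hits.

    With n hits out of n, the exact binomial lower bound p_L satisfies
    p_L ** n = 1 - confidence.
    """
    return (1.0 - confidence) ** (1.0 / n)


def min_trials_all_hits(pod=0.90, confidence=0.95):
    """Smallest n for which n hits out of n demonstrates the target PoD."""
    n = 1
    # If the true PoD were only `pod`, n straight hits would occur with
    # probability pod ** n; keep adding trials until that is <= 1 - confidence.
    while pod ** n > 1.0 - confidence:
        n += 1
    return n


if __name__ == "__main__":
    n = min_trials_all_hits()
    print(n, round(lower_bound_all_hits(n), 4))
```

Under these assumptions the smallest such sample is 29 flaws, which matches the number of trials per flaw width interval quoted in the table.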

Figure 1 Example of detection percentages for a handheld Eddy-Current inspection and a ‘log-odds’ distribution fit to the data. (Reprinted with permission of ASM International®. All rights reserved. See references 2 and 6.)

Figure 2 Ultrasonic NDT hit/miss data illustrating the relatively large gap between the smallest flaw detected and the largest flaw missed (The results are re-plotted with permission of NTIAC. All rights reserved. See reference (5).)

[Plot: PoD (hit/miss data) against flaw depth (mm), showing flaws missed and flaws detected. Annotations: a(largest) is the largest flaw missed; a(smallest) is the smallest flaw detected. Data Set: D1001AD (File D-UT1). Specimen: Aluminum / Flat Panel. Thickness: 1.5mm and 5.6mm. Condition: As Machined. NDT Method: Ultrasonic surface waves. Total number of flaws = 311. Total flaws detected = 258. Total flaws missed = 53.]

Figure 3 The linear relationship between the log-odds and log flaw size (Reprinted with permission of ASM International® . All rights reserved. (See reference 6))

Figure 4 Schematic of the PoD for flaws of fixed dimension for ‘hit/miss’ data

[Schematic: probability of detection PoD(a) against flaw dimension a. Annotations: probability density function f(a) of defect with fixed dimension a; the Probability of Detection function (PoD(a)) is the mean of the probability density function f(a).]

Figure 5 Schematic of the PoD for flaws of fixed dimension for ‘signal response’ data

[Schematic: ln(Signal Response) against ln(Flaw dimension), with flaw dimensions a1 and a2 marked. Annotations: probability density function f(a) of flaw with fixed dimension a; the Probability of Detection function (PoD(a)) is the area between the probability density function f(a) and the evaluation threshold; the evaluation threshold (âth).]

Figure 6 A comparison between the log-odds and cumulative log-normal distribution functions for the same parameters µ = 0 and σ = 1.0 (Reprinted with permission of ASM International®. All rights reserved. See reference (6).)

Figure 7 An example of when the log-odds model was not applicable to the data collected (The results are re-plotted with permission of NTIAC. All rights reserved. See reference (5).)

[Plot: probability of detection (PoD) in % against crack length in inches. Series: Log-Odds Model; 95% Confidence Lower Limit; Hit/Miss Data (flaws missed, flaws detected). Data Set: D9001(3)L (File D-UT9). Specimen: 2219 Aluminum, GTA Welded, Panels with Lack of Penetration. Condition: As Welded and Scarfed. NDT Method: Ultrasonic, Shear Wave. No. of operators: 3 Operators combined. Total number of flaws = 499. Total flaws detected = 105. Total flaws missed = 394. 90% PoD: Not Achieved. The log-odds model is not applicable.]

Figure 8 PoD(a) log-odds model results for different NDT methods applied to the same flaw specimen (The results are re-plotted with permission of NTIAC. All rights reserved. See reference (5).)

[Plot: Probability of Detection (PoD) in % against crack length in inches. Series: Eddy Currents; X-Rays; Ultrasonic Immersion. For all three data sets the test object is 6AL-4V Titanium / Flat Plate with Low Cycle Fatigue Cracks, the condition is After Etch and Proof Load, and the operator is C.]

Eddy Currents. Data Set: ETAC003L-C (File A-ET3). NDT Method: Eddy Current - Hand Scan. Total number of flaws = 135. Total flaws detected = 69. Total flaws missed = 66. 90% POD = 0.581 in. (14.76 mm).

Ultrasonic Immersion. Data Set: D3003CL (File D-UT3). NDT Method: UT Immersion - Shear Wave. Total number of flaws = 135. Total flaws detected = 116. Total flaws missed = 19. 90% POD = 0.133 in. (3.38 mm).

X-Rays. Data Set: F30653CL (File F-XT3). NDT Method: X-radiography. Total number of flaws = 61. Total flaws detected = 41. Total flaws missed = 20. 90% POD = 0.729 in. (18.52 mm).

Figure 9 PoD(a) log-odds model results for fluorescent penetrant: no developer and developer applied to the same flaw specimen (The results are re-plotted with permission of NTIAC. All rights reserved. See reference (5).)

[Plot: PoD in % (0 to 100) against crack depth in inches (0 to 0.2). Series: No Developer; Non-Aqueous Developer. For both data sets the test object is Low cycle fatigue cracks in Haynes 188, Flat Panels, the condition is Etched, and the operator is C, Facility 1.]

No Developer. Data Set: CE031(6)D (File C-PTE). NDT Method: Water Washable, Fluorescent Penetrant, No Developer. Total number of flaws = 284. Total flaws detected = 93. Total flaws missed = 191. 90% POD = Not Achieved.

Non-Aqueous Developer. Data Set: CE032(6)D (File C-PTE). NDT Method: Water Washable, Fluorescent Penetrant, Non Aqueous Developer. Total number of flaws = 284. Total flaws detected = 249. Total flaws missed = 35. 90% POD = 0.024 in. (0.598 mm).

Figure 10 PoD(a) log-odds model results for manual eddy currents: different materials but nominally the same flaws (The results are re-plotted with permission of NTIAC. All rights reserved. See reference (5).)

[Plot: Probability of Detection (PoD) in % against crack length in inches. Series: 6AL-4V Titanium; 4340 Steel; Aluminium. For all three data sets the flaws are Low Cycle Fatigue Cracks, the condition is As Machined, the NDT method is Eddy Current - Hand Scan, and the operator is A.]

Aluminium. Data Set: ETA1001A (File A-ET1). Test Object: Aluminum / Flat Panel. Total number of flaws = 311. Total flaws detected = 208. Total flaws missed = 103. 90% POD = 0.196 in. (4.98 mm).

6AL-4V Titanium. Data Set: ETA3001A (File A-ET3). Test Object: 6AL-4V Titanium / Flat Plate. Total number of flaws = 134. Total flaws detected = 92. Total flaws missed = 42. 90% POD = 0.173 in. (4.40 mm).

4340 Steel. Data Set: A7001AL (File A-ET7). Test Object: 4340 Steel / Flat Plate. Total number of flaws = 142. Total flaws detected = 30. Total flaws missed = 112. 90% POD = Not Achieved.

Figure 11 PoD(a) log-odds model results for X-ray radiography: different weld conditions but nominally the same flaws (The results are re-plotted with permission of NTIAC. All rights reserved. See reference (5).)

[Plot: Probability of Detection (PoD) in % against crack length in inches. Series: Welds with Crowns; Welds Ground Flush. For both data sets the condition is As Cracked, Etched and Proof Loaded, the NDT method is X-ray radiography, and the results of 3 operators are combined.]

Welds with Crowns. Data Set: F6003(3)L (File F-XT6). Test Object: Longitudinal Cracks in 2219 Aluminum, GTA Welds with Crowns. Total number of flaws = 162. Total flaws detected = 80. Total flaws missed = 82. 90% POD = Not Achieved.

Welds Ground Flush. Data Set: F8003(3)L (File F-XT8). Test Object: Longitudinal Cracks in 2219 Aluminum, GTA, Flush Ground Welds. Total number of flaws = 324. Total flaws detected = 185. Total flaws missed = 139. 90% POD = Not Achieved.

Figure 12 PoD(a) log-odds model results for fluorescent penetrant: different flaws but nominally the same specimens (The results are re-plotted with permission of NTIAC. All rights reserved. See reference (5).)

[Plot: probability of detection (PoD) in % against crack length in inches. Series: Transverse Flaws; Longitudinal Flaws. For both data sets the specimens are 2219 Aluminum, GTA, Flush Ground Welds, the condition is As Cracked and Etched, the NDT method is Fluorescent Penetrant, and the results of 3 operators are combined.]

Transverse Flaws. Data Set: CD002(3)L (File C-PTD). Test Object: Transverse Cracks. Total number of flaws = 54. Total flaws detected = 48. Total flaws missed = 6. 90% POD = 0.201 in. (5.11 mm).

Longitudinal Flaws. Data Set: CC002(3)L (File C-PTC). Test Object: Longitudinal Cracks. Total number of flaws = 324. Total flaws detected = 307. Total flaws missed = 17. 90% POD = 0.048 in. (1.21 mm).

Figure 13 PoD(a) log-odds model results for Ultrasound (Immersion): different operators but inspecting the same flaw specimen (The results are re-plotted with permission of NTIAC. All rights reserved. See reference (5).)

[Plot: Probability of Detection (PoD) in % against crack length in inches. Series: Operator A; Operator B; Operator C. For all three data sets the test object is 6AL-4V Titanium / Flat Plate with Low Cycle Fatigue Cracks, the condition is After Etch and Proof Load, and the NDT method is UT Immersion - Shear Wave.]

Operator A. Data Set: D3003AL (File D-UT3). Total number of flaws = 135. Total flaws detected = 105. Total flaws missed = 30. 90% POD = 0.265 in. (6.73 mm).

Operator B. Data Set: D3003BL (File D-UT3). Total number of flaws = 135. Total flaws detected = 115. Total flaws missed = 20. 90% POD = 0.111 in. (2.82 mm).

Operator C. Data Set: D3003AL (File D-UT3). Total number of flaws = 135. Total flaws detected = 116. Total flaws missed = 19. 90% POD = 0.133 in. (3.38 mm).

APPENDIX A

GLOSSARY OF TERMS, STATISTICAL TERMINOLOGY AND OTHER RELEVANT INFORMATION

TABLE OF CONTENTS

1. GLOSSARY OF TERMS

2. STATISTICS TERMINOLOGY

2.1. CONFIDENCE INTERVAL

2.2. CONFIDENCE LIMITS

2.3. CONFIDENCE LEVEL:

2.4. MAXIMUM LIKELIHOOD METHODS:

2.5. PROBABILITY OF DETECTION:

3. OTHER RELEVANT INFORMATION

3.1. CALCULATING A CONFIDENCE INTERVAL

3.2. LOG-ODDS MODEL

3.3. LOG-NORMAL MODEL

3.4. PROBABILITY DENSITY FUNCTION
3.4.1. The Discrete Case
3.4.2. The Continuous Case

1. GLOSSARY OF TERMS

English Symbols    Description

a                  Flaw size
â                  Signal response
âth                Signal response threshold
C                  Confidence level
F                  Continuous cumulative distribution
m                  The median of a population
p                  A general statistical parameter of a population
p1                 The lower confidence limit
p2                 The upper confidence limit
PoD (a)            The probability of detection function

Greek Symbols      Description

α1                 A parameter in the log-normal relationship
β1                 A parameter in the log-normal relationship
γ                  An error term in the log-normal relationship
µ                  The mean of a population
σ                  The standard deviation of a population
σγ                 The standard deviation of the error term γ

2. STATISTICS TERMINOLOGY

2.1. CONFIDENCE INTERVAL

A confidence interval gives an estimated range of values which is likely to include an unknown population parameter.

(Note: Each estimated range is calculated from a particular random sample. If random samples of the same size are taken repeatedly from the same population and a confidence interval is calculated for each sample, then a certain percentage of the intervals will include the unknown parameter. This percentage is referred to as the ‘confidence level’ (see below). The width of the confidence interval provides some idea about the uncertainty of the unknown parameter. A very wide interval may indicate that more data should be collected before anything very definitive can be said about the parameter.)

2.2. CONFIDENCE LIMITS

Confidence limits are the lower and upper boundaries of a confidence interval.

2.3. CONFIDENCE LEVEL:

The confidence level is the probability value that is associated with the confidence interval.

(Note: It is called a probability value, notwithstanding the fact that it should not be interpreted as a probability. The notion of probability is introduced when calculating the confidence interval using a normal probability distribution curve, where different areas under the curve have corresponding probabilities (see the example in Section 3).)

2.4. MAXIMUM LIKELIHOOD METHODS:

Statement of the Maximum Likelihood Method (MLM):
• We have made n measurements of x {x1, x2, …, xn}.
• We know the probability density function that describes x: f(x, a).
• We want to determine the parameter a, hence we select a to maximise the probability of getting the measurements of the xi's.

The mathematical implementation of the MLM, for calculating the parameters in the log-odds or log-normal models, is outside the scope of this study. However, particular worked examples can be found in reference 6. The example below illustrates an optimisation technique called ‘the method of least squares’, which can be regarded as a special case of the MLM.

Example: A trolley moves along a track at constant speed. Suppose the following measurements of the distance vs. time were made. From the data find the best value for the speed (v) of the trolley.

Distance d (mm) 11 19 33 40 49 61

Time t (seconds) 1.0 2.0 3.0 4.0 5.0 6.0

Since the trolley is moving at constant speed, the gradient in a distance vs. time graph must be constant and it can be assumed that the relationship between d and t must take the form:

$$d = vt + d_0 \tag{1}$$

The problem is to establish the parameters d0 and v from the measurements taken. Here, it is going to be done using the ‘method of least squares’ (9). The least squares regression line of d on t is given by

$$d - \bar{d} = \frac{s_{td}}{s_t^2}\,(t - \bar{t}) \tag{2}$$

where $\bar{d}$ and $\bar{t}$ are the mean values of the distance and time respectively and

$$\frac{s_{td}}{s_t^2} = \frac{n\sum_{i=1}^{6} t_i d_i - \sum_{i=1}^{6} t_i \sum_{i=1}^{6} d_i}{n\sum_{i=1}^{6} t_i^2 - \left(\sum_{i=1}^{6} t_i\right)^2} \tag{3}$$

On comparing equations (1) and (2) it is straightforward to show that

$$v = \frac{s_{td}}{s_t^2} \tag{4}$$

and

$$d_0 = \bar{d} - \frac{s_{td}}{s_t^2}\,\bar{t} \tag{5}$$

Since

$$\sum_{i=1}^{6} t_i = 21, \quad \sum_{i=1}^{6} d_i = 213, \quad \sum_{i=1}^{6} t_i d_i = 919, \quad \sum_{i=1}^{6} t_i^2 = 91 \tag{6}$$

Using the values in equation (6), with n = 6, and substituting the appropriate ones into equations (3), (4) and (5), it can be shown that

$$v = 9.9, \quad d_0 = 0.8 \quad \Rightarrow \quad d = 9.9t + 0.8 \tag{7}$$

[Plot: distance d (mm) against time t (seconds) for the six measurements, with the fitted trend line d = 9.9t + 0.8 and R² = 1.0.]

• The trend line is the optimum fit to the data.
• It minimises the sum of the squares of the deviations between the line and the data (hence the name ‘method of least squares’).
• This method is also used in Microsoft Excel to establish the trend line. In Excel a ‘correlation coefficient’ (R², strictly the coefficient of determination) is also calculated and provides a quantitative measure of fit. The closer R² is to 1 the better the fit. For the data here, R² has been calculated and, to one decimal place, is 1.0, so the trend line is virtually a perfect fit.
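The trolley example can be checked numerically. The sketch below applies equations (3), (4) and (5) to the six measurements and also computes R² in the usual way (one minus the ratio of residual to total sum of squares); the function names are illustrative only.

```python
def least_squares_line(t, d):
    """Return (v, d0) for the least squares line d = v*t + d0."""
    n = len(t)
    sum_t = sum(t)
    sum_d = sum(d)
    sum_td = sum(ti * di for ti, di in zip(t, d))
    sum_tt = sum(ti * ti for ti in t)
    # Equations (3) and (4): gradient of the regression line of d on t.
    v = (n * sum_td - sum_t * sum_d) / (n * sum_tt - sum_t ** 2)
    # Equation (5): intercept from the mean values.
    d0 = sum_d / n - v * sum_t / n
    return v, d0


def r_squared(t, d, v, d0):
    """Coefficient of determination of the fitted line."""
    mean_d = sum(d) / len(d)
    ss_res = sum((di - (v * ti + d0)) ** 2 for ti, di in zip(t, d))
    ss_tot = sum((di - mean_d) ** 2 for di in d)
    return 1.0 - ss_res / ss_tot


times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # t in seconds
distances = [11, 19, 33, 40, 49, 61]        # d in mm
v, d0 = least_squares_line(times, distances)
r2 = r_squared(times, distances, v, d0)

if __name__ == "__main__":
    print(f"v = {v:.1f}, d0 = {d0:.1f}, R2 = {r2:.1f}")
```

To one decimal place this reproduces the values in equation (7) and the R² quoted for the trend line.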

The MLM will also involve deriving formulae for the relevant parameters (i.e. usually the ‘average’ and the standard deviation). However, the formulae are much more complicated than the least squares formulae presented here.

2.5. PROBABILITY OF DETECTION:

The PoD (a) function is the proportion of all flaws of size ‘a’ that will be detected in a particular application of an NDT system.

3. OTHER RELEVANT INFORMATION

3.1. CALCULATING A CONFIDENCE INTERVAL

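See also reference (9), p434. Consider the confidence interval for the mean µ of a population where the population variance σ² is known. If X is normally distributed such that X ~ N(µ, σ²), then for any n (i.e. for any random sample with n pieces of data)

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

Standardising, we have

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \quad \text{where } Z \sim N(0, 1).$$

We know that the central 95% area of N(0, 1) lies between the values ±1.96:

[Sketch: the standard normal curve N(0, 1), centred on 0, with 2.5% of the area in each tail beyond -1.96 and +1.96.]

$$
\begin{aligned}
& P\!\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 \\
\Rightarrow\ & P\!\left(-1.96\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \\
\Rightarrow\ & P\!\left(1.96\frac{\sigma}{\sqrt{n}} \ge \mu - \bar{X} \ge -1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \\
\Rightarrow\ & P\!\left(\bar{X} + 1.96\frac{\sigma}{\sqrt{n}} \ge \mu \ge \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95
\end{aligned}
$$

$$P\!\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \tag{8}$$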

Equation (8) has a probability of 0.95 associated with the interval. However, it is stressed that equation (8) should not be interpreted as ‘the probability that µ lies between X̄ ± 1.96 σ/√n is 0.95’, since the value of the population mean µ will either be in the particular interval or not. The correct interpretation of equation (8) is that if a large number of different confidence intervals are calculated in the same way (the intervals will be different because X̄ will be different for each random sample), then we would expect that about 95% of them will include or ‘trap’ µ. That is, an interval has been found with a 95% confidence level that includes µ.
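The ‘trapping’ interpretation can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the population mean, standard deviation, sample size, number of repeats and random seed are all hypothetical, and the function names are not taken from the report.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """Two-sided interval for the mean when sigma is known (equation (8))."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    half_width = z * sigma / sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(mu, sigma, n, repeats, seed=0):
    """Fraction of repeated intervals that include ('trap') the true mean."""
    rng = random.Random(seed)
    trapped = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = mean_confidence_interval(sample, sigma)
        if low <= mu <= high:
            trapped += 1
    return trapped / repeats


if __name__ == "__main__":
    # Hypothetical population: mean 10, standard deviation 2, samples of 25.
    print(coverage(mu=10.0, sigma=2.0, n=25, repeats=2000))
```

Each individual interval either contains the true mean or it does not; it is the long-run fraction of intervals that should be close to 0.95.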

3.2. LOG-ODDS MODEL

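Consider the two forms of the log-odds model:

$$\mathrm{PoD}(a) = \frac{\exp\left[\dfrac{\pi}{\sqrt{3}}\left(\dfrac{\ln a - m}{\sigma}\right)\right]}{1 + \exp\left[\dfrac{\pi}{\sqrt{3}}\left(\dfrac{\ln a - m}{\sigma}\right)\right]} \tag{9}$$

and

$$\mathrm{PoD}(a) = \frac{e^{\alpha + \beta \ln a}}{1 + e^{\alpha + \beta \ln a}} \tag{10}$$

On comparing equations (9) and (10):

$$
\begin{aligned}
& \frac{\pi}{\sqrt{3}}\left(\frac{\ln a - m}{\sigma}\right) = \alpha + \beta \ln a \\
\Rightarrow\ & -\frac{\pi m}{\sigma\sqrt{3}} + \frac{\pi}{\sigma\sqrt{3}}\ln a = \alpha + \beta \ln a
\end{aligned}
$$

On comparing coefficients:

$$\beta = \frac{\pi}{\sigma\sqrt{3}} \quad \text{and} \quad \alpha = -m\beta$$

$$\text{hence} \quad m = -\frac{\alpha}{\beta} \quad \text{and} \quad \sigma = \frac{\pi}{\beta\sqrt{3}}$$

Let p = PoD(a) in equation (10), hence

$$
\begin{aligned}
& p\left(1 + e^{\alpha + \beta \ln a}\right) = e^{\alpha + \beta \ln a} \\
\Rightarrow\ & p = e^{\alpha + \beta \ln a}\,(1 - p) \\
\Rightarrow\ & e^{\alpha + \beta \ln a} = \frac{p}{1 - p} \\
\Rightarrow\ & \ln\left(\frac{p}{1 - p}\right) = \alpha + \beta \ln a \\
\Rightarrow\ & \ln(\text{odds}) \propto \ln a
\end{aligned}
$$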
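The equivalence of the two forms can be checked numerically. The sketch below assumes the (m, σ) form with m expressed on the ln(a) scale, as in equation (9), and converts to (α, β) using the coefficient comparison above; the parameter values used are arbitrary and purely illustrative.

```python
from math import exp, log, pi, sqrt


def pod_m_sigma(a, m, sigma):
    """Log-odds PoD in the (m, sigma) form of equation (9)."""
    x = (pi / sqrt(3.0)) * (log(a) - m) / sigma
    return exp(x) / (1.0 + exp(x))


def pod_alpha_beta(a, alpha, beta):
    """Log-odds PoD in the (alpha, beta) form of equation (10)."""
    x = alpha + beta * log(a)
    return exp(x) / (1.0 + exp(x))


def to_alpha_beta(m, sigma):
    """beta = pi / (sigma * sqrt(3)) and alpha = -m * beta."""
    beta = pi / (sigma * sqrt(3.0))
    return -m * beta, beta


def to_m_sigma(alpha, beta):
    """m = -alpha / beta and sigma = pi / (beta * sqrt(3))."""
    return -alpha / beta, pi / (beta * sqrt(3.0))


def log_odds(p):
    """ln(p / (1 - p)), which is linear in ln(a) for this model."""
    return log(p / (1.0 - p))


if __name__ == "__main__":
    m, sigma = log(2.0), 0.5            # illustrative values only
    alpha, beta = to_alpha_beta(m, sigma)
    for a in (0.5, 1.0, 2.0, 4.0):
        print(a, pod_m_sigma(a, m, sigma), pod_alpha_beta(a, alpha, beta))
```

With these definitions the PoD is 0.5 where ln(a) = m, and ln(odds) plotted against ln(a) is a straight line of gradient β and intercept α.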

3.3. LOG-NORMAL MODEL

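Since

$$\ln(\hat{a}) = \alpha_1 + \beta_1 \ln(a) + \gamma$$

then we say ln(â) is normally distributed with mean µ(a) = α1 + β1 ln(a) and constant standard deviation σγ (i.e. N(µ(a), σγ²)). The standard parameter Z ~ N(0, 1) is given by

$$Z = \frac{\ln\hat{a} - (\alpha_1 + \beta_1 \ln a)}{\sigma_\gamma} \tag{11}$$

and the area we require is given by the right hand shaded portion.

[Sketch: the standard normal curve N(0, 1), centred on 0, with -Zth and Zth marked and the area to the right of the threshold shaded.]

Since

$$P(Z \le Z_{th}) = F(Z_{th}) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{Z_{th}} e^{-t^2/2}\,dt$$

(see section 3.4.2 below)

$$\Rightarrow\ P(Z > Z_{th}) = 1 - F(Z_{th})$$

and hence from equation (11)

$$
\begin{aligned}
& \mathrm{PoD}(a) = 1 - F\!\left[\frac{\ln(\hat{a}_{th}) - (\alpha_1 + \beta_1 \ln a)}{\sigma_\gamma}\right] \\
& \text{but } \mathrm{PoD}(a) = F(-Z_{th}) \\
\Rightarrow\ & \mathrm{PoD}(a) = F\!\left[\frac{(\alpha_1 + \beta_1 \ln a) - \ln(\hat{a}_{th})}{\sigma_\gamma}\right] \\
\Rightarrow\ & \mathrm{PoD}(a) = F\!\left[\frac{\ln a - \dfrac{\ln(\hat{a}_{th}) - \alpha_1}{\beta_1}}{\sigma_\gamma/\beta_1}\right]
\end{aligned}
$$

which is the cumulative log-normal distribution with mean µ and standard deviation σ given by:

$$\mu = \frac{\ln(\hat{a}_{th}) - \alpha_1}{\beta_1}, \qquad \sigma = \frac{\sigma_\gamma}{\beta_1}$$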
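The two expressions for the signal response PoD can be compared numerically. The sketch below is a minimal illustration, assuming β1 > 0; the parameter values (alpha1, beta1, sigma_gamma and the threshold a_hat_th) are hypothetical and are not taken from any data set in the report.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1); its cdf plays the role of F


def pod_from_threshold(a, alpha1, beta1, sigma_gamma, a_hat_th):
    """PoD(a) = 1 - F(Z_th), with Z_th taken from equation (11)."""
    z_th = (log(a_hat_th) - (alpha1 + beta1 * log(a))) / sigma_gamma
    return 1.0 - STD_NORMAL.cdf(z_th)


def pod_log_normal(a, alpha1, beta1, sigma_gamma, a_hat_th):
    """PoD(a) = F((ln a - mu) / sigma), the cumulative log-normal form."""
    mu = (log(a_hat_th) - alpha1) / beta1
    sigma = sigma_gamma / beta1
    return STD_NORMAL.cdf((log(a) - mu) / sigma)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    params = dict(alpha1=0.2, beta1=1.5, sigma_gamma=0.4, a_hat_th=2.0)
    for a in (0.5, 1.0, 2.0, 4.0):
        print(a, pod_from_threshold(a, **params), pod_log_normal(a, **params))
```

Both functions return the same value for any flaw size, and the PoD is 0.5 at the flaw size for which ln(a) = µ.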

3.4. PROBABILITY DENSITY FUNCTION

3.4.1. The Discrete Case

A discrete random variable X is defined here in terms of its properties:
(a) X is an event which is associated with the discrete values (x1, x2, …, xk)
(b) The probabilities associated with each of the values (x1, x2, …, xk) are (p1, p2, …, pk) respectively (i.e. P(X = xi) = pi, where 1 ≤ i ≤ k)

For example, in the rolling of an unbiased die, X could be the event 'the score of the rolled die'. The set of associated discrete values are (1, 2, 3, 4, 5, 6), each with probability p = 1/6. The set of all the possible values of the event X and associated probabilities describe the 'probability distribution' of X. The above notation can be simplified by introducing the discrete probability density function which is commonly denoted by:

$$p(x) \ \text{or} \ P(X = x) \qquad (p(x) \ge 0) \tag{12}$$

where x is a general element of the range of possible values of X, defined as the ‘Quantile’. The ordinates of p(x) represent the probability that X assumes a particular value x (the random variable is normally denoted by a capital letter and the particular value it takes by a small letter). From probability theory it follows that:

$$\sum_{\text{all } x} p(x) = 1 \tag{13}$$

that is, the probability that X assumes any one of all the possible values is a certainty. The cumulative distribution function F(x) is defined as:

$$F(x) = \sum_{t \le x} p(t) = P(X \le x) \tag{14}$$

that is, the probability that X assumes any one of the values up to and including the value x.
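The unbiased die example can be written out directly. The following sketch uses exact fractions to illustrate equations (13) and (14); the names pmf and cdf are illustrative only.

```python
from fractions import Fraction

# Discrete probability density function p(x) for an unbiased die.
pmf = {score: Fraction(1, 6) for score in range(1, 7)}


def cdf(x):
    """F(x) = sum of p(t) over all possible values t <= x (equation (14))."""
    return sum(p for value, p in pmf.items() if value <= x)


if __name__ == "__main__":
    print(sum(pmf.values()))   # equation (13): total probability
    print(cdf(3))              # P(X <= 3)
```

Here the total probability is exactly 1 and, for instance, F(3) = 1/2.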


3.4.2. The Continuous Case

A continuous random variable X can assume any value in a particular interval rather than any value from a set of discrete values. It is necessary to define a continuous function to describe the probability distribution of X. This function is called the continuous probability density function f(x) and it is usual to define it over the range -∞ < x < ∞. The random variable X is defined in terms of f(x) and has the following properties:

$$P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad (f(x) \ge 0) \tag{15}$$

that is, the probability that X lies in the complete range of possible values is a certainty and corresponds to the whole area under f(x). Equation (15) is analogous to equation (13) and it follows that

$$P(a < X < b) = \int_{a}^{b} f(x)\,dx \tag{16}$$

that is, the probability that X lies in the interval (a, b) is the corresponding area under f(x). From equation (16), the probability that the continuous random variable X lies between x and x+δx (where δx is a small finite interval) is f(x) δx (i.e. the corresponding area under f(x)). With an appropriate selection of δx, this area under f(x) can be used to approximate the probability that X assumes a particular value in a discrete distribution, that is:

$$f(x)\,\delta x = P(X = x) \tag{17}$$

The ordinates of f(x) represent the probability per unit length and hence the terminology 'probability density function' for f(x) is highly appropriate. The cumulative distribution function F(x) is given by:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = P(X \le x) \qquad (-\infty < x < \infty) \tag{18}$$

Equation (18) is analogous to equation (14) and represents the area under the curve f(x) up to and including x.
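Equations (15), (16) and (18) can be illustrated numerically for the standard normal density. The sketch below uses a simple trapezium rule and replaces the infinite limits by ±10, which is an assumption made purely for the numerical illustration; the function names are not from the report.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def f(x):
    """Probability density function of N(0, 1)."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)


def integrate(func, lower, upper, steps=20000):
    """Trapezium rule approximation to the area under func on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * h)
    return total * h


if __name__ == "__main__":
    print(integrate(f, -10.0, 10.0))    # equation (15): whole area
    print(integrate(f, -1.96, 1.96))    # equation (16): central area
    print(integrate(f, -10.0, 1.0), NormalDist().cdf(1.0))  # equation (18)
```

The whole area is 1 to within the numerical error, the central area between ±1.96 is about 0.95, and the running integral agrees with the library cumulative distribution function.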

APPENDIX B

AN AUDIT TOOL FOR THE PRODUCTION

AND APPLICATION OF POD CURVES

TABLE OF CONTENTS

1. INTRODUCTION

2. OPERATIONAL PARAMETERS

2.1. NDT METHODS

3. PHYSICAL PARAMETERS

3.1. THE SPECIMEN

3.2. FLAW CHARACTERISTICS

4. MODELLING OF POD(A)

1. INTRODUCTION

The ‘audit tool’ (or check list) is based on issues that are covered in the main text. Its purpose is to check that the PoD data have been gathered correctly and that there are enough data to compute the PoD/Confidence limit combination and the parameters associated with the PoD (a) function.

The following suggested audit tool for HSE inspectors is aimed at assisting them in their involvement with general PoD studies and in particular safety cases involving PoD. For each item in the audit tool there is a reference to a particular section in the main PoD report for further information and clarification. The questions currently appear in the order in which they arise in the PoD report, not in any order of priority or importance.

The headings below cover some important operational and physical parameters that are known to affect PoD results. The tables contain the kind of questions that could be asked by HSE inspectors and by organisations producing and/or applying PoD curves. It is worthwhile looking at the ‘Guidelines’ document (Appendix D), which provides some examples of how one might apply an existing PoD curve to the inspection of welds or components with flaws.

This document should be regarded as a working document, which could be updated periodically (e.g. every 3-4 years), based on developments in PoD research and experiences of HSE inspectors.

2. OPERATIONAL PARAMETERS

2.1. NDT METHODS

Audit | Section reference in PoD report
What was recorded for the NDT parameters? | 5.2
How often was the equipment calibrated? | 5.2
How often was the NDT procedure assessed to see that it was producing the correct result? | 5.2
What NDT method was used? | 5.2.1 (Figure 8)
Was a developer used in fluorescent penetrant testing? | 5.2.2 (Figure 9)
Were the correct procedures followed before each NDT method was applied (e.g. thorough cleaning after a fluorescent penetrant test)? | 5.2.2
What sort of operational variability was there in the PoD results of different operators? | 5.2.6 (Figure 13)
How many inspectors in the PoD study? | 5.2.6 (Figure 13)


3. PHYSICAL PARAMETERS

3.1. THE SPECIMEN

Audit | Section reference in PoD report
What is the material? | 5.2.3 (Figure 10)
What are the physical dimensions? | 5.2.3
What is the surface condition? | 5.2.3 (Figure 10)
What is the state of machining? | 5.2.3 (Figure 10)
What is the weld geometry? | 5.2.4
What is the weld condition? | 5.2.4 (Figure 11)

3.2. FLAW CHARACTERISTICS


Audit | Section reference in PoD report
What is the largest flaw that can be missed? | 3.1 (Figure 2)
Were the flaws in the PoD study simulated or real? | 3.2
How many flaws in the PoD study? | 3.3.3
What was recorded for the flaw characteristics? | 5.2.5 (Figure 12)

4. MODELLING OF POD(A)


Audit | Section reference in PoD report
Is the PoD a good enough fit and does the PoD increase with flaw size? | 5.1 (Figure 7)
Does the confidence limit increase with flaw size? | 5.1 (Figure 7)
Is there enough data in the part of the PoD that is increasing? | 5.1 (Figure 2)
Has ln(â) been plotted against ln(a)? | 5.1 (Figure 3)
Is ln(â) an increasing function of ln(a)? | 5.1 (Figure 3)
Are the values of μ and σ reasonable (i.e. neither too large nor too small)? | 5.1

APPENDIX C

THE VALIDITY OF THE JCL ‘PROBABILITY OF INCLUSION’ MODEL

The validity of the JCL ‘Probability of Inclusion’ model

By

George A Georgiou

Melody Drewry Emilie Beye

Jacobi Consulting Ltd


TABLE OF CONTENTS

TABLE AND FIGURE CAPTIONS
EXECUTIVE SUMMARY
Background
Objectives
Work Carried Out
Conclusions
Recommendation

1. INTRODUCTION
2. OBJECTIVES
3. ASSESS PREVIOUS MODELLING WORK
3.1. A CRITICAL REVIEW
3.2. DISCUSSIONS WITH HSE INSPECTORS
4. THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL
4.1. ASSESSMENT OF MBEL’S STATISTICAL DEFINITIONS
4.2. SUMMARY OF MBEL'S MODELLING APPROACH
4.2.1. The Binomial Model (With Replacement)
4.2.2. The Hypergeometric Model (Without Replacement)
4.3. A SUMMARY OF JCL’S MODELLING APPROACH
4.4. A COMPARISON OF THE MBEL AND THE JCL APPROACHES
4.5. COMMENTS ON THE HYPERGEOMETRIC MODEL
5. FURTHER ANALYSIS OF REAL DATA FROM HSE AND OTHER SOURCES
5.1. DATASETS INVOLVING FLAW LENGTHS
5.2. DATASETS INVOLVING FLAW DEPTHS
5.3. REASSESS AND VALIDATE THE MODEL
5.4. SELECT THE BEST AVAILABLE MODEL
6. CONCLUSIONS
7. RECOMMENDATIONS
8. ACKNOWLEDGEMENTS
9. REFERENCES

TABLES 1 - 6
FIGURES 1 - 13
APPENDIX CI
PART A: A SUMMARY COMPARISON OF THE MBEL AND JCL ‘PROBABILITY OF INCLUSION’ MODELS
PART B: THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL
APPENDIX CII THE DATASETS USED IN THE STATISTICAL ANALYSES


TABLE AND FIGURE CAPTIONS

TABLE CAPTIONS

Table 1 The mean and standard deviation for the four flaw length data sets
Table 2 The ‘p-value’ calculated using the Kolmogorov-Smirnov test for each data set against each distribution
Table 3 The ‘p-value’ calculated using the Shapiro-Wilk test for each data set against the Normal distribution
Table 4 The mean and standard deviation for the three flaw depth data sets
Table 5 The ‘p-value’ calculated using the Kolmogorov-Smirnov test for each flaw depth data set against each distribution
Table 6 The ‘p-value’ calculated using the Shapiro-Wilk test for each flaw depth data set against the Normal distribution

FIGURE CAPTIONS

Figure 1 An illustration of a Liquid Petroleum Gas (LPG) sphere
Figure 2 An illustration of a collapsed LPG Sphere
Figure 3 Probability of including a defective part of at least 1% given a certain % level of inspection (Hypergeometric model)
Figure 4 Probability of including a defective part of at least 1% given a certain % level of inspection (Comparison of JCL and MBEL Models)
Figure 5a A typical analysis and presentation of Case 1 (Pi data) using S-Plus
Figure 5b Quantile-Quantile plots of Case 1 (Pi data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 5c Cumulative density function plots of Case 1 (Pi data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 6a A typical analysis and presentation of Case 2 (Full sectioning data) using S-Plus
Figure 6b Quantile-Quantile plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 6c Cumulative density function plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 7a A typical analysis and presentation of Case 3 (ultrasonic data) using S-Plus
Figure 7b Quantile-Quantile plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 7c Cumulative density function plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 8a A typical analysis and presentation of Case 4 (reduced sectioning data) using S-Plus
Figure 8b Quantile-Quantile plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 8c Cumulative density function plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 9a A typical analysis and presentation of Case 2 (Full sectioning data) using S-Plus
Figure 9b Quantile-Quantile plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 9c Cumulative density function plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 10a A typical analysis and presentation of Case 3 (ultrasonic data) using S-Plus
Figure 10b Quantile-Quantile plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 10c Cumulative density function plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 11a A typical analysis and presentation of Case 4 (reduced sectioning data) using S-Plus
Figure 11b Quantile-Quantile plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 11c Cumulative density function plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 12 Estimating the probability of including a defective part (P(X>0) = 1-P(X=0)) using the Lognormal distribution given a certain % level of inspection for the cases (a) X=0.02, (b) X=0.004 and (c) X=0.0008
Figure 13 The probability of including a defective part (P(X>0) = 1-P(X=0)) using (a) the Normal distribution and (b) the Log-odds distribution


EXECUTIVE SUMMARY

Background

The extent of non-invasive inspection of LPG storage vessels was considered previously by Jacobi Consulting Ltd (JCL) and a probabilistic model (i.e. the ‘Probability of Inclusion’), based on the Normal distribution, was developed for optimising the probability of flaw detection. A companion guidelines document was written to assist companies and HSE inspectors to assess how much NDT was required in order to achieve a desired probability of detecting a flaw (i.e. the ‘Index of Detection’).

The earlier work was recently assessed and reviewed by an independent statistician in August 2004, as part of a training programme arrangement with Toulouse University, France. In the meantime, some HSE inspectors have considered the earlier JCL work along with the guidelines document and it was timely to assess their views and comments. It was also timely, as part of the review, to carry out some formal statistical tests on the various distributions used in developing the ‘Probability of Inclusion’ model, with a view to selecting the most appropriate distribution. The opportunity was taken to update the companion guidelines document to Appendix C, which has now become a separate document, Appendix D.

Objectives

• To provide a critical review of the ‘Probability of Inclusion’ model
• To carry out formal statistical tests on the distributions used in the modelling
• To select the most appropriate distribution for the ‘Probability of Inclusion’ model

Work Carried Out

The work has focussed on three important issues. The first was to review the existing ‘Probability of Inclusion’ model with a view to validating it against a model developed independently by Mitsui Babcock Energy Ltd (MBEL). The second issue was to establish which of the statistical distributions best fits real flaw length and flaw depth data, some of which came from an ultrasonic inspection of an LPG storage vessel. The third issue was to select the most appropriate statistical distribution for the ‘Probability of Inclusion’ model, based on the investigations in this study.

Conclusions


• The Lognormal distribution and the ‘Log-odds’ distribution were found to be the best fits for the flaw length and flaw depth data in this study, with the Lognormal distribution being the optimum fit, according to the formal statistical tests carried out.

• The ‘Log-odds’ distribution was found to be the most appropriate distribution to use with the ‘Probability of Inclusion’ model.

Recommendation

• The ‘Probability of Inclusion’ model should be applied more widely than just to LPG storage vessels.


1. INTRODUCTION

The extent of non-invasive inspection of Liquid Petroleum Gas (LPG) storage vessels has been considered previously by Jacobi Consulting Ltd (JCL) and a probabilistic model (i.e. ‘Probability of Inclusion’) was devised for optimising defect detection. A series of reports and papers was published (1-5), including a confidential draft document (5), which made a brief comparison between a modelling approach by Mitsui Babcock Energy Ltd (MBEL) and the modelling approach developed by JCL. A guidelines document (3) was written to assist companies and HSE inspectors to assess how much NDT was required in order to achieve a desired probability of detecting a flaw and was based on a concept called the ‘Index of Detection’.

In the meantime, some HSE inspectors have used the JCL work and the guidelines document. In addition, an independent statistician from Toulouse University (France) was employed by JCL as part of a student industrial training programme. A number of tasks and objectives were set for the student, which included a critical review of the earlier JCL model.

It was considered timely to assess the experiences of the HSE inspectors, as well as to review the JCL ‘Probability of Inclusion’ model and validate it by carrying out a more detailed comparison with the MBEL model. In addition, the various statistical distributions that can be used in the ‘Probability of Inclusion’ model were considered in relation to real flaw data using formal statistical tests, with a view to establishing the distribution that best fits these data and to selecting the most appropriate distribution to use with the ‘Probability of Inclusion’ model.

In this project a critical review is carried out in section 3, along with a report on the experiences of HSE inspectors who have used the earlier JCL model. In section 4, the confidential draft document (5) is considered in more detail and further explanations are provided about the advantages and disadvantages of the MBEL and JCL approaches. Section 5 deals specifically with the detailed analysis of real data (i.e. flaw length and flaw depth data) and the selection of the most suitable distribution to use in the ‘Probability of Inclusion’ model and ultimately in the ‘Index of Detection’ model (see Appendix D).

In the context of LPG storage vessels (Figure 1), the importance of carrying out sufficient non-invasive inspection cannot be overstated, given the hazardous and inflammable substances contained by them. Disasters can happen and the consequences can be extreme (Figure 2).

2. OBJECTIVES

• To provide a critical review of the ‘Probability of Inclusion’ model
• To carry out formal statistical tests on the distributions used in the modelling
• To select the most appropriate distribution for the ‘Probability of Inclusion’ model

3. ASSESS PREVIOUS MODELLING WORK

At the start of this study, 5 years had passed since the earlier ‘Probability of Inclusion’ (PoI) model was completed. Since then, HSE inspectors have been considering the PoI model, so their experiences are helpful in assessing the industrial usefulness of the model. In addition, the opportunity was taken during this time to have an independent assessment of the PoI model. An arrangement was made with Toulouse University, France, as part of their undergraduate training programme, to employ one of their final year statisticians to carry out an independent review of the earlier PoI model. The student was given the following agreed tasks and objectives:

• Understand the PoI modelling approach by JCL
• Offer a critical review of the JCL modelling approach
• Research other possible approaches
• Research real data from HSE and other sources
• Compare results between the various approaches
• Re-assess the JCL PoI model in the light of real data

3.1. A CRITICAL REVIEW

During the review of the JCL PoI model, a number of checks were made. Initially, these were numerical in nature, to verify that the earlier calculations were correct and to identify any glaring errors in the approach. The checks confirmed, as did a verification statement in an earlier report (2), that all the repeated calculations were correct. In addition, the theoretical approach was also found to be correct, given the very limited amount of real LPG data to test the JCL model against.

The review pointed out, quite correctly, that apart from some very basic statistical tests, no rigorous formal statistical tests were carried out to assess how well the distributions used were able to fit the data. However, in defence of the earlier work, there was very little real LPG data to carry out formal tests on.

Another drawback that was observed in the earlier PoI model was that the probability of including the flaw did not reach 100%, even when 100% inspection was carried out. Whilst there was a good theoretical reason for this, which was fully explained, it was still worth investigating ways of improving the situation.

3.2. DISCUSSIONS WITH HSE INSPECTORS

HSE inspectors felt that the probabilistic models developed by JCL (2, 3), were critically needed at the time because of particular inspection issues the HSE was having with certain sectors of the LPG industry. However, since then there have not been any similar inspection issues in relation to LPG spheres and moreover no one has really challenged the JCL models. Nevertheless, the general feeling within HSE is that this work continues to be of value, as it is offering practical information to industry.

4. THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL

At the time when the PoI model for LPG spheres was being developed by JCL (2, 3), MBEL was also developing a similar model. MBEL have kindly provided the relevant section of this work and it is included here in Appendix CI (Part B). Whilst the JCL and MBEL approaches are quite different on the surface, it is interesting that they produce very similar results. This section considers these two approaches and discusses their advantages and disadvantages. A brief summary of the comparison between the MBEL and JCL model approaches is also provided in Appendix CI (Part A), and the reader can omit the full details of the comparison below, on the first read, and go straight to Appendix CI (Part A).

4.1. ASSESSMENT OF MBEL’S STATISTICAL DEFINITIONS

In the first paragraph of MBEL's document (Appendix CI, Part B), the phrase '…a uniform random distribution of defects…' is used. It is felt that the word ‘uniform’ should be avoided here, as there are statistical distributions, both discrete and continuous, which are called 'uniform' and this is not what is being considered by the MBEL approach. The complete phrase in the first paragraph has been interpreted as meaning that ‘defects’ occur randomly and that all sampled lengths of equal size have the same probability of including a 'defect'.

Page 1 of the MBEL document also refers to '…a unit containing a defect…'. Since the random distribution of defects is expressed as a %, technically one should not be talking about the '…probability that a unit contains a defect…' (i.e. just one defect). The weld volume has been divided into 100 discrete units and the term Dd is the defect distribution expressed as a %. This is equivalent to saying Dd units are defective. It is believed that the term Pd should be expressed as the 'Probability that any one unit selected at random is defective' or the 'Probability that any 1% of the weld selected at random is defective' and not the 'Probability that a unit contains a defect', since a defective unit may contain more than one defect.

The MBEL document uses the notation 'Cov' for the percentage coverage of the weld volume. It is felt this notation should also be avoided, as it could initially be misunderstood as representing the 'Covariance', another statistical parameter, which is not what is intended.

4.2. SUMMARY OF MBEL'S MODELLING APPROACH

MBEL's approach is based on the following idea and this is illustrated below.

[Figure: a weld divided into 100 units (or 100% of the weld), of which Dd units are defective (or Dd % of the weld is defective and randomly distributed)]

This is equivalent to saying we have one hundred components, Dd of which are defective. The probability that any one unit (or any 1%) selected at random is defective is

$$P_d = \frac{D_d}{100} \qquad (1)$$

In the following analysis, the notation used by MBEL has been modified slightly for subsequent comparisons with the JCL notation. MBEL have used the above approach to consider the problem of selecting k units (or k%) of the weld and calculating the probability that s (or s%) of these k (or k%) units are defective. MBEL have considered this problem in the context of what in probability is usually termed:

(i) 'With replacement' (i.e. the Binomial distribution)
(ii) 'Without replacement' (i.e. the Hypergeometric distribution)

It is important to note that the random variable X, in the problem considered by MBEL, can only take on integer values 0,1,2,3,……,100 and these integer values represent the % of defective weld and not the number of individual defects. Thus the random variable X is really ‘the % of defective weld’. Consider the following two examples before developing general formulae for the problem considered by MBEL.

Ex.1. A box contains 10 bolts, 3 of which are defective. Two bolts are drawn at random. Find the probability that the two bolts are both defective.

Let A represent the event ‘the 1st bolt drawn is defective’ and let B represent the event ‘the 2nd bolt drawn is defective’. Then P(A) = 3/10, since 3 out of the 10 bolts are defective.

If we sample with replacement, the situation before the 2nd drawing is the same as at the beginning and P(B) = 3/10. The events are independent and, from the multiplication law for independent events (6),

P(A and B) = P(A).P(B) = (0.3)^2 = 0.09

If we sample without replacement, the situation for A is the same, P(A) = 3/10. However, the situation for B has changed, since there is one less defective bolt after the first selection, and the probability of B given A is P(B|A) = 2/9 (i.e. 9 bolts left but only 2 are defective). Hence

P(A and B) = P(A).P(B|A) = (3/10)(2/9) ≈ 0.07

The above illustrates the Binomial and Hypergeometric approaches respectively. Consider now a more general example.

Ex.2. A box contains N bolts, M of which are defective. If a sample of k bolts is drawn at random, find the probability that s bolts are defective.

The probability of selecting one defective bolt at random is given by

$$p = \frac{M}{N} \qquad (2)$$

Let the random variable Y represent the number of defective bolts in the sample. In drawing a sample of k bolts, with replacement, the probability that precisely s bolts are defective is given by the Binomial distribution (6)

$$P(Y = s) = \binom{k}{s}\left(\frac{M}{N}\right)^{s}\left(1 - \frac{M}{N}\right)^{k-s} \qquad (s = 0, 1, 2, \ldots, k) \qquad (3)$$

Equation (2) can be thought of as a special case of equation (3), sampling one at a time (i.e. k = 1 and with s = 1). Note also that

$$\binom{k}{s} = \frac{k!}{(k-s)!\,s!}$$

Similarly, in drawing a sample of k bolts, without replacement, the probability that precisely s bolts are defective is given by the Hypergeometric distribution (6)


$$P(Y = s) = \frac{\binom{M}{s}\binom{N-M}{k-s}}{\binom{N}{k}} \qquad (s = 0, 1, 2, \ldots, k) \qquad (4)$$
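The two sampling schemes of Ex.1 and Ex.2 can be checked numerically. The following is a minimal sketch, assuming the standard Binomial and Hypergeometric forms of equations (3) and (4); the function names are illustrative only and do not come from the MBEL or JCL documents.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(s: int, k: int, M: int, N: int) -> Fraction:
    """P(Y = s) when k bolts are drawn WITH replacement, equation (3)."""
    p = Fraction(M, N)
    return comb(k, s) * p**s * (1 - p) ** (k - s)


def hypergeometric_pmf(s: int, k: int, M: int, N: int) -> Fraction:
    """P(Y = s) when k bolts are drawn WITHOUT replacement, equation (4)."""
    return Fraction(comb(M, s) * comb(N - M, k - s), comb(N, k))


# Ex.1: N = 10 bolts, M = 3 defective, k = 2 drawn, s = 2 defective.
with_replacement = binomial_pmf(2, 2, 3, 10)           # 9/100
without_replacement = hypergeometric_pmf(2, 2, 3, 10)  # 1/15
print(with_replacement, float(with_replacement))
print(without_replacement, round(float(without_replacement), 2))
```

Exact fractions are used so that the results 9/100 = 0.09 and 1/15 ≈ 0.07 of Ex.1 are reproduced without rounding error.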

Returning to the original problem, the values of N, M, k and s in the latter example can now be thought of as percentages. That is, given that k% of the weld is selected, find the probability that it contains a defective part (i.e. X > 0). It is felt that this is a more accurate description of the problem than the way it is expressed on page 1 of the MBEL document (i.e. '…the problem is to determine the probability of detecting at least 1 defect when only inspecting a percentage of the weld…')

$$P(X > 0) = P(X \geq 1) = 1 - P(X = 0) \qquad (5)$$

It is assumed here that X < 0 does not have any meaning in the context of the problem. Let the probability of Including a Defective part (PoI) = P(X > 0) and the probability of Not Including a Defective part (PoN) = P(X = 0). Equation (5) can therefore be re-expressed as

PoI = 1 - PoN (6)

4.2.1. The Binomial Model (With Replacement)

Using equation (3) with s=0 and the probability in equation (2) given by Pd = Dd/100, it can be shown that

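$$\begin{aligned} PoI &= 1 - \binom{k}{0}\left(\frac{D_d}{100}\right)^{0}\left(1 - \frac{D_d}{100}\right)^{k-0} \\ \Rightarrow\; PoI &= 1 - \left(1 - \frac{D_d}{100}\right)^{k} \qquad (0 \leq k \leq 100) \end{aligned} \qquad (7)$$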

Equation (7) is the equivalent formula to that given on page 2 of the MBEL document (Appendix CI, Part B). The MBEL approach is equivalent to having 100 bolts of which Dd are defective (i.e. 100 - Dd are not defective); if k bolts are sampled with replacement, the probability that at least one defective bolt is selected is given by equation (7). Note that the mean value μ is given by:

$$\mu = Np = kP_d = \frac{kD_d}{100} \qquad (8)$$
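Equations (7) and (8) are simple to evaluate. The sketch below is illustrative only: the function names are hypothetical and the value Dd = 2 is an assumed example, not a value taken from the MBEL document.

```python
def poi_binomial(k: int, dd: float) -> float:
    """Equation (7): PoI when k% of the weld is selected and dd% is defective."""
    if not 0 <= k <= 100:
        raise ValueError("k must lie between 0 and 100")
    return 1.0 - (1.0 - dd / 100.0) ** k


def mean_defective_units(k: int, dd: float) -> float:
    """Equation (8): mean number of defective units in a selection of k units."""
    return k * dd / 100.0


# Assumed example: 2% of the weld is defective.
for k in (10, 25, 50, 100):
    print(k, round(poi_binomial(k, 2.0), 3), mean_defective_units(k, 2.0))
```

Because sampling is with replacement, the PoI from equation (7) stays below 1 even for k = 100 when Dd is small.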

4.2.2. The Hypergeometric Model (Without Replacement)

Using equation (4), with s=0, N=100 and M=Dd it can be shown that


$$\begin{aligned} PoI &= 1 - \frac{\binom{D_d}{0}\binom{100-D_d}{k}}{\binom{100}{k}} = 1 - \frac{(100-D_d)!\,(100-k)!}{(100-D_d-k)!\,100!} \qquad (k \leq 100 - D_d) \\ PoI &= 1 \qquad (k > 100 - D_d) \end{aligned} \qquad (9)$$
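The effect of sampling without replacement can be seen by evaluating equation (9) alongside equation (7). This is a minimal sketch with hypothetical function names, and Dd = 2 is an assumed example value; it is not the code used to produce Figure 3.

```python
from math import comb


def poi_hypergeometric(k: int, dd: int) -> float:
    """Equation (9): PoI when k of 100 units are selected without replacement."""
    if not (0 <= k <= 100 and 0 <= dd <= 100):
        raise ValueError("k and dd must lie between 0 and 100")
    # comb(n, k) returns 0 when k > n, which gives PoI = 1 for k > 100 - dd.
    return 1.0 - comb(100 - dd, k) / comb(100, k)


def poi_binomial(k: int, dd: int) -> float:
    """Equation (7), repeated here so that the block is self-contained."""
    return 1.0 - (1.0 - dd / 100.0) ** k


# Assumed example: 2 of the 100 units are defective.
for k in (10, 50, 90, 100):
    print(k, round(poi_binomial(k, 2), 3), round(poi_hypergeometric(k, 2), 3))
```

The Hypergeometric PoI is never smaller than the Binomial PoI for the same k and Dd, and it reaches exactly 1 once k exceeds 100 - Dd.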

Equation (9) is the equivalent formula to the one used in the MBEL document (page 2) to produce the MBEL graph. Note that in equation (9) k is used instead of the parameter Cov and Dd is used instead of Pd. The PoI curves for different values of Dd, based on equation (9), are illustrated here in Figure 3. Note that the mean for the Hypergeometric distribution is the same as that for the Binomial distribution, but their variances are different. This will be considered below.

4.3. A SUMMARY OF JCL’S MODELLING APPROACH

JCL did not consider approaching the problem like MBEL, because it would have meant saying that the defective parts come as 1% units. In practice 1% could be as long as 10m and in general this would be unrealistic. This is not offered as a criticism of the MBEL model, but just as a statement of why JCL did not consider modelling the problem this way. It will be shown below that the MBEL and JCL approaches lead to quite different formulae for the Binomial case, but the results are almost identical.

In the JCL approach the weld is not divided into 100 discrete units. The weld is considered to have a finite number of defects (L) with different lengths, which are randomly distributed along the weld. This physical approach was adopted because of the way the problem was originally posed to JCL by HSE. The mean amount of defective weld D% is expressed in the following way

$$D = \left(\frac{x_1 + x_2 + x_3 + \ldots + x_L}{x}\right) \times 100 \qquad (10)$$

Hence the Probability (of including a defective part if the whole weld is considered) is equivalent to Probability (of including x1 or x2 or x3 or…..xL), which is given by

$$p = \frac{x_1}{x} + \frac{x_2}{x} + \frac{x_3}{x} + \ldots + \frac{x_L}{x} = \frac{D}{100} \qquad \text{(from equation (10))} \qquad (11)$$

[Figure: a weld of total length x containing defects of lengths x1, x2, x3, x4, ..., xi, ..., xL]

This is developed more generally by first considering I% of the weld (instead of the whole weld), then rI% of the weld (where r is a counter), and then the probability of including a defective part for each rI% selected is established. While this is not difficult to show, it is a little long-winded and has already been shown in the JCL report (2). Suffice to say that for any particular rI% of the weld, the probability of including a defective part, prI, is given by

$$p_{rI} = \frac{rD}{n \cdot 100} \qquad \text{where } nI = 100 \text{ and } 0 \leq r \leq n \qquad (12)$$

In the JCL approach a probability is established for including a defective part for each rI% selected. This probability is proportional to the size of selection, unlike the MBEL approach where the units are of equal size (c.f. equation (1)). The value of prI is used in the Binomial expansion to establish the PoI as before (c.f. equations (3), (5) and (6)).

$$PoI = 1 - \left(1 - \frac{rD}{n \cdot 100}\right)^{100} \qquad (0 \leq r \leq n) \qquad (13)$$

Note that the mean μ is given by

$$\mu = Np = 100\,p_{rI} = \frac{rD}{n} \qquad (14)$$

4.4. A COMPARISON OF THE MBEL AND THE JCL APPROACHES

On comparing equations (8) and (14) assuming Dd = D,

$$\frac{k}{100} = \frac{r}{n} \qquad (15)$$

and the MBEL and JCL expressions for PoI in the Binomial distributions can be compared directly.

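$$\begin{aligned} PoI_{MBEL} &= 1 - \left(1 - \frac{D}{100}\right)^{k} \quad \text{and} \\ PoI_{JCL} &= 1 - \left(1 - \frac{kD}{100^2}\right)^{100} \qquad (0 \leq k \leq 100) \end{aligned} \qquad (16)$$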
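The closeness of the two expressions in equation (16) can be checked directly. The sketch below is illustrative only; the function names and the values of D tried are assumptions and are not taken from the JCL or MBEL reports.

```python
def poi_mbel(k: float, d: float) -> float:
    """MBEL Binomial form in equation (16)."""
    return 1.0 - (1.0 - d / 100.0) ** k


def poi_jcl(k: float, d: float) -> float:
    """JCL Binomial form in equation (16)."""
    return 1.0 - (1.0 - k * d / 100.0**2) ** 100


def largest_gap(d: float) -> float:
    """Largest absolute difference between the two forms over k = 0..100."""
    return max(abs(poi_mbel(k, d) - poi_jcl(k, d)) for k in range(101))


# Assumed example values of the mean defect distribution D (%).
for d in (1, 2, 5, 10):
    print(d, round(largest_gap(d), 4))
```

The two forms coincide at k = 0 and at k = 100, and differ only slightly in between for the small values of D tried here.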

A graphical comparison between the MBEL and JCL formulae for the PoI is provided in Figure 4. It may seem surprising that these two quite different approaches give such close results. Initial considerations would suggest that there is something too simplistic about the MBEL model, because it appears to assume that the defective parts come in 1% chunks. However, as the model results show, it agrees almost exactly with the more direct approach to the physical problem outlined above for the JCL model.

A plausible reason for why the two approaches agree is provided here. The physical distribution of defects in the MBEL approach appears too simplistic at first glance. However, this turns out to be irrelevant because, from a modelling point of view, the D% defective parts and the (100-D)% non-defective parts can be grouped together and re-distributed in 1% units along the weld, no matter how they appear in practice. The formula for PoI can then be developed as shown above.

4.5. COMMENTS ON THE HYPERGEOMETRIC MODEL

The Hypergeometric model does overcome the 'irritating' problem of inaccuracy at high values of the % weld selection (i.e. large k) for low values of mean defect distribution (i.e. D < 4). The JCL approach of modelling the problem more directly from a physical point of view would not produce any different results by applying the Hypergeometric distribution. This is because for each rI% of the weld selected, the whole rI% is sampled in developing the formula, so the issue of with/without replacement does not arise in the JCL approach.

If the Hypergeometric approach is adopted, it has to be appreciated that the PoI is the 'probability of including a defective part of at least 1%' and not the 'probability of including at least 1 defect' as stated in the MBEL document. This may or may not be a desirable way of expressing the probability of including a defective part. On the other hand, this does not arise with the Normal distribution as discussed in the JCL report, notwithstanding the difficulties for D < 4. The comparison in this exercise has explained more clearly where this difficulty comes from and provides further independent evidence validating the JCL results, and for that matter validating the MBEL approach as well.

The issue of with/without replacement becomes increasingly less important for larger values of the defect distribution D, or smaller values of the weld selection k. The mean for the Binomial and Hypergeometric distributions is the same. However, the variance, σ², for each distribution is not and is given by

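$$\begin{aligned} \sigma^2_{Binomial} &= Np(1-p) \\ \sigma^2_{Hypergeometric} &= Np(1-p)\,\frac{(N-k)}{(N-1)} \end{aligned} \qquad (17)$$

In equation (17), Np is the mean given in equation (8), so that the N multiplying p is the number of units sampled, whereas in the factor (N - k)/(N - 1), N is the total number of units (here 100) and k is the number of units sampled. It can be seen from equation (17) that the Hypergeometric variance approaches the Binomial one for N >> k (i.e. N much larger than k) and for such values the Hypergeometric distribution can be approximated by the Binomial distribution.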
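The variance comparison can be illustrated with a few numbers. In the sketch below the notation is an assumption made for clarity: k is the number of units sampled, p is the probability that a unit is defective, and `population` is the total number of units (the N in the factor (N - k)/(N - 1)). The value p = 0.02 is illustrative only.

```python
def variance_binomial(k: int, p: float) -> float:
    """Variance of the number of defective units when sampling with replacement."""
    return k * p * (1.0 - p)


def variance_hypergeometric(k: int, p: float, population: int) -> float:
    """Variance when sampling without replacement from `population` units."""
    correction = (population - k) / (population - 1)
    return k * p * (1.0 - p) * correction


p = 0.02  # illustrative: 2% of the weld is defective
for k in (1, 10, 50, 100):
    print(k, variance_binomial(k, p), variance_hypergeometric(k, p, 100))
```

The two variances agree when a single unit is sampled, and the Hypergeometric variance falls to zero when the whole population is sampled, since nothing is then left to chance.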

5. FURTHER ANALYSIS OF REAL DATA FROM HSE AND OTHER SOURCES

The lack of available inspection data from LPG spheres was part of the difficulty in carrying out rigorous statistical tests in the earlier work (1, 2). However, in order to carry out such tests it is necessary to provide appropriate data. It was decided to re-use the LPG inspection data originally provided by HSE (7), in the absence of other LPG inspection data, which was collected using ultrasonic NDT where both the ultrasonic signal response and the ultrasonically measured flaw lengths are provided. Other data was also researched and it was decided to use data from an earlier HSE funded project (8), which is also ultrasonic NDT data and has accompanying sectioning data. The sectioning data is used here in two forms. The first form of the sectioning dataset is called the ‘full dataset’, where some flaw lengths had to be rounded up or down because the flaw length was given only by an inequality (e.g. sectioning flaw length > 60mm, would have been taken to be 61mm). The second form of the sectioning dataset is called the ‘reduced dataset’, where only the precise flaw length given is included. Thus the reduced dataset is a subset of the full dataset. It was felt useful to include both datasets in order to assess the effects of rounding the sectioning length up or down and the effects of using a relatively smaller dataset.


5.1. DATASETS INVOLVING FLAW LENGTHS

There are four datasets involving flaw lengths and these are given in Appendix CII and are identified as:

i. Case 1: Pi data
ii. Case 2: Sectioning data (full)
iii. Case 3: Ultrasonic flaw lengths
iv. Case 4: Sectioning data (reduced)

Each dataset was analysed in the same way using the statistical software package S-Plus. On entering a dataset in S-Plus, one typical output format is a set of four presentations:

• A histogram
• A box plot
• A probability density function
• A quantile-quantile plot

The four presentations for Case 1 (Pi data) are illustrated in Figure 5a. A histogram is the graphical version of a table which shows what proportion of cases fall into each of several specified categories. The categories are usually specified as non-overlapping intervals of some variable (e.g. flaw length intervals (L) 0 ≤ L < 10mm, 10 ≤ L < 20mm etc). A box plot (also known as a box-and-whisker diagram) is a convenient way of illustrating graphically a summary of five statistical numbers, which consists of the smallest observation, the lower quartile, the median, the upper quartile and the largest observation. A probability density function is the continuous equivalent of the histogram representation (for more details see Appendix A). A quantile-quantile (q-q) plot is based on a graphical technique for determining if two data sets come from a population with a common distribution. The q-q plot in Figure 5a shows the Pi flaw lengths plotted against the predicted standardised length based on a Normal distribution, with the mean and standard deviation of the Pi flaw lengths.

In order to establish the most appropriate statistical distributions that fit the four data sets in Appendix CII, q-q plots as well as cumulative frequency distribution (cfd) plots (see Appendix A) were initially considered. From the earlier study (1, 2) and more recent research, a number of distributions were analysed. Ultimately four statistical distributions were believed to be the most appropriate to carry out formal statistical tests on. These are:

• The Normal
• The Exponential
• The Logistic (Log-odds)
• The Lognormal
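The box plot summary and the Normal q-q plot described above can be sketched in a few lines of Python. This is an illustration only and not the S-Plus procedure used in the study: the flaw lengths are hypothetical values (they are not taken from Appendix CII), and the quartile method and the plotting position (i - 0.5)/n are assumed conventions.

```python
from statistics import NormalDist, mean, quantiles, stdev


def five_number_summary(values):
    """Return (smallest, lower quartile, median, upper quartile, largest)."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


def normal_qq_points(values):
    """Pair each ordered observation with the length predicted by a Normal
    distribution having the sample mean and standard deviation."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting position (i - 0.5) / n is one common convention (an assumption).
    return [(fitted.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


# Hypothetical flaw lengths in mm, for illustration only.
lengths = [5.0, 8.0, 12.0, 15.0, 20.0, 22.0, 30.0, 41.0, 60.0]

print("Five-number summary:", five_number_summary(lengths))
for predicted, observed in normal_qq_points(lengths):
    print(f"predicted {predicted:7.2f}   observed {observed:7.2f}")
```

If the observed values fall close to a straight line when plotted against the predicted values, the candidate distribution is a plausible description of the data. The same idea applies to the other three distributions by substituting their quantile functions.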


Each of the above distributions was used in the q-q plots and the cfd plots, as a way of providing a visual comparison between the respective distributions and each of the data sets. The q-q plots and cfd plots for Case 1 (Pi data) are illustrated in Figures 5b and 5c respectively. The figures show a degree of agreement with each of the distributions considered, but the cfd plots demonstrate that the Lognormal is perhaps the best fit.

Figures 6a, 6b and 6c illustrate Case 2 (full sectioning data) analyses using S-Plus. The q-q plots and the cfd plots of Figures 6b and 6c all show some agreement with the distributions. Here, it is not absolutely clear which is the best fit, except to say that the cfd Lognormal fit is as good as any of the others.

The S-Plus results for Case 3 (ultrasonic length data) are illustrated in Figures 7a, 7b and 7c and the results for Case 4 (reduced sectioning data) are illustrated in Figures 8a, 8b and 8c. For Case 3 the Lognormal is perhaps the overall best fit. Case 4 should be similar to Case 2, as it is a subset of Case 2. A visual comparison of the results in Figures 6 and 8 shows that the results are indeed very similar. This suggests that the rounding up or rounding down carried out to include more flaw lengths for Case 2 was reasonable, and the smaller data set of Case 4 did not make much difference to the overall results.

The plots in Figures 5 to 8 are useful in that they provide clues as to the distribution that best fits the data considered. However, there are formal statistical tests that provide quantitative information for assessing best fit. To test if a distribution is appropriate for a particular data set, parametric or non-parametric tests are used according to the type of the data studied. Parametric tests assume that the form of the distribution is known a priori. Non-parametric tests make no such assumption.
For example, if the shape indicates a nearly Normal distribution without outliers, Student's t test can be used. If the data contain outliers or are far from Normal, a non-parametric method is used, such as the Wilcoxon rank test or the Kolmogorov-Smirnov test (9). The significance test (10) to be carried out on the four data sets is non-parametric since the distribution for best fit is not known. Here, the null hypothesis H0 is:

H0: "The studied distribution is appropriate for the data"

and so the alternative hypothesis H1 is:

H1: "The studied distribution is not appropriate for the data"

The mean and standard deviation for the four flaw length data sets are given in Table 1 and the Kolmogorov-Smirnov test was applied. A 95% (or 0.95) confidence level was selected, i.e. a significance level of 0.05. The test computes a statistical parameter ‘p’. If the p-value is smaller than 0.05, then the null hypothesis H0 is rejected and the alternative hypothesis H1 is accepted. If the p-value is higher than 0.05, then the null hypothesis H0 is accepted at the 0.05 significance level. The results of the Kolmogorov-Smirnov test on the four data sets are given in Table 2.
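The test procedure described above can be sketched as follows. This is a minimal illustration and not the S-Plus implementation used in the study: the flaw lengths are hypothetical, the Lognormal parameters are estimated from the same sample, and the p-value uses the asymptotic Kolmogorov series. When parameters are estimated from the data being tested, this p-value tends to be optimistic, so the output should be read as indicative only.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF of the data and a candidate CDF."""
    ordered = sorted(values)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # Compare the candidate CDF with the step just above and just below x.
        gap = max(gap, i / n - f, f - (i - 1) / n)
    return gap


def ks_p_value(d, n, terms=100):
    """Asymptotic Kolmogorov p-value for statistic d and sample size n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is effectively 1.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def fitted_lognormal_cdf(values):
    """Lognormal CDF with parameters estimated from ln(values)."""
    logs = [math.log(v) for v in values]
    fitted = NormalDist(mean(logs), stdev(logs))
    return lambda x: fitted.cdf(math.log(x))


# Hypothetical flaw lengths in mm, for illustration only.
lengths = [5.0, 8.0, 12.0, 15.0, 20.0, 22.0, 30.0, 41.0, 60.0]

d = ks_statistic(lengths, fitted_lognormal_cdf(lengths))
p = ks_p_value(d, len(lengths))
verdict = "H0 accepted" if p > 0.05 else "H0 rejected"
print(f"D = {d:.4f}, p = {p:.4f}: {verdict} at the 0.05 level")
```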


From Table 2, the only p-values larger than 0.05, for each of the datasets, are for the Log-odds (Logistic) and Lognormal distributions. Hence, the null hypothesis H0 is accepted for the Log-odds and Lognormal distributions, but H0 is rejected for the Normal and Exponential distributions. Moreover, the p-values calculated using the Lognormal distribution are larger than the p-values from the Log-odds distribution. These results serve as quantitative evidence for the distribution of best fit. This is not to say that the other distributions are of no use; it means only that the Lognormal is the best fit.

Some tests are much more appropriate for testing particular distributions. The Shapiro-Wilk test (9) has been developed for the Normal distribution and the results for the four data sets are given in Table 3. The results of Table 3 offer additional confirmation that the Normal distribution, which was used in the earlier JCL model (2), is not the most appropriate distribution for describing flaw length distributions, as all the p-values are significantly less than 0.05.

5.2. DATASETS INVOLVING FLAW DEPTHS

The statistical study that was carried out on the flaw lengths has also been carried out on the flaw depths. Here, only Case 2, Case 3 and Case 4 have flaw depth data.

The four S-Plus presentations for Case 2 (full sectioning data) are illustrated in Figure 9a. The q-q plots and cfd plots for Case 2 are illustrated in Figures 9b and 9c respectively. Figure 9b does not show that any one distribution is better than another. Figure 9c shows a degree of agreement with each distribution, but the Exponential and the Lognormal are the best fit.

Figures 10a, 10b and 10c illustrate Case 3 (ultrasonic flaw depth data) analyses using S-Plus. The q-q plots and cfd plots of Figures 10b and 10c all show some agreement with each of the distributions considered, except that the cfd plot of the Exponential distribution is the worst fit.

The S-Plus results for Case 4 (reduced sectioning data) are illustrated in Figures 11a, 11b and 11c. Figure 11b shows that the Exponential distribution is probably the best fit. For the other distributions, most of the data follow the trend of the line but with a lot of points not on the line. Figure 11c shows that each distribution exhibits some fit with the data, but the Exponential and the Lognormal are the best fit.

Here, the null hypothesis H0 is:

H0: "The studied distribution is appropriate for the data"

and so the alternative hypothesis H1 is:

H1: "The studied distribution is not appropriate for the data"

The mean and standard deviation for the three flaw depth data sets are given in Table 4 and the Kolmogorov-Smirnov test was applied. As with the flaw length data, a 95% confidence level was selected. As before, the test computes a statistical parameter ‘p’. If the p-value is smaller than 0.05, then the null hypothesis H0 is rejected and the alternative hypothesis H1 is accepted. If the p-value is higher than 0.05, then the null hypothesis H0 is accepted at the 0.05 significance level. The results of the Kolmogorov-Smirnov test on the three data sets are given in Table 5.


From Table 5, the p-values larger than 0.05 for the three datasets are for the Log-odds and Lognormal distributions, although the p-value for Case 2, using the Exponential distribution, was also greater than 0.05. Hence, the null hypothesis H0 can be accepted in all cases for the Log-odds and Lognormal distributions. The null hypothesis H0 can also be accepted for Case 2 using the Exponential distribution. The p-values calculated using the Lognormal distribution are larger than the corresponding p-values using the Log-odds distribution, and so overall, the Lognormal is accepted as the best fit. On applying the Shapiro-Wilk test (9) to each case (see Table 6), it was found, as with the length data, that the Normal distribution is not the most appropriate distribution for describing flaw depth data.

5.3. REASSESS AND VALIDATE THE MODEL

Although the JCL approach for establishing the PoI curves has been discussed in section 4.3 above, the problem as originally stated is repeated here for completeness and for the analysis that follows. Consider a weld which is D% defective and where the defects are distributed randomly. The problem is: What is the probability of including a defective part, given that I% of the weld is selected, where I% ≤ 100? Consider the random variable X and let it be the event: ‘the % of defective weld included’. The above question can be expressed mathematically in the following way:

What is the P(X>0) when I% of the weld is selected? Clearly, if 100% of the weld is selected then it should follow P(X>0) = 1 (i.e. a certainty), since the weld is known to be D% defective. In order to compute P(X>0), probability theory tells us that

P(X>0) = 1 - P(X ≤ 0)

In this case it does not make sense physically to consider X < 0 (i.e. negative), so the real problem is to calculate P(X = 0) for the distribution in question. In the work completed so far (2), a given D% is assumed and then for every value of I% (i.e. the amount of weld selected) a unique mean value µ is derived, which corresponds to the amount of defective weld contained in I% of the weld selected. It follows that if I% = 100, then µ = D. The value of µ corresponding to each I% selected (i.e. µ_I) is substituted in each of the statistical distributions used to calculate P(X = 0) and hence P(X>0). With the previous and current study, PoI curves have been calculated using 6 different distributions:

1. Poisson
2. Binomial
3. Hypergeometric
4. Normal
5. Log-odds
6. Lognormal
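The calculation of P(X>0) = 1 - P(X = 0) can be illustrated for the two simplest discrete cases. The sketch below is not the JCL implementation. The relation between the mean and the selection is not reproduced in this section, so the linear form µ_I = I × D / 100 is an assumption, chosen because it gives µ = D when I% = 100. The function names are likewise hypothetical.

```python
import math


def mean_defective(i_percent, d_percent):
    """Assumed mean amount of defective weld in the selection (mu_I).

    Linear relation chosen so that mu = D when I% = 100 (an assumption).
    """
    return i_percent * d_percent / 100.0


def poi_poisson(i_percent, d_percent):
    """PoI from a Poisson distribution: 1 - P(X = 0) = 1 - exp(-mu_I)."""
    return 1.0 - math.exp(-mean_defective(i_percent, d_percent))


def poi_binomial(i_percent, d_percent, n=100):
    """PoI from a Binomial distribution with n trials and mean mu_I."""
    p = mean_defective(i_percent, d_percent) / n
    return 1.0 - (1.0 - p) ** n


for d_percent in (1, 4, 10):
    for i_percent in (10, 50, 100):
        print(f"D = {d_percent:2d}%, I = {i_percent:3d}%: "
              f"Poisson {poi_poisson(i_percent, d_percent):.4f}, "
              f"Binomial {poi_binomial(i_percent, d_percent):.4f}")
```

Under these assumptions neither function returns exactly 1 at I% = 100 for small D, which is a property of the distributions used rather than of the code.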


For the problem as stated above, the first 5 distributions agree very closely and some of the results have been compared in earlier reports (2). The Binomial and the Hypergeometric have been compared in this Appendix, which is important given that the two approaches were quite different and developed independently by two different organisations. The Poisson, Binomial and the Hypergeometric are all discrete distributions and so the probability P(X>0) = 1 - P(X = 0) should really be re-expressed as P(X≥1) = 1 - P(X = 0), since X can only take on the whole number values 0, 1, 2, 3, 4 etc. The Normal, Log-odds and Lognormal are all continuous distributions and do not ‘suffer’ from the need to re-express P(X>0). However, in each case it is still necessary to compute P(X = 0). In the case of the Log-odds distribution, the computation at P(X = 0) can be done analytically using a formula for the cumulative distribution function F(x), which provides a clear idea of the behaviour at all values of x and in particular at X=0 for different values of D and µ_I, where µ_I and D are related by equation (14). F(x) for the Log-odds model is usually expressed as

$$F(x) = \frac{1}{1 + \exp\left(-\dfrac{x - \mu_I}{b}\right)}$$

where b is a shape parameter related to the standard deviation (σ) by:

$$b = \frac{\sqrt{3}\,\sigma}{\pi}$$
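Because the Log-odds cumulative distribution function is analytic, P(X>0) = 1 - F(0) can be evaluated directly. The sketch below is illustrative only. It assumes µ_I = I × D / 100 (so that µ = D at I% = 100) and a variance equal to the mean, the latter following the annotation on Figure 13; both choices and the function names are assumptions rather than the report's stated method.

```python
import math


def logistic_cdf(x, mu, sigma):
    """Log-odds (Logistic) CDF with mean mu and standard deviation sigma."""
    b = math.sqrt(3.0) * sigma / math.pi   # shape parameter from sigma
    return 1.0 / (1.0 + math.exp(-(x - mu) / b))


def poi_log_odds(i_percent, d_percent):
    """P(X > 0) = 1 - F(0) for the Log-odds distribution."""
    if i_percent <= 0 or d_percent <= 0:
        raise ValueError("I% and D% must be positive")
    mu = i_percent * d_percent / 100.0     # assumed relation, mu = D at I% = 100
    sigma = math.sqrt(mu)                  # assumed variance = mean
    return 1.0 - logistic_cdf(0.0, mu, sigma)


for i_percent in (10, 25, 50, 100):
    print(f"D = 4%, I = {i_percent:3d}%: PoI = {poi_log_odds(i_percent, 4):.4f}")
```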

In the case of the Normal and the Lognormal distributions, and for the sake of efficiency, it is necessary to use statistical packages to compute the results. For example, Microsoft Excel allows the input of X=0 in the statistical function NORMDIST(X, µ, σ, TRUE) (i.e. the Normal cumulative distribution function), where µ and σ are the mean and standard deviation of X. However, the Excel statistical function LOGNORMDIST(X, µ_l, σ_l) (i.e. the Lognormal cumulative distribution function), where µ_l and σ_l are the mean and standard deviation of lnX respectively, does not allow the input of X=0. In order to analyse the behaviour of the Lognormal distribution at X=0 (i.e. to compute P(X>0)), it is necessary to do it numerically with values that gradually get closer to zero. The PoI curves for the cases X=0.02, 0.004 and 0.0008, for example, are illustrated in Figures 12a, 12b and 12c respectively. Strictly, Figures 12a, 12b and 12c should be interpreted as P(X>0.02), P(X>0.004) and P(X>0.0008) respectively.

From Figure 12, the behaviour of the Lognormal distribution suggests that as X → 0, P(X>0) is a certainty, no matter how much of the weld is selected and no matter how much of it is defective. Whilst this seems logical at one level, it is in contrast with the results using the other distributions.

In sections 5.1 and 5.2 it has been shown, through the various statistical tests, that the Lognormal distribution was the ‘best’ fit for the distribution of flaw length or flaw depth data, which have come from a variety of data sources (e.g. determined by ultrasonic NDT or determined by sectioning). The Log-odds distribution was the next ‘best’ fit. This of course does not imply that the Lognormal will also be the most appropriate distribution to use in the PoI model, which is really a different problem to fitting a statistical distribution to a set of flaw data.
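The limiting behaviour of the Lognormal distribution near X=0 can be reproduced numerically in the same way as for Figure 12, by evaluating P(X > x) at progressively smaller thresholds x. The sketch below is illustrative only. The conversion from a mean and variance of X to the parameters of lnX is the standard moment relation, but the choice of variance equal to the mean and the example values I% = 10, D = 1% are assumptions, not values taken from the report.

```python
import math
from statistics import NormalDist


def lognormal_exceedance(threshold, mean_x, variance_x):
    """P(X > threshold) for a Lognormal X with the given mean and variance."""
    # Parameters of ln X obtained from the mean and variance of X.
    sigma_l_sq = math.log(1.0 + variance_x / mean_x ** 2)
    mu_l = math.log(mean_x) - 0.5 * sigma_l_sq
    z = (math.log(threshold) - mu_l) / math.sqrt(sigma_l_sq)
    return 1.0 - NormalDist().cdf(z)


# Assumed example: I% = 10 and D = 1%, with mu_I = I * D / 100 and variance = mean.
mu_i = 10 * 1 / 100.0

for threshold in (0.02, 0.004, 0.0008):
    value = lognormal_exceedance(threshold, mu_i, mu_i)
    print(f"P(X > {threshold}) = {value:.4f}")
```

The threshold cannot be set to zero because ln(0) is undefined, which is the same restriction that prevents the input of X=0 in the spreadsheet function.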
Moreover, the behaviour of the Lognormal at X=0 suggests that it is not the most appropriate distribution to use for calculating the PoI curves, particularly as the PoI curves of concern here are defined by P(X>0) = 1 - P(X=0). However, at larger values of X, the behaviour of the Lognormal does seem appropriate (e.g. P(X>1) = 1 - P(X≤1)).

The next ‘best’ fit to the flaw data was the Log-odds distribution and this has a number of advantages in the context of the PoI curves. The Log-odds distribution has an analytic function to describe its behaviour at all values of X. It is continuous, so the values of X are not limited to discrete jumps. The values of the mean and standard deviation are not limited by some relationship, as in the case of the Poisson and Binomial, and hence allow for more general cases. The PoI curves using the Log-odds distribution agree very well with the PoI curves using the Poisson, Binomial, Hypergeometric and the Normal. A comparison of the PoI curves for P(X>0), using the Normal and Log-odds distributions, is illustrated in Figures 13a and 13b respectively.

5.4. SELECT THE BEST AVAILABLE MODEL

From Figure 13, the difference between the PoI curves calculated using the Log-odds and the Normal distributions is negligible. However, it is recommended that the Log-odds distribution is used to compute the PoI curves, mainly for the reasons stated above and also because the Log-odds distribution has been shown to be a better statistical fit to the distribution of flaw lengths and flaw depths than the Normal distribution.
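The size of the difference between the two sets of PoI curves can be checked numerically. The sketch below is an illustration under stated assumptions and not the calculation behind Figure 13: it takes µ_I = I × D / 100, sets the variance equal to the mean (as annotated on Figure 13), and scans the same defect percentages as the figure legend.

```python
import math
from statistics import NormalDist


def poi_normal(mu):
    """P(X > 0) for a Normal distribution with variance equal to the mean."""
    return 1.0 - NormalDist(mu, math.sqrt(mu)).cdf(0.0)


def poi_log_odds(mu):
    """P(X > 0) for a Log-odds distribution with variance equal to the mean."""
    b = math.sqrt(3.0 * mu) / math.pi      # b = sqrt(3) * sigma / pi
    return 1.0 / (1.0 + math.exp(-mu / b))


largest_gap = 0.0
for d_percent in (1, 2, 3, 4, 5, 6, 8, 10):
    for i_percent in range(1, 101):
        mu = i_percent * d_percent / 100.0  # assumed relation, mu = D at I% = 100
        gap = abs(poi_normal(mu) - poi_log_odds(mu))
        largest_gap = max(largest_gap, gap)

print(f"Largest difference in PoI over the grid: {largest_gap:.4f}")
```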

6. CONCLUSIONS


• The Lognormal distribution and the ‘Log-odds’ distribution were found to be the best fits for the flaw length and flaw depth data in this study, with the Lognormal distribution being the optimum fit, according to the formal statistical tests carried out.

• The ‘Log-odds’ distribution was found to be the most appropriate distribution to use with the ‘Probability of Inclusion’ model.

7. RECOMMENDATIONS

• The ‘Probability of Inclusion’ model should be applied more widely than just to LPG storage vessels.

8. ACKNOWLEDGEMENTS

The authors would like to thank the HSE for funding this study, and in particular Graeme Hughes for his useful comments and advice throughout.

9. REFERENCES

1. Georgiou G A: ‘The Extent of Ultrasonic Non-Invasive Inspection of LPG Storage Vessels’. HSE Project, JCL Report No. 2/8/99, (September 1999, Revision 1).

2. Georgiou G A: ‘Probabilistic Models for Optimising Defect Detection in LPG Storage Vessels’. HSE Project, JCL Report No. 3/3/00 (June 2000)

3. Georgiou G A: ‘Proposed Guidelines for Estimating the Extent of Manual Ultrasonic NDT for LPG Storage Vessels’. HSE Project, JCL Report No. 4/3/00 (July 2000)

4. Georgiou G A: ‘Probabilistic models for optimising defect detection in LPG welds’. Proceedings of BINDT, September 2000

5. Georgiou G A: ‘Probabilistic Methods for Optimising defects Detection’ (A comparison of Mitsui Babcock Energy Ltd (MBEL) approach and Jacobi Consulting Ltd approach), Confidential draft document for HSE (June 2000).


6. Kreyszig, E: ‘Advanced engineering mathematics’, John Wiley & Sons, Inc. 1983 (5th Edition, pp 922-927).

7. Pi report: ‘Sample inspections of welds on LPG spheres’ (Report Ref. IC 0041/1/98 December 1998).

8. Georgiou G A: ‘Adopting European ultrasonic standards for high quality fabrications: Implications for manufacturers and end users’. A TWI report (No. 5657/10/95) December 1995.

9. The Kolmogorov-Smirnov Test: http://en.wikipedia.org/wiki/Kolmogorov_Smirnoff_Test

10. Crawshaw J and Chambers J: ‘A concise course in A level Statistics’, Second Edition, Stanley Thornes (Publishers) Ltd, 1992.


Table 1 The mean and standard deviation for the four flaw length data sets

                     Case 1     Case 2     Case 3     Case 4
Mean                 22.18966   27.09      33.26      26.75
Standard Deviation   13.37772   20.02949   24.47578   16.91343

Table 2 The ‘p-value’ calculated using the Kolmogorov-Smirnov test for each data set against each distribution

Sample   Normal   Exponential   Logistic   Lognormal
Case 1   0.0001   0             0.0526     0.4391
Case 2   0.0179   0.0217        0.162      0.3706
Case 3   0.0005   0.0133        0.1063     0.9122
Case 4   0.0126   0.0034        0.142      0.3223
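The decision rule applied to Table 2 can be written out explicitly. The sketch below uses the p-values from Table 2 as given; the dictionary layout and function name are illustrative choices only.

```python
ALPHA = 0.05  # significance level corresponding to the 95% confidence level

# p-values from Table 2 (Kolmogorov-Smirnov test, flaw length data).
table_2 = {
    "Case 1": {"Normal": 0.0001, "Exponential": 0.0, "Logistic": 0.0526, "Lognormal": 0.4391},
    "Case 2": {"Normal": 0.0179, "Exponential": 0.0217, "Logistic": 0.162, "Lognormal": 0.3706},
    "Case 3": {"Normal": 0.0005, "Exponential": 0.0133, "Logistic": 0.1063, "Lognormal": 0.9122},
    "Case 4": {"Normal": 0.0126, "Exponential": 0.0034, "Logistic": 0.142, "Lognormal": 0.3223},
}


def accepted_distributions(p_values, alpha=ALPHA):
    """Distributions for which H0 is accepted (p-value larger than alpha)."""
    return [name for name, p in p_values.items() if p > alpha]


for case, p_values in table_2.items():
    accepted = accepted_distributions(p_values)
    best = max(p_values, key=p_values.get)  # largest p-value
    print(f"{case}: H0 accepted for {', '.join(accepted)}; largest p-value: {best}")
```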

Table 3 The ‘p-value’ calculated using the Shapiro-Wilk test for each data set against the Normal distribution

Sample   Normal
Case 1   2.220446e-016
Case 2   7.333356e-011
Case 3   0
Case 4   0

Table 4 The mean and standard deviation for the three flaw depth data sets

                     Case 2     Case 3     Case 4
Mean                 4.44       8.2        4.207
Standard Deviation   3.663109   4.956123   3.629328

Table 5 The ‘p-value’ calculated using the Kolmogorov-Smirnov test for each flaw depth data set against each distribution

Sample   Normal   Exponential   Logistic   Lognormal
Case 2   0.0093   0.3628        0.191      0.6419
Case 3   0.5      0.0052        0.5661     0.9381
Case 4   0.0113   0.5           0.1342     0.8328


Table 6 The ‘p-value’ calculated using the Shapiro-Wilk test for each flaw depth data set against the Normal distribution

Sample   Normal
Case 2   1.291189e-013
Case 3   0
Case 4   0

Appendix C

Figure 1 An illustration of a Liquid Petroleum Gas (LPG) sphere


Figure 2 An illustration of a collapsed LPG Sphere

Figure 3 Probability of including a defective part of at least 1% given a certain % level of inspection (Hypergeometric model)

[Plot: Probability POI (MBEL's 'Simple' Model) against % Inspection (k), for Defective Weld D = 1%, annotated with the Hypergeometric Model:]

$$POI(k, D_d) = 1 - \frac{\binom{D_d}{0}\binom{100 - D_d}{k}}{\binom{100}{k}} = 1 - \frac{(100 - D_d)!\,(100 - k)!}{100!\,(100 - D_d - k)!}, \qquad k \le (100 - D_d)$$

$$POI(k, D_d) = 1, \qquad k > (100 - D_d)$$

Figure 4 Probability of including a defective part of at least 1% given a certain % level of inspection (Comparison of JCL and MBEL Models)

[Plot: POI (Binomial) against % Inspection (k), with curves for the MBEL Model and the JCL Model at D = 1%, 4%, 10% and 40%, annotated with:]

$$POI_{MBEL} = 1 - \left(1 - \frac{D}{100}\right)^{k} \quad \text{and} \quad POI_{JCL} = 1 - \left(1 - \frac{kD}{100^2}\right)^{100} \qquad (0 \le k \le 100)$$

Figure 5a A typical analysis and presentation of Case 1 (Pi data) using S-Plus

[Panels: (a) Data presentation in form of histogram (Frequencies against Lengths of defects); (b) Data presentation in form of boxplot (Lengths of defects); (c) Data density presentation (Probability against Lengths of defects); (d) Quantile-Quantile (Lengths of defects against Quantiles of Standard Normal)]

Figure 5b Quantile-Quantile plots of Case 1 (Pi data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Empirical quantiles against Theoretical quantiles]

Figure 5c Cumulative density function plots of Case 1 (Pi data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Probability against Lengths of defects]

Figure 6a A typical analysis and presentation of Case 2 (Full sectioning data) using S-Plus

[Panels: histogram (Frequencies against Lengths of defects); boxplot (Lengths of defects); density (Probability against Lengths of defects); quantile-quantile plot (Lengths of defects)]

Figure 6b Quantile-Quantile plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Empirical quantiles against Theoretical quantiles]

Figure 6c Cumulative density function plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Probability against Lengths of defects]


Figure 7a A typical analysis and presentation of Case 3 (ultrasonic data) using S-Plus

[Panels: (a) Data presentation in the form of histogram (Frequencies against Length of defects); (b) Data presentation in the form of boxplot (Length of defects); density (Probabilities against Length of defects); quantile-quantile plot (Length of defects)]

Figure 7b Quantile-Quantile plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Empirical quantiles against theoretical quantiles]

Figure 7c Cumulative density function plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Probability against Lengths of defects]


Figure 8a A typical analysis and presentation of Case 4 (reduced sectioning data) using S-Plus

[Panels: (a) Data presentation in the form of histogram (Frequencies against Length of defects); (b) Data presentation in the form of box plot; (c) Data density presentation (Probabilities against Length of defects); (d) Quantile-Quantile (Length of defects against Quantiles of Standard Normal)]

Figure 8b Quantile-Quantile plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Empirical quantiles against Theoretical quantiles]

Figure 8c Cumulative density function plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Probability against Lengths of defects]

Figure 9a A typical analysis and presentation of Case 2 (Full sectioning data) using S-Plus

[Panels: histogram (Frequencies against Depth of defects); boxplot (Depth of defects); density (Probabilities against Depth of defects); quantile-quantile plot (Depth of defects)]

Figure 9b Quantile-Quantile plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Empirical quantiles against theoretical quantiles]

Figure 9c Cumulative density function plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Probability against Depths of defects]


Figure 10a A typical analysis and presentation of Case 3 (ultrasonic data) using S-Plus

[Panels: histogram (Frequencies against Depth of defects); boxplot (Depth of defects); (c) Density data presentation (Probabilities against Depth of defects); quantile-quantile plot (Depth of defects)]

Figure 10b Quantile-Quantile plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Empirical quantiles against theoretical quantiles]

Figure 10c Cumulative density function plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Probability against Depths of defects]


Figure 11a A typical analysis and presentation of Case 4 (reduced sectioning data) using S-Plus

[Panels: histogram (Frequencies against Depths of defects); boxplot (Depths of defects); density (Probability against Depths of defects); quantile-quantile plot (Depths of defects)]

Figure 11b Quantile-Quantile plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Empirical quantiles against theoretical quantiles]

Figure 11c Cumulative density function plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Panels: Probability against Depths of defects]


Figure 12 Estimating the probability of including a defective part (P(X>0) = 1-P(X=0)) using the Lognormal distribution given a certain % level of inspection for the cases (a) X=0.02, (b) X=0.004 and (c) X=0.0008

[Three plots: Probability of Inclusion (Lognormal) against % Inspection (I%), each with curves for 1%, 2%, 3%, 4%, 5%, 6%, 8% and 10% Defective Weld. Panel annotations: (a) P(X>0) = 1 - P(X=0.02); (b) P(X>0) = 1 - P(X=0.004); (c) P(X>0) = 1 - P(X=0.0008)]

Figure 13 The probability of including a defective part (P(X>0) = 1-P(X=0)) using (a) the Normal distribution and (b) the Log-odds distribution

[Two plots: (a) Probability (Normal) against % Inspection (I%), Normal Distribution with variance = mean for comparison with Poisson; (b) Probability (Logistic) against % Inspection (I%), Logistic Distribution with variance = mean for comparison with Poisson. Each has curves for 1%, 2%, 3%, 4%, 5%, 6%, 8% and 10% Defective Weld]


APPENDIX CI

PART A: A SUMMARY COMPARISON OF THE MBEL AND JCL ‘PROBABILITY OF INCLUSION’ MODELS

PART B: THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL

(Kindly provided by B Shepherd, MBEL November 2005)


PART A: A SUMMARY COMPARISON OF THE MBEL AND JCL PROBABILITY OF INCLUSION MODELS

1. GENERAL POINTS ABOUT MBEL DEFINITIONS

(a) The word 'uniform' on page 1 should be avoided as it stands for a specific distribution in statistics.
(b) Pd is not the 'Probability that a unit contains a defect' but the 'Probability that any one unit selected at random is defective' or the 'Probability that any 1% of the weld selected at random is defective'.
(c) The notation for the amount of weld selected 'Cov' should be avoided as this could be initially misunderstood as representing the 'Covariance', another statistical parameter.

2. SUMMARY OF MBEL'S AND JCL’S MODELLING APPROACHES

In essence, the MBEL approach divides the weld into one hundred discrete 1% units, Dd% of which are defective and randomly distributed. The problem considered is: if k% of the weld is selected, find the probability that it includes a defective part (a) with replacement (Binomial), (b) without replacement (Hypergeometric).

In the JCL approach the weld is not divided into one hundred discrete units. The weld is considered to have a finite number of defects (L) with different lengths, which are randomly distributed along the weld, resulting in a mean amount of defective weld D%. A probability is established for including a defective part for each rI% selected. This probability is proportional to the size of selection, unlike the MBEL approach where the units are of equal size. This result is then fed into the different distributions considered and for all possible rI% selections.

3. A COMPARISON OF THE MBEL APPROACH AND THE JCL APPROACH FOR THE BINOMIAL

The two different formulae derived using two different approaches are:

$$ POI_{MBEL} = 1 - \left(1 - \frac{D}{100}\right)^{k} \quad \text{and} \quad POI_{JCL} = 1 - \left(1 - \frac{kD}{100^{2}}\right)^{100} \qquad (0 \le k \le 100) \qquad (1) $$
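As a rough numerical check of equation 1, the sketch below evaluates both expressions for a 4% defective weld. It assumes equation 1 is read as POI_MBEL = 1 - (1 - D/100)^k and POI_JCL = 1 - (1 - kD/100^2)^100. The function names poi_mbel and poi_jcl are illustrative only and do not come from either model's documentation.

```python
def poi_mbel(k: float, d: float) -> float:
    """MBEL binomial form: k selections of 1% units, each defective with probability d/100."""
    return 1.0 - (1.0 - d / 100.0) ** k


def poi_jcl(k: float, d: float) -> float:
    """JCL binomial form (assumed reading): 100 trials, each with probability k*d/100^2."""
    return 1.0 - (1.0 - k * d / 100.0 ** 2) ** 100


# Compare the two expressions for D = 4% over a few selection levels k (%).
D = 4.0
for k in (10, 30, 50, 100):
    print(f"k = {k:3d}%  MBEL = {poi_mbel(k, D):.4f}  JCL = {poi_jcl(k, D):.4f}")
```

Under these assumptions the two curves coincide at k = 100 and differ by less than 0.01 at every other integer k for D = 4.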

The comparison using equation 1 (c.f. Figure 4, Appendix C) shows the results are almost identical.

4. COMMENTS ON THE HYPERGEOMETRIC MODEL

The Hypergeometric model does overcome the 'irritating' problem of inaccuracy at high values of the % weld selection (i.e. large k) and for low values of mean defect distribution (i.e. D < 4). If the Hypergeometric approach is adopted, it has to be appreciated that the PoI is the 'probability of including a defective part of at least 1%' and not the 'probability of including at least 1 defect' as stated in the MBEL document. The 'probability of including a defective part of at least 1%' may not be the most desirable way of expressing the probability of including a defective part.

On the other hand, this is not an issue with the Normal or Log-odds distributions as discussed in Appendix C, notwithstanding the difficulties for D < 4. The comparison in this exercise has explained where this difficulty comes from more clearly and provides further independent evidence validating the JCL results and for that matter the MBEL results.


PART B: THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL

PROBABILITY OF DETECTING AT LEAST ONE DEFECT AS A FUNCTION OF WELD DEFECTIVENESS AND SAMPLE SIZE

Description of Problem:

Assume a weld has a uniform random distribution of defects within its length. This random distribution can be represented as a percentage of defective weld structure. If a defect will be detected when the region it is in is inspected (i.e. 100% detection capability is assumed), the problem is to determine the probability of detecting at least 1 defect when inspecting only a percentage of the weld volume (and therefore, for example, being alerted to the fact that a particular degradation mechanism is active).

POI : Probability of Inclusion of a Defect

Cov : Percentage Coverage of the Weld Volume.

Dd : Defect Distribution as a Percentage

Assumptions:

To assist in calculating the POI the following assumptions are made:

The weld volume can be divided into 100 discrete units.

Probability that a unit contains a defect:

Pd = Dd / 100%.

Probability that a unit contains no defect:

P0 = 1 - Pd

Probability of Inclusion:

POI = 1-PON

Where

PON : Probability of Detecting No Defects

Methods:

Using the above definitions and assumptions a number of models could be used to determine the POI:


Simple Evaluation:

Using the following data evaluation, a formula can be determined and extrapolated to 100% coverage.

Pd (%)    1                                2                                3                                4
P0 (%)    99                               98                               97                               96
Cov (%)   PON                              PON                              PON                              PON
0         1                                1                                1                                1
1         99/100                           98/100                           97/100                           96/100
2         99/100 × 98/99                   98/100 × 97/99                   97/100 × 96/99                   96/100 × 95/99
3         99/100 × 98/99 × 97/98           98/100 × 97/99 × 96/98           97/100 × 96/99 × 95/98           96/100 × 95/99 × 94/98
4         99/100 × 98/99 × 97/98 × 96/97   98/100 × 97/99 × 96/98 × 95/97   97/100 × 96/99 × 95/98 × 94/97   96/100 × 95/99 × 94/98 × 93/97

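General Formula

$$ PON = \frac{(100 - P_d)!}{(100 - P_d - Cov)!} \times \frac{(100 - Cov)!}{100!} $$

Here Pd and Cov are expressed as percentages (i.e. as numbers of 1% units), as in the table above.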

Binomial Distribution:

$$ POI = 1 - PON = 1 - \frac{n!}{(n-r)!\,r!}\,P_0^{\,n-r}\,P_d^{\,r} $$

Where:

n = Cov

r = 0 defects to detect

By altering n from 0 to 100 % coverage the POI rises from 0 to 1 in a curve.


Poisson Distribution:

$$ POI = 1 - PON = 1 - \frac{e^{-\mu}\,\mu^{r}}{r!} $$

where

µ = nPd

n = Cov

r = 0 defects to detect

Graphs for all three analysis methods are presented.

Generally all three graphs exhibit similar trends.

The binomial and Poisson relationship do not provide reliable results for high coverage of welds with a low defect distribution, since the graphs do not predict a probability of 1 at 100% coverage of the weld with 4% uniform defect distribution.
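The behaviour described above can be checked numerically. The following sketch evaluates the three expressions for a 4% defective weld. It treats the Simple Evaluation as sampling without replacement from 100 units, so that PON = C(100 - Pd, Cov) / C(100, Cov), which reproduces tabulated products such as 96/100 × 95/99. The function names are illustrative assumptions and not part of the MBEL document.

```python
import math


def poi_simple(cov: int, dd: int, units: int = 100) -> float:
    """Simple Evaluation (without replacement): PON = C(units - dd, cov) / C(units, cov)."""
    # math.comb returns 0 when cov > units - dd, so PON is 0 and POI is 1 there.
    pon = math.comb(units - dd, cov) / math.comb(units, cov)
    return 1.0 - pon


def poi_binomial(cov: int, dd: int) -> float:
    """Binomial model with r = 0: PON = P0 ** n, where n = Cov and P0 = 1 - Dd/100."""
    return 1.0 - (1.0 - dd / 100.0) ** cov


def poi_poisson(cov: int, dd: int) -> float:
    """Poisson model with r = 0: PON = exp(-mu), where mu = n * Pd."""
    return 1.0 - math.exp(-cov * dd / 100.0)


DD = 4  # 4% defective weld
for cov in (0, 25, 50, 75, 100):
    print(f"Cov = {cov:3d}%  simple = {poi_simple(cov, DD):.4f}  "
          f"binomial = {poi_binomial(cov, DD):.4f}  poisson = {poi_poisson(cov, DD):.4f}")
```

At 100% coverage the simple evaluation gives exactly 1, whereas the binomial and Poisson expressions give about 0.983 and 0.982 respectively.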

A more rigorous treatment, which also addresses the probability of detection (POD) of the inspection method, is provided in the reference below.

Reference

Georgiou G A: 'Probabilistic Models for Optimising Defect Detection in LPG Welds'. Proceedings of the British Institute of NDT Conference 2000, pp 168-173.


Probability of Detecting 1 Defect Given a Percentage coverage of a Defective Weld.

[Figure: Simple Distribution. Probability of Detecting a Defect (0 to 1) against Percentage Coverage (0 to 100) for a 4% defective weld (uniform distribution).]





[Figure: Using Binomial Distribution Model. Probability of Detecting a Defect (0 to 1) against Percentage Coverage (0 to 100). Binomial distribution for a 4% defective weld.]





[Figure: Using Poisson Distribution Model. Probability of Detecting a Defect (0 to 1) against Percentage Coverage (0 to 100). Poisson distribution for a 4% defective weld.]




APPENDIX CII

THE DATASETS USED IN THE STATISTICAL ANALYSES IN APPENDIX C


Original Length Section Flaw Section Depth Flaw Length Depth Flaw Length Depth

Reference (mm) Ident. Identification Length (mm) (mm) type mm mm type mm mm

B1-1-IP 35 IP 10 CP 10 25 5 CP 6 23 NA CP2 2.6

B1-2-IP 27 IP 8 Crk 1 38 4.1 CP10 5.5 CP 6 13 0

B1-3-IP 40 IP 9 Crk 2 38 5.7 Crk 1 30 NA CP10 5

B1-4-IP 25 IL 6 Crk 4 39 2.6 Crk 2 33 NA Crk 1 28 4.1

B1-5-IP 20 IP 6 Crk 5 48 4 Crk 4 40 NA Crk 2 38 5.7

B1-6-IP 35 LRF 1 Crk 6 43 1 Crk 5 60 3.5 Crk 4 39 2.6

B2-1-IP 35 W 2 Crk 7 38 5.4 Crk 6 57 <3 Crk 5 48 4

B2-3-IP 70 W 3 IL 1 47 3.4 Crk 7 35 <3 Crk 6 43 1

B2-5-IP 40 W 4 IL 2 38 3.5 IL 1 52 9 Crk 7 38 5.4

B2-6-IP 20 W 1 IL 6 4 1.9 IL 2 41 3 IL 1 47 3.4

B2-7-IP 80 CP 2 IL 7 16 13.5 IL 6 13 <3 IL 2 38 3.5

B3-1-IP 40 CP 8 IP 1 10 15 IL 7 31.5 4 IL 6 4 1.9

B3-2-IP 27 IP 5 IP 2 61 1 IP 1 23 <3 IL 7 16 13.5

B3-3-IP 20 IP 1 IP 3 14 8.4 IP 2 41 3.5 IL 10 12 0.7

B3-4-IP 25 IP 4 IP 4 10 9.8 IP 3 8 NA IP 1 10 15

B3-5-IP 20 PL 3 IP 5 9 0.6 IP 4 23 NA IP 2 1

B3-6-IP 30 CP 7 IP 7 31 1.3 IP 5 14 NA IP 3 14 8.4

B3-7-IP 32 IL 3 LSF 1 67 3.5 IP 7 18 <3 IP 4 10 9.8

B3-9-IP 28 LSF 7 LSF 2 56 5.5 LSF 1 87 10 IP 5 0.6

B3-10-IP 23 IL 10 LSF 3 44 8.5 LSF 2 65 13 IP 7 1.3

B4-1-IP 15 PL 2 LSF 4 46 8 LSF 3 71 7 LSF 1 63 3.5

B4-2-IP 15 CP 3 LSF 5 17 4.8 LSF 4 93 25 LSF 2 56 5.5

B4-3-IP 20 CP 6 LSF 6 16 10.5 LSF 5 27 <3 LSF 3 44 8.5

B4-4-IP 25 LSF 8 LSF 7 11 5 LSF 6 27.5 11.5 LSF 4 46 8

B4-5-IP 10 IP 3 LSF 8 13 4.1 LSF 7 107 12 LSF 5 17 4.8

B5-1-MK 15 CP 1 Pl 1 13 8.1 LSF 8 15.5 NA LSF 6 16 10.5

B5-3-MK 12 CP 5 Pl 3 36 2 Pl 1 31.5 10.5 LSF 7 11 5

B5-4-MK 10 CP 4 Pl 4 17 1.7 Pl 3 25 6 LSF 8 13 4.1

B5-5-MK 10 IL 7 Pl 5 36 1.7 Pl 4 24 4 PL 1 1.9

B5-6-MK 10 LSF 6 Pl 7 25 3 Pl 5 20 8 Pl 1 13 8.1

B5-8-MK 19 LSF 5 Pl 8 19 5.5 Pl 7 39 7 Pl 3 36 2

B5-9-MK 25 IL 5 Pl 9 16 13.2 Pl 8 27.5 7.5 Pl 4 17 1.7

B5-10-MK 12 IL 9 Ps 2 6 1.8 Pl 9 21.5 6 Pl 5 36 1.7

B5-11-MK 25 LSF 10 Ps 3 9 0.9 Ps 2 16.5 NA Pl 7 25 3

B6-1-MK 9 IL 4 Ps 5 11 0.7 Ps 3 13 NA Pl 8 19 5.5

B6-2-MK 22 LSF 9 Ps 6 23 1.1 Ps 5 8 NA Pl 9 16 13.2

B6-3-MK 5 CP 10 Ps 7 74 5.5 Ps 6 8 NA Ps 2 6 1.8

B6-4-MK 25 CP 9 Ps 9 20 1.9 Ps 7 88 8 Ps 3 0.9

B6-7-MK 20 Crk 1 Ps 10 27 5.3 Ps 9 11.5 NA Ps 5 0.7

B6-8-MK 18 IP 7 Pt 2 2 0.4 Ps 10 25 NA Ps 6 23 1.1

B7-1-MK 20 IL 8 Pt 3 2 0.7 Pt 2 10 NA Ps 7 74 5.5

B7-2-MK 20 Crk 2 Pt 4 2 0.4 Pt 3 8.5 NA Ps 9 20 1.9

B7-4-MK 15 Crk 7 Th 2 38 3.2 Pt 4 8.5 NA Ps 10 27 5.3

B7-6-MK 10 IL 2 Th 3 28 1.9 Th 2 38 <3 Pt 2 0.4

B7-7-MK 10 Crk 4 Th 4 9.5 0.7 Th 3 35 <3 Pt 3 0.7

B7-8-MK 12 Crk 6 Th 5 39 3.3 Th 4 10 NA Pt 4 0.4

B7-9-MK 12 LSF 3 Th 6 20 4.6 Th 5 31 <3 Th 2 38 3.2

B7-10-MK 12 LSF 4 Th 7 10 10.8 Th 6 32 <3 Th 3 28 1.9

E1-1-IP 35 IL 1 Th 9 3 1.6 Th 7 18 NA Th 4 9.5 0.7

E1-2-IP 15 Crk 5 Th 10 90 5.9 Th 9 6 NA Th 5 39 3.3

E1-3-IP 25 Th 10 72 <3 Th 6 20 4.6

E3-2-IP 20 Th 7 10 10.8

E3-4-IP 10 Th 9 3 1.6

E4-1-MK 12 Th 10 5.9

E4-2-MK 10

E4-3-MK 15

E4-6-MK 20

T3-1-MK 20

Total = 1287.00 Total = 1354.50 222.00 Total = 1663.00 164.00 Total = 1123.50 227.20

Mean = 22.19 Mean = 27.09 4.44 Mean = 33.26 8.20 Mean = 26.75 4.21

SD = 13.38 SD = 20.03 3.66 SD = 24.48 4.96 SD = 16.91 3.63

Case 1 (Pi data, 58 UT data)

See Pi report IC 0041/1//98

Appendix C

Case 2 (TWI, 50 sectioning data)

See GSP 5657/10/95

Tables B2 & B3 corresponding to 50 UT lengths and depths

Case 4 (TWI, 42 sectioning data)

See GSP 5657/10/95

Tables B2 & B3 corresponding to 42 UT lengths and 54 UT depths

Case 3 (TWI, 50 UT data)

See GSP 5657/10/95

Tables B2 & B3, 50 UT lengths and depths

APPENDIX D

PROPOSED GUIDELINES FOR ESTIMATING THE

EXTENT OF NDT FOR WELDS


TABLE OF CONTENTS

1 INTRODUCTION AND BACKGROUND

2 PROCEDURE

3 DECISION TREES

3.1 PROBABILITY OF INCLUSION CURVES

3.2 PROBABILITY OF DETECTION CURVES

3.3 INDEX OF DETECTION

4 REFERENCES

Figures 1 – 4

APPENDIX DI EXAMPLES FOR USING THE DECISION TREES (FIGURES 1 AND 2) FOR A PARTICULAR NDT METHOD

1 INTRODUCTION AND BACKGROUND

Just over 5 years ago HSE had reason to investigate the occurrence of cracking in Liquefied Petroleum Gas (LPG) storage vessels. As a result of these investigations and the obvious concerns they raised, there was a requirement to carry out manual ultrasonic NDT to ensure that the structural integrity of the vessels was not compromised. An important question for the operators of LPG vessels was how much manual ultrasonic NDT was necessary, considering the significant outlay costs involved as well as the loss of production. Also, in the absence of useful and comprehensive ultrasonic inspection data, estimating how much NDT is required, and moreover where to inspect, is not straightforward. In order to assist the LPG industry at the time, HSE funded research work to develop ‘Probability of Inclusion’ (PoI) models, based on well established statistical approaches. The PoI models could be used to estimate the level of manual ultrasonic NDT required to:

(i) include a defective part of the weld

(ii) detect the defective part.

In that early work, the PoI models were developed specifically for LPG storage vessels in the context of manual ultrasonic NDT. However, this early work has been updated and in this appendix, which is a complementary document to Appendix C, the application of the PoI models is considered to be wider than just the ultrasonic NDT of LPG storage vessels. The PoI models have been adapted for welds in general and the user may need to substitute their own PoI curves and their own Probability of Detection (PoD) curves appropriate to the particular NDT method of interest.

The PoI modelling work assumes at the outset that D% of the total length of weld is defective (i.e. D > 0) and randomly distributed. The method of establishing the value of D will be up to the user, but could be based on some sample inspection. The final PoI model results presented here have been compared with a number of other statistical approaches and shown to be valid for D ≥ 4%, which is believed to be a reasonable lower bound for industrial applications (and in the light of the investigations highlighted above). The conclusions and results of the PoI modelling work in Appendix C form the basis of these proposed guidelines.

2 PROCEDURE

In developing a procedure for industry to estimate the % level NDT required, a number of key issues have been considered and these are highlighted below.

• Has any NDT been carried out already?
• Have the inspected areas been targeted or have they been selected at random?
• What are the critical flaw sizes of concern for the particular weld?
• Have any significant flaws been detected?

Decision trees and graphs are provided to help estimate the % level of NDT necessary to include as well as to detect a defective part of the total weld length, to a required probability. This is considered for whatever % level (if any) of NDT was carried out previously.


It is believed that flaws are more likely to occur in certain areas of welds than others, as there is evidence of this in industrial data (e.g. horizontal seam welds in LPG vessels). In recognising this, if the NDT carried out on selected parts of the weld includes such critical or 'targeted' areas, a weighting factor (w) is applied to the level of NDT already carried out. This increases the equivalent level of NDT and, in turn, increases the probability of including and detecting a defective part of the weld. The weighting factor (w) is set at 1.5.

It is assumed that the operator is able to calculate the critical flaw size (i.e. height) using fracture mechanics methods. A general principle is adopted that if at any point in the inspection a significant flaw is detected, then 100% NDT is carried out followed by procedures for remedial repairs and checks to remove the flaws.

3 DECISION TREES

The % level of NDT previously carried out (I%) falls into three categories:

(i) NO NDT (I = 0%), in which case some guidance is provided to estimate a sufficient level of inspection (decision tree, Figure 1).

(ii) SOME NDT (0% < I < 100%), in which case some guidance is provided to assess whether the level of NDT is sufficient or whether an additional amount is necessary (decision tree, Figure 2).

(iii) MAXIMUM NDT (I = 100%), in which case if any significant flaws are detected, it is assumed that procedures apply for remedial repairs and checks to remove the flaws. No decision tree is provided for this special case.

The decision trees in Figures 1 and 2 are linked to two additional Figures and a brief description is given of these as well as some simple examples of how they are used.

3.1 PROBABILITY OF INCLUSION CURVES

In Figure 3, the PoI curves illustrated are for the 4%, 6%,10%, 20% and 40% degrees of defective weld and are based on the Log-odds model (see Appendix C). The height of each curve represents the probability of including a defective part of the weld, assuming a certain level of NDT I% (i.e. % area selected for inspection). It is predictable that as I% approaches 100% then the PoI approaches unity. It may be necessary to recalculate the PoI curves in Figure 3, depending on the mean and standard deviation of the data. For illustrative purposes, the mean=variance in Figure 3 (c.f. Poisson distribution). In practice it is reasonable to assume that 4% of the weld could be defective. The 4% curve in Figure 3 is recommended for use in estimating the % level of NDT corresponding to a given PoI, or vice versa (i.e. estimating the PoI corresponding to a given % level of NDT). For example, to achieve a high probability of including a defective part, consider a PoI value of 0.9. Figure 3 shows that for a weld that is 4% defective, nearly 50% of the weld would need to be inspected, assuming the weld area was all non-targeted. If all the weld area was targeted, then the equivalent amount of weld to be inspected would be reduced to about 33% (i.e. 50/w), where the weighting factor w = 1.5.


3.2 PROBABILITY OF DETECTION CURVES

A number of 'defect detection trials' have been carried out previously to estimate the probability of detection (PoD) using conventional manual ultrasonic inspection (e.g. National NDT Centre (UK), NORDTEST (DNV, Norway) and NIL (Netherlands)). An AEA Technology report (1) has brought these results together and compared the various PoD curves to compute a lower bound PoD curve for a range of defect heights. This lower bound curve is reproduced here in Figure 4. If the NDT method is not ultrasonics, then the appropriate PoD curve for the NDT method of interest can be substituted for Figure 4 (c.f. Figures 8-13 in the main PoD report). For example, if the critical flaw height for a particular weld was 3mm, then the PoD from Figure 4 would be just over 0.5. For a 6mm height the PoD would be just under 0.7.

3.3 INDEX OF DETECTION

For the PoD values in Figure 4, it is assumed that the flaw has been included in the selected part of the weld. If it were not known that the flaw is included then from the multiplication law of probability for independent events (2), the PoD would need to be multiplied by the PoI. In fact Figure 4 is used in conjunction with Figure 3 in the decision trees to compute a parameter defined as the 'Index of Detection' (PID) where

$$ P_{ID} = PoI \times PoD \qquad (1) $$

which is the probability of including and detecting a flaw (PID) with a certain flaw depth. For example, using the case highlighted in section 3.1 above with PoI = 0.9, then the probability of including and detecting a 3mm high flaw would be about (0.9)(0.5) = 0.45, a 45% chance. For a 6mm high flaw this would be about (0.9)(0.7) = 0.63, a 63% chance. There are two more detailed examples provided in Appendix DI, of how to use the PoI and PoD curves in conjunction with the decision trees for the cases when:

i. No NDT was previously carried out

ii. Some NDT was previously carried out

If the final PID value is considered high enough then the NDT carried out is regarded as sufficient. If it is not, then additional NDT is necessary. In practice, an acceptable PID value shall be set by a competent person.

4 REFERENCES

1. AEA Technology Report, AEAT-4389 HOIS (98) P8 Issue 2 (DRAFT). Data for PoD curve supplied with kind permission by Dr. Martin Wall, AEA Technology.

2. Crawshaw J and Chambers J: 'A Concise Course In A-Level Statistics With Worked Examples'. Stanley Thornes (Publishers) Ltd, 2nd Edition, 1994, ISBN 0-7487-0455-8.

APPENDIX DI

EXAMPLES FOR USING THE DECISION TREES (FIGURES 1 AND 2)

FOR A PARTICULAR NDT METHOD


In both Case 1 and Case 2 below the following is assumed:

• The D = 4% curve is used
• The critical flaw height is taken to be 6mm.
• The agreed acceptable value for the 'Index of Detection' is PID = 0.6
• The weighting factor w = 1.5
• If at any time during the inspection a significant flaw is detected, then 100% NDT will apply followed by procedures for remedial repairs and checks to remove any significant flaws

In following through both cases below it is important to have to hand the relevant decision trees (i.e. Figures 1 and 2), the PoI curves and PoD curve (i.e. Figures 3 and 4).

CASE 1: Using the Decision Tree for 'No previous NDT' (Figure 1)

• The PoD value corresponding to 6mm (i.e. Figure 4 or equivalent) is 0.69.
• Select, for example, I = 30%, for which PoI = 0.82.
• The corresponding 'Index of Detection' PID = (0.82)(0.69) = 0.57
• PID is unacceptable, since 0.57 < 0.6
• Increase I to 40%, for which PoI = 0.88
• The corresponding 'Index of Detection' PID = (0.88)(0.69) = 0.61
• PID is acceptable since 0.61 > 0.6
• Take I = 40% and consider the following possibilities before carrying out the inspection
• If I = 40% is all non-targeted, IN = 40%
• If I = 40% is all targeted, IE ≈ 27% (IE = 40/1.5)
• If, for example, I = 40% is split IN = 10% and IT = 30%, IE = 30% (IE = 10 + (30/1.5))
• If, for example, I = 40% is split IN = 30% and IT = 10%, IE ≈ 37% (IE = 30 + (10/1.5))
• Carry out NDT according to IN or IE above and assess for any significant flaws.
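The Case 1 steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PoI values are the ones quoted in these examples for the D = 4% curve (read from Figure 3) rather than values computed from the Log-odds model, and the names POI_4PC, select_level and planned_equivalent are hypothetical.

```python
# PoI values quoted in the worked examples for the D = 4% curve (read from Figure 3).
POI_4PC = {30: 0.82, 35: 0.85, 40: 0.88, 45: 0.90}
POD_6MM = 0.69      # PoD quoted for a 6 mm high flaw (Figure 4)
PID_TARGET = 0.6    # agreed acceptable Index of Detection
W = 1.5             # weighting factor for targeted areas


def index_of_detection(poi: float, pod: float) -> float:
    """PID = PoI x PoD."""
    return poi * pod


def select_level(levels, poi_curve, pod, target):
    """Return the lowest candidate I% whose PID meets the target, or None if none does."""
    for level in sorted(levels):
        if index_of_detection(poi_curve[level], pod) >= target:
            return level
    return None


def planned_equivalent(i_n: float, i_t: float, w: float = W) -> float:
    """Figure 1 rule for planning: IE = IN + IT / w."""
    return i_n + i_t / w


level = select_level((30, 40), POI_4PC, POD_6MM, PID_TARGET)
print("Selected I% =", level)
for i_n, i_t in ((40, 0), (0, 40), (10, 30), (30, 10)):
    print(f"IN = {i_n}%, IT = {i_t}% -> IE = {planned_equivalent(i_n, i_t):.0f}%")
```

With the quoted values the sketch selects I = 40% and reproduces the equivalent levels of 40%, 27%, 30% and 37% listed above.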


CASE 2: Using the Decision Tree for 'Previous manual ultrasonic NDT=I%' (Figure 2)

Note that the PoD value corresponding to 6mm is 0.69 in all the different examples considered below.

• Assume I = 30% and assess the split of non-targeted and targeted inspection (IN and IT)

• If I = 30% was all non-targeted then IN = 30% and PoI = 0.82
• The corresponding 'Index of Detection' PID = (0.82)(0.69) = 0.57
• PID is unacceptable, since 0.57 < 0.6
• Assess the additional amount of manual ultrasonic NDT to ensure PID is acceptable. For example, if an additional 10% was considered and was all non-targeted, then IN = 40% and PoI = 0.88. The new PID = (0.88)(0.69) = 0.61, which is acceptable. If on the other hand the additional 10% was all targeted then it is equivalent to 10 x 1.5 = 15%, that is, a total IE = 45% and PoI = 0.9. The new PID = (0.9)(0.69) = 0.62, which is acceptable. If this additional 10% was split as IN + IT, then 0.61 ≤ PID ≤ 0.62 and an even split of IN = 5% and IT = 5% would suffice.
• Carry out the additional NDT and assess for any significant flaws.

• If I = 30% was all targeted, then IE = (30)(1.5) = 45% and PoI = 0.9
• The corresponding 'Index of Detection' PID = (0.9)(0.69) = 0.62
• PID is acceptable, since 0.62 > 0.6
• Stop, since no significant flaws have been detected along this branch of the tree.

• If, for example, I = 30% was split IN = 10% and IT = 20%, then IE = 10+(20)(1.5) = 40% and PoI = 0.88
• The corresponding 'Index of Detection' PID = (0.88)(0.69) = 0.61
• PID is acceptable, since 0.61 > 0.6
• Stop, since no significant flaws have been detected along this branch of the tree.

• If, for example, I = 30% was split IN = 20% and IT = 10%, then IE = 20+(10)(1.5) = 35% and PoI = 0.85.
• The corresponding 'Index of Detection' PID = (0.85)(0.69) = 0.59
• PID is unacceptable, since 0.59 < 0.6
• Assess the additional amount of NDT to ensure PID is acceptable in the manner demonstrated in the first example above (e.g. in this case an additional IN = 5%, or IT = 4% would suffice).
• Carry out the additional NDT and assess for any significant flaws.
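The assessment of previous NDT in Case 2 can be sketched in the same way. Here the weighting factor multiplies the targeted share (IE = IN + IT x w), in contrast to the planning rule used in Case 1. The lookup table again holds only the PoI values quoted in these examples, so other equivalent levels are not supported, and the names achieved_equivalent and assess are hypothetical.

```python
# PoI values quoted in the worked examples for the D = 4% curve (read from Figure 3).
POI_4PC = {30: 0.82, 35: 0.85, 40: 0.88, 45: 0.90}
POD_6MM = 0.69      # PoD quoted for a 6 mm high flaw (Figure 4)
PID_TARGET = 0.6    # agreed acceptable Index of Detection
W = 1.5             # weighting factor for targeted areas


def achieved_equivalent(i_n: float, i_t: float, w: float = W) -> float:
    """Figure 2 rule for NDT already carried out: IE = IN + IT x w."""
    return i_n + i_t * w


def assess(i_n: float, i_t: float):
    """Return (IE, PID, acceptable) for a previous inspection split into IN and IT."""
    ie = achieved_equivalent(i_n, i_t)
    poi = POI_4PC[round(ie)]  # raises KeyError for levels not quoted in the text
    pid = poi * POD_6MM
    return ie, pid, pid >= PID_TARGET


for i_n, i_t in ((30, 0), (0, 30), (10, 20), (20, 10)):
    ie, pid, ok = assess(i_n, i_t)
    verdict = "acceptable" if ok else "unacceptable"
    print(f"IN = {i_n}%, IT = {i_t}% -> IE = {ie:.0f}%, PID = {pid:.2f} ({verdict})")
```

For the four splits of I = 30% considered above, the sketch gives PID values of 0.57, 0.62, 0.61 and 0.59, matching the acceptable and unacceptable outcomes in the text.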


Figure 1 No previous NDT: a decision tree to estimate a sufficient % level of inspection

Key:

I Actual % level of NDT
IE Equivalent % level of NDT
IN The non-targeted % level of NDT
IT The targeted % level of NDT
w The weighting factor (set at 1.5)
PoI Probability of Inclusion
PoD Probability of Detection
PID Index of Detection

Decision tree steps:

1. Start. Obtain the PoD for the critical flaw size from Figure 4.
2. Select a level for NDT (I %) and obtain the corresponding PoI value from Figure 3.
3. Compute the 'Index of Detection' PID = PoI x PoD.
4. Is PID acceptable? N: increase the I % level for NDT, obtain the new corresponding PoI value from Figure 3 and return to step 3. Y: go to step 5.
5. Are areas to be inspected all non-targeted? Y: carry out non-targeted NDT, IN = I%. N: go to step 6.
6. Are areas to be inspected all targeted? Y: carry out targeted NDT equivalent to I%, IE = (I / w) %. N: carry out NDT equivalent to I = IN + IT, IE = (IN + (IT / w)) %.
7. Detected any significant flaws? Y: carry out 100% NDT, followed by procedures for remedial repairs and checks, to remove flaws. N: stop.

Figure 3 The Probability of including a defective part of the weld given a certain % level of inspection, using the Log-odds model

[Plot: Probability of Inclusion (Log-odds model), 0 to 1, against % Level of Inspection (I%), 0 to 100. Curves: 4%, 6%, 10%, 20% and 40% Defective Weld. Log-odds Distribution with variance = mean.]

Figure 4 The lower bound probability of detection vs. flaw depth (reproduced with kind permission from reference (1))

[Plot: Probability of Detection (POD), 0 to 1, against Flaw Depth (mm), 0 to 10. Based on manual ultrasonic NDT.]

Figure 2 Previous NDT = I%: a decision tree to assess whether I% is sufficient

Key:

I Actual % level of NDT
IE Equivalent % level of NDT
IN The non-targeted % level of NDT
IT The targeted % level of NDT
w The weighting factor (set at 1.5)
PoI Probability of Inclusion
PoD Probability of Detection
PID Index of Detection

Decision tree boxes:

• Start
• Carry out 100% NDT, followed by procedures for remedial repairs and checks to remove flaws
• Are areas inspected all non-targeted? Y: obtain the PoI value from Figure 3 corresponding to the non-targeted level of NDT, IN = I%
• Are areas inspected all targeted? Y: obtain the PoI value from Figure 3 corresponding to the 'equivalent level' of NDT, IE = (I x w) %. N: obtain the PoI value from Figure 3 corresponding to the 'equivalent level' of NDT for I = IN + IT, IE = (IN + (IT x w)) %
• Find the PoD for the critical flaw size from Figure 4 and compute the 'Index of Detection' PID = PoI x PoD
• Is PID acceptable? Y: stop. N: assess the additional NDT required using Figure 3 and the above formulae, where necessary, to ensure PID is acceptable. Carry out additional inspection.
• Carry out 100% NDT, followed by procedures for remedial repairs and checks to remove flaws

Published by the Health and Safety Executive 06/06

RR 454
