
POOLED DATA DISTRIBUTIONS

GRAPHICAL AND STATISTICAL TOOLS FOR EXAMINING COMPARISON REFERENCE VALUES

Alan Steele, Ken Hill, and Rob Douglas

National Research Council of Canada
E-mail: [email protected]

National Research Council Canada / Conseil national de recherches Canada

Measurement comparison data sets are generally summarized using a simple statistical reference value calculated from the pool of the participants’ results. Consideration of the comparison data sets, particularly with regard to the consequences and implications of such data pooling, can allow informed decisions regarding the appropriateness of choosing a simple statistical reference value. Graphs of the relevant distributions provide insight into this problem.


Introduction

• Comparison data collection and analysis continues to grow in importance among the tasks of international metrology

• Sample distributions and populations are routinely considered when preparing the summary of the comparison

• Reference values (KCRVs) are often calculated from the measurement data supplied by the participants

• We believe that graphical techniques are an aid to understanding and communication in this field


The Normal Approach

• Generally, the initial implicit assumption is that all of the participants’ data, as xi/ui, represent individual samples from a single (normal) population

• A coherent picture of the population mean and standard deviation can be built from the comparison data set that is fully consistent with the reported values and uncertainties

• Most outlier-test protocols rely on this assumption to identify when and if a given laboratory result should be excluded, since its inclusion would violate this internal consistency


Pooled Data Distributions

• Creating pooled data distributions tackles this problem from the opposite direction

• The independent distributions reported by each participant (through their value and uncertainty) are summed directly

• Result is taken as representative of the underlying population as revealed in the comparison measurements

• Monte Carlo methods are useful when calculations involve Student distributions or medians rather than means
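As a minimal sketch of the direct summation described above, the pooled density can be written as the equal-weight average of the participants' individual densities. The example assumes normal distributions and independent labs, divides the sum by the number of labs so that the result has unit area, and uses hypothetical values; the function names are illustrative only.

```python
import math

def normal_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pooled_pdf(x, values, uncertainties):
    """Equal-weight sum of the per-lab densities, divided by N to keep unit area."""
    total = sum(normal_pdf(x, xi, ui) for xi, ui in zip(values, uncertainties))
    return total / len(values)

# Hypothetical comparison (not from the slides): three labs agree, one stands apart.
values = [10.0, 10.2, 9.9, 12.5]
uncertainties = [0.2, 0.3, 0.25, 0.4]

# Crude numerical check that the pooled density integrates to about 1.
step = 0.01
grid = [5.0 + step * k for k in range(1001)]  # 5.0 .. 15.0
area = sum(pooled_pdf(x, values, uncertainties) for x in grid) * step
print(f"area under pooled PDF: {area:.4f}")
```

With these inputs the pooled density has two separate peaks, which is the kind of structure a single fitted normal distribution would hide.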


Monte Carlo Calculations

• High quality linear congruential uniform random number generators are easy to find

• Transformation from uniform to any distribution is done via the inverse of its cumulative distribution function

• Example shows Student distribution transform

• Our Excel Toolkit includes an external DLL for doing fast Monte Carlo simulations with multiple large arrays
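The uniform-to-Student transform can be sketched in a few lines. This is not the Excel toolkit or its DLL: it uses Python's built-in generator (a Mersenne Twister, not a linear congruential generator) and the known closed-form inverse CDF for 4 degrees of freedom. The location and scale values, and the function names, are illustrative assumptions.

```python
import math
import random

def student4_quantile(p):
    """Closed-form inverse CDF of the standard Student distribution, 4 degrees of freedom."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    alpha = min(1.0, 4.0 * p * (1.0 - p))
    root = math.sqrt(alpha)
    q = math.cos(math.acos(root) / 3.0) / root
    magnitude = 2.0 * math.sqrt(max(0.0, q - 1.0))
    return magnitude if p >= 0.5 else -magnitude

def sample_student4(rng, location=0.0, scale=1.0):
    """Draw one value by pushing a uniform deviate through the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0, which has no finite quantile
        u = rng.random()
    return location + scale * student4_quantile(u)

rng = random.Random(12345)  # fixed seed so the run is repeatable
samples = sorted(sample_student4(rng, location=-1.0, scale=2.0) for _ in range(100_000))
sample_median = samples[len(samples) // 2]
print("sample median:", round(sample_median, 3))
```

For other degrees of freedom no such simple closed form is generally available, and the inverse would have to be found numerically from the CDF.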

[Figure: two panels, both for x from -10 to 10. Left axis: Student PDF (first panel) and Student histogram of 10^5 events (second panel), each from 0.00 to 0.20. Right axis: Student CDF from 0.0 to 1.0. Legend: μ = -1, σ = 2, ν = 4.]


Dealing with Student Distributions

[Figure: CDF(x, μ = 0, σ = 1, ν) from 0% to 100% versus x from -5 to 5, one curve for each ν = 2, 3, ..., 10]

• Student Cumulative Distribution Functions for different Degrees of Freedom (ν = 2…10)

• Note that the line at 97.5% cumulative probability crosses each curve at the coverage factor, k, appropriate for a 95% confidence interval
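The crossing point described above can be computed numerically. The sketch below integrates the standard Student PDF with Simpson's rule and bisects for the point where the CDF reaches 97.5%; the integration settings and function names are assumptions chosen for illustration, not part of the original toolkit.

```python
import math

def student_pdf(t, dof):
    """Density of the standard Student distribution with dof degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2.0) - math.lgamma(dof / 2.0)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2.0 * math.log1p(t * t / dof))

def student_cdf(t, dof, panels=400):
    """CDF via symmetry plus a Simpson's-rule integral of the PDF from 0 to |t|."""
    if t < 0.0:
        return 1.0 - student_cdf(-t, dof, panels)
    h = t / panels  # panels must be even for Simpson's rule
    total = student_pdf(0.0, dof) + student_pdf(t, dof)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * student_pdf(k * h, dof)
    return 0.5 + total * h / 3.0

def coverage_factor(dof, probability=0.975):
    """Bisect for the k at which the CDF reaches the requested probability."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if student_cdf(mid, dof) < probability:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for dof in range(2, 11):
    print(dof, round(coverage_factor(dof), 3))
```

The printed values fall as the degrees of freedom rise, moving toward the familiar normal-distribution value of about 1.96.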

[Figure: detail of CDF(x, μ = 0, σ = 1, ν) from 95% to 100% versus x from 1.5 to 5.0, for ν = 2, 3, ..., 10, with the coverage factor k marked]


Example Data From KCDB

• Recent results for CCAUV.U-K1

• Low power, 1.9 MHz: 5 Labs

• Finite degrees of freedom specified for all participants

• Data failed consistency check using weighted mean

• Median chosen as KCRV

Lab      PRef (mW)   u (mW)   ν
PTB      97.4        0.84     8.3
NIST     99          0.64     6.3
NPL      97.6        1.01     11
CSIRO    114.5       6.75     6.7
NIM      94          1.16     12.5

[Figure: PRef (mW), axis from 90 to 140, for each laboratory: PTB, NIST, NPL, CSIRO, NIM]
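The weighted-mean consistency check mentioned on this slide can be illustrated with the tabulated values. The slide does not say which test was applied, so the sketch below assumes a conventional chi-squared test at the 95% level, treats the stated uncertainties as normal standard uncertainties, and ignores the finite degrees of freedom.

```python
import statistics

# Values and standard uncertainties (mW) from the CCAUV.U-K1 table above.
labs = ["PTB", "NIST", "NPL", "CSIRO", "NIM"]
values = [97.4, 99.0, 97.6, 114.5, 94.0]
uncertainties = [0.84, 0.64, 1.01, 6.75, 1.16]

# Variance-weighted mean and its standard uncertainty.
weights = [1.0 / u ** 2 for u in uncertainties]
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
u_weighted_mean = (1.0 / sum(weights)) ** 0.5

# Observed chi-squared about the weighted mean, with N - 1 degrees of freedom.
chi_squared = sum(w * (x - weighted_mean) ** 2 for w, x in zip(weights, values))
dof = len(values) - 1

# Assumed acceptance limit: 95th percentile of chi-squared with 4 degrees of freedom.
CHI2_CRITICAL_95_DOF4 = 9.488
consistent = chi_squared <= CHI2_CRITICAL_95_DOF4

median = statistics.median(values)
print(f"weighted mean = {weighted_mean:.2f} mW, chi-squared = {chi_squared:.1f}, "
      f"consistent = {consistent}, median = {median} mW")
```

Under these assumptions the observed chi-squared exceeds the limit, in line with the slide's statement that the data failed the check.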


Statistical Distributions

• Results of Monte Carlo simulation:
  – lab distributions used to resample comparison
  – pooled data histogram incremented once for each lab per event
  – mean, weighted mean, and median calculated for each event

• Population revealed by measurement is multi-modal and evidently not normal
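The simulation loop outlined in the bullets above might look like the following sketch. It assumes each lab's result follows a scaled and shifted Student distribution with the tabulated value, uncertainty (used as the scale parameter), and degrees of freedom; the bin layout, event count, and names are illustrative choices rather than the authors' settings.

```python
import random
import statistics

# Value (mW), uncertainty (mW), and degrees of freedom from the CCAUV.U-K1 table.
labs = ["PTB", "NIST", "NPL", "CSIRO", "NIM"]
values = [97.4, 99.0, 97.6, 114.5, 94.0]
uncertainties = [0.84, 0.64, 1.01, 6.75, 1.16]
dofs = [8.3, 6.3, 11.0, 6.7, 12.5]

def draw_student(rng, location, scale, dof):
    """Scaled, shifted Student deviate: normal / sqrt(chi-squared / dof)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(dof / 2.0, 2.0)  # chi-squared with dof degrees of freedom
    return location + scale * z / (v / dof) ** 0.5

rng = random.Random(2024)
n_events = 20_000
low, bin_width, n_bins = 75.0, 0.5, 100  # histogram covers 75 mW to 125 mW
pooled_hist = [0] * n_bins
weights = [1.0 / u ** 2 for u in uncertainties]
means, weighted_means, medians = [], [], []

for _ in range(n_events):
    # One event: every lab is resampled from its own distribution.
    sample = [draw_student(rng, x, u, nu)
              for x, u, nu in zip(values, uncertainties, dofs)]
    for s in sample:
        b = int((s - low) // bin_width)
        if 0 <= b < n_bins:
            pooled_hist[b] += 1  # pooled histogram: one increment per lab per event
    means.append(sum(sample) / len(sample))
    weighted_means.append(sum(w * s for w, s in zip(weights, sample)) / sum(weights))
    medians.append(statistics.median(sample))

print("typical median:", round(statistics.median(medians), 2), "mW")
```

The three lists hold the simulated distributions of the candidate reference values, which can be histogrammed alongside the pooled data.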

[Figure: individual lab distributions (NIM, CSIRO, NPL, NIST, PTB) and the pooled data PDF versus PRef (mW), from 75 to 125]



[Figure: lab distributions (NIM, CSIRO, NPL, NIST, PTB) and the pooled data PDF versus PRef (mW), from 75 to 125, overlaid with the distributions of the weighted mean, median, and simple mean]


Advantages of Monte Carlo

• Technique is simple to implement

• Allows calculation of confidence intervals for statistics

• Covariances can be accommodated in a straightforward manner

• Possible to include outlier rejection schemes

• Easy to track quantities of interest, such as the probability of a given participant being the median laboratory

• Can consider other candidate reference values
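Two of the points above, a confidence interval for a statistic and the probability that a given participant is the median laboratory, can be tracked in the same loop. The sketch uses the CCAUV.U-K1 values tabulated earlier and again assumes independent, scaled and shifted Student distributions; the seed, event count, and percentile method are illustrative.

```python
import random

# Value (mW), uncertainty (mW), and degrees of freedom from the CCAUV.U-K1 table.
labs = ["PTB", "NIST", "NPL", "CSIRO", "NIM"]
values = [97.4, 99.0, 97.6, 114.5, 94.0]
uncertainties = [0.84, 0.64, 1.01, 6.75, 1.16]
dofs = [8.3, 6.3, 11.0, 6.7, 12.5]

def draw_student(rng, location, scale, dof):
    """Scaled, shifted Student deviate: normal / sqrt(chi-squared / dof)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(dof / 2.0, 2.0)  # chi-squared with dof degrees of freedom
    return location + scale * z / (v / dof) ** 0.5

rng = random.Random(7)
n_events = 20_000
median_counts = dict.fromkeys(labs, 0)
medians = []

for _ in range(n_events):
    sample = [draw_student(rng, x, u, nu)
              for x, u, nu in zip(values, uncertainties, dofs)]
    order = sorted(range(len(sample)), key=sample.__getitem__)
    middle = order[len(order) // 2]  # lab holding the middle value (N is odd)
    median_counts[labs[middle]] += 1
    medians.append(sample[middle])

# Fraction of events in which each lab supplied the median.
probabilities = {lab: count / n_events for lab, count in median_counts.items()}

# Percentile-based 95% interval for the median.
medians.sort()
lower = medians[int(0.025 * n_events)]
upper = medians[int(0.975 * n_events) - 1]
print(probabilities)
print(f"95% interval for the median: {lower:.2f} mW to {upper:.2f} mW")
```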


Example: CCT-K3 Argon Point

• Another example from KCDB

• CCT-K3 Argon Triple Point

• Large variation in reported values

• Large variation in stated uncertainties

• No KCRV was assigned, based on data pooling analysis

[Figure: TLab - TPilot (mK), axis from -4 to 2, by laboratory]


Algorithmic Reference Values

• Linear combinations of simple estimators can be used as robust estimators of location

• For CCT-K3, proposal to use simple average of mean, weighted mean, and median

• Evaluation of any such algorithmic estimator is easy to do with Monte Carlo
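The proposed estimator, the simple average of the mean, weighted mean, and median, is short to write down and to evaluate by resampling. The CCT-K3 data are not reproduced in these slides, so the sketch uses hypothetical values and assumes normal distributions; the function name is illustrative.

```python
import random
import statistics

def algorithmic_reference_value(values, uncertainties):
    """Simple average of the mean, the variance-weighted mean, and the median."""
    mean = sum(values) / len(values)
    weights = [1.0 / u ** 2 for u in uncertainties]
    weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    median = statistics.median(values)
    return (mean + weighted_mean + median) / 3.0

# Hypothetical three-lab data set (not CCT-K3 data).
values = [1.0, 2.0, 6.0]
uncertainties = [1.0, 1.0, 1.0]
arv = algorithmic_reference_value(values, uncertainties)

# Monte Carlo evaluation: resample every lab, recompute the estimator each time.
rng = random.Random(1)
arv_samples = []
for _ in range(10_000):
    resampled = [rng.gauss(x, u) for x, u in zip(values, uncertainties)]
    arv_samples.append(algorithmic_reference_value(resampled, uncertainties))
arv_spread = statistics.stdev(arv_samples)
print(f"ARV = {arv:.3f}, Monte Carlo standard deviation = {arv_spread:.3f}")
```

Any other algorithm can be substituted for the function body without changing the resampling loop.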

[Figure: PDFs of the ARV, weighted mean, median, and simple mean versus TLAB - TARV (mK), from -0.50 to 0.50]


Quantifying the Comparison

• Calculating a reference value, typically the variance-weighted mean or the median, is a routine part of reporting comparisons

• The suitability of these statistics for representing the data set can be checked using chi-squared testing

• It is also possible to perform such tests without invoking a reference value by considering the data in pairwise fashion

• Advantages of pair-statistics
  – Always works, even before choosing a reference value
  – More rigorous, since it can handle correlations exactly
  – Explicit, following metrological chains of inference


Pair-Difference Distributions

• Similar to exclusive statistics

• Consider difference between one lab and “rest of world”

• Sum of per-lab differences is the all-pairs-difference (APD) distribution; this is symmetric

• Width of APD is a measure of “global” quality assurance for independent calibration of an artifact by two different labs chosen at random
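A minimal sketch of these distributions, assuming independent normal lab distributions and the CCAUV.U-K1 values tabulated earlier: each pair difference X_j - X_i is then normal with mean x_j - x_i and variance u_i² + u_j². The sums are divided by the number of terms to keep unit area, and the RMS width is a stand-in for the median ± MAD measure shown on the slide; all names are illustrative.

```python
import math

# Values and standard uncertainties (mW) from the CCAUV.U-K1 table.
labs = ["PTB", "NIST", "NPL", "CSIRO", "NIM"]
values = [97.4, 99.0, 97.6, 114.5, 94.0]
uncertainties = [0.84, 0.64, 1.01, 6.75, 1.16]

def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def lab_difference_pdf(d, i):
    """Density of X_j - X_i over all other labs j: lab i against the rest of the world."""
    others = [j for j in range(len(values)) if j != i]
    total = 0.0
    for j in others:
        sigma = math.hypot(uncertainties[i], uncertainties[j])
        total += normal_pdf(d, values[j] - values[i], sigma)
    return total / len(others)

def apd_pdf(d):
    """All-pairs-difference density: average of the per-lab difference densities."""
    n = len(values)
    return sum(lab_difference_pdf(d, i) for i in range(n)) / n

def apd_rms_width():
    """RMS width of the APD; its mean is zero because every pair appears with both signs."""
    n = len(values)
    second_moment = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                second_moment += ((values[j] - values[i]) ** 2
                                  + uncertainties[i] ** 2 + uncertainties[j] ** 2)
    return math.sqrt(second_moment / (n * (n - 1)))

print(f"APD density at +5 mW and -5 mW: {apd_pdf(5.0):.5f}, {apd_pdf(-5.0):.5f}")
print(f"APD RMS width: {apd_rms_width():.2f} mW")
```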

[Figure: two panels versus measurement difference (mW), from -40 to 40. First panel: per-lab difference distributions Xj - PTB, Xj - NIST, Xj - NPL, Xj - CSIRO, Xj - NIM. Second panel: their sum, the APD, marked with median ± MAD.]


Reduced Chi-Squared Testing

• Normalizing the pair differences by the pair uncertainties allows us to build tests of the measurement capability claims

$$\chi_j^2 = (N-1)^{-1} \sum_{i=1,\, i \neq j}^{N} \frac{(x_i - x_j)^2}{u_i^2 + u_j^2 - 2 r_{ij} u_i u_j}$$

$$\chi_r^2 = N^{-1} \sum_{j=1}^{N} \chi_j^2$$

• This is still independent of any chosen reference value

• This All Pairs Difference reduced χ² has N-1 degrees of freedom

• If a data set fails the APD χ² test, it will fail for every possible KCRV

                  PTB         NIST        NPL         CSIRO       NIM         APD
χ²                3.57        5.78        3.25        6.65        8.57        5.57
Pr(χ² > χ²obs)    <5.8×10⁻²   <1.6×10⁻²   <7.1×10⁻²   <9.9×10⁻³   <3.4×10⁻³   =1.8×10⁻⁴
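The per-lab and all-pairs statistics can be checked against the CCAUV.U-K1 values tabulated earlier. The sketch below assumes uncorrelated labs (all r_ij = 0) and, for the tail probability, treats N-1 times the all-pairs reduced χ² as chi-squared with N-1 degrees of freedom, using the closed form that holds for even degrees of freedom; function names are illustrative.

```python
import math

# Values and standard uncertainties (mW) from the CCAUV.U-K1 table.
labs = ["PTB", "NIST", "NPL", "CSIRO", "NIM"]
values = [97.4, 99.0, 97.6, 114.5, 94.0]
uncertainties = [0.84, 0.64, 1.01, 6.75, 1.16]

def pair_chi_squared(j, values, uncertainties, correlation=None):
    """Reduced chi-squared of lab j against every other lab."""
    n = len(values)
    total = 0.0
    for i in range(n):
        if i == j:
            continue
        r = 0.0 if correlation is None else correlation[i][j]
        variance = (uncertainties[i] ** 2 + uncertainties[j] ** 2
                    - 2.0 * r * uncertainties[i] * uncertainties[j])
        total += (values[i] - values[j]) ** 2 / variance
    return total / (n - 1)

def chi2_upper_tail_even_dof(x, dof):
    """Pr(chi-squared > x) in closed form, valid only for even degrees of freedom."""
    if dof % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= (x / 2.0) / k
        total += term
    return math.exp(-x / 2.0) * total

per_lab = [pair_chi_squared(j, values, uncertainties) for j in range(len(values))]
apd = sum(per_lab) / len(per_lab)  # all-pairs reduced chi-squared
dof = len(values) - 1
p_apd = chi2_upper_tail_even_dof(dof * apd, dof)

for lab, chi2 in zip(labs, per_lab):
    print(f"{lab:6s} {chi2:.2f}")
print(f"APD    {apd:.2f}   Pr = {p_apd:.1e}")
```

The results agree with the tabulated values to within rounding in the last digit.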


Conclusions

• Monte Carlo technique is fast and simple to implement

• Graphs provide a powerful tool for visual consideration of:
  – Pooled data (sum distribution)
  – Simple Estimators (mean, weighted mean, median)
  – Other Estimators (any algorithm can be used)

• All-pairs reduced chi-squared statistic is egalitarian over participants, and independent of choice of KCRV

• No single choice of KCRV can adequately represent a comparison that fails the all-pairs-difference chi-squared test
