Analytical and Bioanalytical Chemistry
Electronic Supplementary Material
An international assessment of the
metrological equivalence of higher-order measurement services for
creatinine in serum
Johanna E. Camara, Katrice A. Lippa, David L. Duewer, Hugo Gasca-Aragon, Blaza Toman
Repeatability Measurements. Table S1 lists the repeatability measurements for the CCQM-K80
study. All values are formally expressed in arbitrary units.
Variance Components and Uncertainty Estimation. The sources of measurement variability
(instrumental, sample preparation and between-campaign) can be described using the
experimental design in Figure 1 as:
$$R_{ijkl} \sim N\!\left(\mu_i + \gamma_{ij} + \delta_{ijk},\ \sigma_{r,i}^{2}\right) \qquad \text{[S1]}$$
where i indexes materials, j indexes units, k indexes independent aliquots per unit, l indexes
independent replicates per aliquot, “~ N” indicates “is distributed as Normal (i.e., Gaussian) with
the specified mean and variance,” μi represents the unknowable true value of the measurand in
the material (i.e., creatinine in serum), γij are between-unit differences, δijk are between-aliquot
differences, and σr,i is the true instrumental repeatability and is assumed to be the same across all
units. The γij are assumed to be
$$\gamma_{ij} \sim N\!\left(0,\ \sigma_{c,i}^{2}\right) \qquad \text{[S2]}$$
where σc,i reflects the true between-campaign variability for the material. The δijk are assumed to
be
$$\delta_{ijk} \sim N\!\left(0,\ \sigma_{a,i}^{2}\right) \qquad \text{[S3]}$$
where σa,i reflects the true within-unit (between-aliquot) material variability and is assumed to be
the same for all units of a given material. While potentially over-simplified, this model with
these assumptions is likely fit-for-purpose given that HOVAMs are designed to be stable and
homogeneous. Use of more complex models, e.g., allowing σr,i or σa,i to vary between units,
generally will require more data in order to provide reliable estimates.
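To make the nested structure of Equations S1 to S3 concrete, the following minimal Python sketch simulates the responses for one material. The function name `simulate_material` and all numerical values are illustrative assumptions and are not taken from the study.

```python
import random

def simulate_material(mu, s_r, s_a, s_c, n_c=2, n_a=2, n_r=2, seed=0):
    """Simulate responses for one material under the nested model.

    Returns data[campaign][aliquot] = list of replicate responses.
    mu is the (hypothetical) true value; s_r, s_a, s_c are the repeatability,
    between-aliquot and between-campaign standard deviations.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_c):
        gamma = rng.gauss(0.0, s_c)              # between-campaign (unit) effect, Eq. S2
        unit = []
        for _ in range(n_a):
            delta = rng.gauss(0.0, s_a)          # between-aliquot effect, Eq. S3
            reps = [rng.gauss(mu + gamma + delta, s_r) for _ in range(n_r)]  # Eq. S1
            unit.append(reps)
        data.append(unit)
    return data

# Hypothetical material: 2 campaigns x 2 aliquots x 2 replicates
data = simulate_material(mu=7.0, s_r=0.10, s_a=0.05, s_c=0.08)
for c, unit in enumerate(data, start=1):
    print("campaign", c, [[round(x, 3) for x in reps] for reps in unit])
```

Each campaign shares one draw of the between-campaign effect, each aliquot within it shares one draw of the between-aliquot effect, and only the replicate noise differs within an aliquot.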
For completely balanced measurement designs, an appropriate estimate for μi is the grand mean,
Ri, of all the measurements for each material. The usual ANOVA-type estimate of the standard
uncertainty of Ri is:
$$u(R_i) = \sqrt{\frac{n_a n_r\,\sigma_{c,i}^{2} + n_r\,\sigma_{a,i}^{2} + \sigma_{r,i}^{2}}{n_c\, n_a\, n_r}} \qquad \text{[S4]}$$
where nc is the number of campaigns, na is the number of aliquots taken from the unit used in
each campaign, and nr is the number of replicates of each aliquot.
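As a check on Equation S4, the short Python sketch below evaluates it with the variance components listed in Table S2 for materials A-1 and C-3 (all three aliquots). The function name `u_grand_mean` is an illustrative assumption; the inputs are the rounded table values, so agreement is expected only to about the third decimal.

```python
import math

def u_grand_mean(s_r, s_a, s_c, n_r, n_a, n_c):
    """Standard uncertainty of the grand mean for a balanced
    three-level nested design (Equation S4)."""
    var = (n_a * n_r * s_c**2 + n_r * s_a**2 + s_r**2) / (n_c * n_a * n_r)
    return math.sqrt(var)

# Material A-1: 2 campaigns x 2 aliquots x 2 replicates (Table S2 lists u(R) = 0.078)
u_a1 = u_grand_mean(s_r=0.100, s_a=0.059, s_c=0.089, n_r=2, n_a=2, n_c=2)

# Material C-3, all three aliquots: 2 x 3 x 2 (Table S2 lists u(R) = 0.245)
u_c3 = u_grand_mean(s_r=0.431, s_a=0.516, s_c=0.0, n_r=2, n_a=3, n_c=2)

print(round(u_a1, 3), round(u_c3, 3))
```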
In CCQM-K80, nc and nr are always 2 and na is either 2 or 3. Because these serum-based
materials may not be of uniform composition throughout a given unit, analyzing duplicate
aliquots (or triplicate in the case of LGC materials) within each unit, as well as evaluating at least
two separate units, helps ensure a representative assessment of each material. This
requires the three-level nested measurement design depicted in Figure 1 of the main
text.
Estimates for the variance components, symbolized σ̂c,i, σ̂a,i, and σ̂r,i, can be obtained using
linear mixed model analysis systems such as the SAS MIXED [1] and the R “lmer” [2]
procedures. The variance component and standard uncertainty estimates are listed in Table S2.
All non-zero estimates of the relative σ̂r,i were pooled, giving a value of 0.99 %, which
estimates the instrumental sources of variance. There are no significant differences between the
estimates based on duplicate analyses of two aliquots per unit and those based on duplicate analyses of three
aliquots. Since many of the σ̂a,i are estimated as zero, the 0.68 % pooled relative standard
deviation provides only a worst-case bound on the aliquot preparation variance. Since many of
the σ̂c,i are likewise estimated as zero, the 0.77 % pooled relative standard deviation also
provides only a worst-case bound on the sample preparation variance sources.
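For a balanced design, the variance components can also be approximated with classical ANOVA mean squares. The Python sketch below is a swapped-in method-of-moments calculation, not the SAS MIXED or lmer analysis used in the study; the function name `nested_anova` is an assumption. It is applied to the Table S1 responses for material A-1, where all three components are non-zero and the result can be compared with Table S2. Truncating a negative estimate at zero, as done here, is a simplification: a mixed-model fit re-pools the remaining components, so the two approaches can differ when a component is estimated as zero.

```python
from statistics import mean

def nested_anova(data):
    """Method-of-moments variance components for a balanced nested design.

    data[campaign][aliquot] = list of replicate responses.
    Returns (grand mean, s_r, s_a, s_c) with components as standard deviations.
    """
    n_c, n_a, n_r = len(data), len(data[0]), len(data[0][0])
    aliquot_means = [[mean(reps) for reps in unit] for unit in data]
    campaign_means = [mean(am) for am in aliquot_means]
    grand = mean(campaign_means)  # equals the mean of all values when balanced

    ss_r = sum((x - aliquot_means[j][k]) ** 2
               for j, unit in enumerate(data)
               for k, reps in enumerate(unit)
               for x in reps)
    ss_a = n_r * sum((aliquot_means[j][k] - campaign_means[j]) ** 2
                     for j in range(n_c) for k in range(n_a))
    ss_c = n_a * n_r * sum((m - grand) ** 2 for m in campaign_means)

    ms_r = ss_r / (n_c * n_a * (n_r - 1))
    ms_a = ss_a / (n_c * (n_a - 1))
    ms_c = ss_c / (n_c - 1)

    var_r = ms_r
    var_a = max(0.0, (ms_a - ms_r) / n_r)           # truncate negative estimates
    var_c = max(0.0, (ms_c - ms_a) / (n_a * n_r))
    return grand, var_r ** 0.5, var_a ** 0.5, var_c ** 0.5

# Material A-1 from Table S1: [campaign][aliquot][replicate]
a1 = [[[7.105, 6.983], [7.067, 7.042]],
      [[7.132, 7.096], [7.172, 7.422]]]
grand, s_r, s_a, s_c = nested_anova(a1)
print(round(grand, 3), round(s_r, 3), round(s_a, 3), round(s_c, 3))
```

Table S2 lists R = 7.127 with components 0.100, 0.059, and 0.089 for this material.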
Estimating 95 % level of confidence coverage intervals U95(Ri) for each Ri from a small number
of measurements can be much more complicated than multiplying the standard uncertainties
u(Ri) obtained from equation [S4] by a coverage factor of 2. The approach used herein is to first
expand the standard uncertainties into U95(Ri) form and then revert to “large sample standard
uncertainties,” by dividing by a factor of 2. We refer to this quantity as u∞(Ri):
u∞(Ri) = U95(Ri)/2 [S5]
The u∞(Ri) estimates provide the same statistical interpretation as the U95(Vi)/2 in equation 3 of the main text and
thus provide a consistent description for both the assigned values, V, and the measurement
responses, R.
Two methods were used to estimate the U95(Ri): classical long-term frequency (“frequentist”)
expansion and constrained empirical Bayesian analysis. The frequentist method recommended
in the GUM [3] expands the standard uncertainty u(Ri) using an appropriate Student’s t coverage
factor to generate an expanded uncertainty:
$$U_{95}(R_i)_{\mathrm{f}} = t_{95\%,\,v_i}\; u(R_i) \qquad \text{[S6]}$$
where vi are the degrees of freedom. Depending on the relative magnitude of the estimated
variance components σ̂r,i, σ̂a,i, and σ̂c,i, the vi for the CCQM-K80 materials range from 7 to 1. This
generates two-tailed t95%,vi values that range from 2.4 to 12.7, the latter of which translates into
unrealistically large U95(Ri)f values. The vi and U95(Ri)f values are listed in Table S2.
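The effect of the degrees of freedom on the frequentist expansion can be seen in a short worked example. The Python sketch below applies Equations S6 and S5 to the rounded u(R) values that Table S2 lists for materials A-1 (v = 1) and B-1 (v = 7). The lookup table `T95` holds standard two-tailed 95 % Student's t factors; the names `T95` and `expand_frequentist` are illustrative assumptions.

```python
# Two-tailed 95 % Student's t factors for the degrees of freedom in Table S2
T95 = {1: 12.706, 3: 3.182, 5: 2.571, 7: 2.365}

def expand_frequentist(u, dof):
    """Return (U95, u_inf) for a standard uncertainty u with dof degrees of freedom."""
    u95 = T95[dof] * u     # Equation S6
    u_inf = u95 / 2.0      # Equation S5: "large sample" standard uncertainty
    return u95, u_inf

u95_a1, uinf_a1 = expand_frequentist(0.078, 1)   # material A-1
u95_b1, uinf_b1 = expand_frequentist(0.022, 7)   # material B-1

print(round(u95_a1, 3), round(uinf_a1, 3))
print(round(u95_b1, 3), round(uinf_b1, 3))
```

With one degree of freedom the resulting u∞ is more than six times the original u(R), which is the inflation the text describes as unrealistic. Small differences from Table S2 in the last digit arise because the table values were computed from unrounded u(R).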
Bayesian analysis is based on a somewhat different definition of probability than the frequentist
interpretation that underpins classical statistical inference. Under the Bayesian paradigm,
parameters such as the measurand value and variance components have probability distributions
that quantify our knowledge about them. The estimation process starts with quantification of
prior knowledge about the parameters followed by specification of the statistical model that
relates the parameters to the data. The statistical model is called a likelihood function and is the
same as described in Equation S1. The components of these models are combined via Bayes'
theorem to obtain posterior distributions for the parameters. These distributions update our
knowledge about the parameters based on the evidence provided by the data. This analysis can
produce a probability distribution for each of the μi (the true value of the measurand quantity
estimated by the measurement mean, Ri) which encompasses all of the information and
variability present in the data but is confined by bounds based on prior knowledge. The process
yields a probability interval which is interpretable as an uncertainty interval as defined in JCGM
101:2008 “Evaluation of measurement data — Supplement 1 to the GUM — Propagation of
distributions using a Monte Carlo method (GUM Supplement 1)” [4].
While the probability distributions for the μi may not be available in closed form, with these
priors and using empirical Bayesian methods, it is possible to estimate the desired coverage
intervals, U95(Ri)B. Software systems suitable for computing the intervals, such as WinBUGS
[5], are freely available and (relatively) easy-to-use. The U95(Ri)B are listed in Table S2. The
U95(Ri)B are smaller than the U95(Ri)f for materials associated with one degree of freedom and
tend to be somewhat larger than the U95(Ri)f for materials associated with more than one degree
of freedom.
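Why a bounded prior shortens the interval for a material with one degree of freedom can be illustrated with a deliberately simplified calculation. The Python sketch below is not the constrained empirical Bayesian model used in the study, whose priors are not reproduced here. It assumes only two campaign means distributed as Normal with unknown mean μ and standard deviation τ, a flat prior on μ, and a uniform prior on τ between zero and an assumed upper bound `tau_max`. The posterior for μ is evaluated on a grid. The function name, the grid size, and the choice tau_max = 0.14 are all hypothetical.

```python
import math

def bounded_prior_half_width(y, tau_max, n_grid=400):
    """Half-width of the central 95 % posterior interval for mu (grid approximation).

    Assumed model: y_k ~ N(mu, tau^2), flat prior on mu,
    tau ~ Uniform(0, tau_max).
    """
    n = len(y)
    ybar = sum(y) / n
    span = 6.0 * tau_max                                   # mu grid covers ybar +/- span
    mus = [ybar - span + 2.0 * span * i / (n_grid - 1) for i in range(n_grid)]
    taus = [tau_max * (i + 0.5) / n_grid for i in range(n_grid)]  # midpoints, avoids tau = 0

    post = []
    for mu in mus:
        ss = sum((v - mu) ** 2 for v in y)
        # Marginalise over tau: sum of likelihood x flat prior on the tau grid
        post.append(sum(t ** (-n) * math.exp(-ss / (2.0 * t * t)) for t in taus))

    total = sum(post)
    acc, lo, hi = 0.0, None, None
    for mu, p in zip(mus, post):
        acc += p / total
        if lo is None and acc >= 0.025:
            lo = mu
        if hi is None and acc >= 0.975:
            hi = mu
    return (hi - lo) / 2.0

# Campaign means for material A-1, computed from Table S1
y = [7.04925, 7.2055]
half_b = bounded_prior_half_width(y, tau_max=0.14)   # tau_max is an assumed bound

# Frequentist counterpart: t factor for 1 degree of freedom times u = s / sqrt(2)
half_f = 12.706 * abs(y[0] - y[1]) / 2.0
print(round(half_b, 3), round(half_f, 3))
```

Under these assumptions the bounded-prior half-width is several times smaller than the t-based one, which mirrors the qualitative behaviour reported for the one-degree-of-freedom materials. The numerical value depends strongly on the assumed bound and should not be compared with the U95(Ri)B in Table S2.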
As discussed in the main text, “Leave-one-out” (LOO) cross-validation was used if the GDR
function was strongly influenced by materials having relatively small U95(Vi) and/or U95(Ri) or
very low or very high {Vi, Ri}. LOO is an established and routine tool for evaluating the
predictive utility of a model [6] and is an efficient, if empirical, approach to establishing which, if
any, materials are sufficiently influential to distort the consensus estimation of the GDR
function. Figure S1 compares the “exact” uncertainty-scaled distances, εi, calculated using all
materials with the LOO-estimated εi for each material. Circles with crosses that are not
substantially on the diagonal line indicate materials that strongly influence the GDR. Circles
outside the square bounded by the red lines are potentially anomalous. Only materials B-1 and E-1 appear either
anomalous or strongly influential using the u∞(Ri)f. None of the materials appear particularly
problematic when the u∞(Ri)B are used. This reflects the somewhat more realistic estimates of
repeatability measurement provided by the Bayesian procedure.
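The leave-one-out comparison can be sketched in a few lines of Python. The example below substitutes an ordinary least-squares straight line for the GDR function, which treats the uncertainties in both V and R and is not reproduced here, and it uses a vertical rather than orthogonal distance. The data, the function names, and the common uncertainty of 0.1 are hypothetical and serve only to show how an influential point separates from the diagonal.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def scaled_distance(x, y, u, intercept, slope):
    """Vertical distance from the line, in units of the uncertainty u."""
    return abs(y - (intercept + slope * x)) / u

def loo_distances(xs, ys, us):
    """Return ("exact", LOO) uncertainty-scaled distances for every point."""
    a, b = fit_line(xs, ys)
    exact = [scaled_distance(x, y, u, a, b) for x, y, u in zip(xs, ys, us)]
    loo = []
    for i in range(len(xs)):
        # Refit without point i, then measure point i against that fit
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        loo.append(scaled_distance(xs[i], ys[i], us[i], a_i, b_i))
    return exact, loo

# Hypothetical data: the last point has high leverage and sits off the trend
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [1.0, 2.1, 2.9, 4.0, 5.1, 12.0]
us = [0.1] * 6
exact, loo = loo_distances(xs, ys, us)
for e, l in zip(exact, loo):
    print(round(e, 2), round(l, 2))
```

Points whose LOO distance is much larger than their "exact" distance are the ones pulling the fit toward themselves, which corresponds to circles lying well off the diagonal in Figure S1.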
After careful consideration, it was decided by NIST to exclude material “E-1” from the
estimation of the GDR parameters on the grounds that 1) U95(Vi) is suspiciously small, 2) the
process used to establish the U95(Vi) does not reflect their current practice, and 3) this CRM is
completely sold out and no longer available for re-evaluation, let alone sale. After E-1’s
exclusion from the GDR model, none of the remaining 16 materials were anomalous or
influential with either set of repeatability measurement uncertainties.
Degrees of Equivalence. The degree of equivalence for a given material expressed as a relative
percent, %di, can be estimated from the signed orthogonal distance of the certified and measured
values (Vi, Ri) relative to the estimates provided from the GDR analysis (V̂i, R̂i, and the estimated GDR parameters):
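[Equation S7: the typeset expression for %di, built from a factor of 100, the SIGN function, and squared differences between the observed and GDR-estimated V and R values, is not recoverable from this text.] [S7]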
The function SIGN returns the sign (±1) of its argument and defines whether the observed {Vi,
Ri} pair is “above” or “below” the GDR function. The measurement-related terms are
transformed to have the same scale as the assigned values.
Given the covariances among the parameters of the GDR function and the
{Vi ± u(Vi), Ri ± u(Ri)} used in their estimation, the expanded uncertainties for the %di,
U95(%di), were estimated via a parametric bootstrap Monte Carlo (PBMC) approach [7]. For a
sufficiently large number of samples from the bootstrap analysis and assuming an approximately
symmetric distribution, the percentiles of the empirical distribution of %di provide the desired
estimates:
$$U_{95}(\%d_i) = \frac{\mathrm{Percentile}(\%d_{ij},\,0.975) - \mathrm{Percentile}(\%d_{ij},\,0.025)}{2} \qquad \text{[S8]}$$
If a distribution is significantly asymmetric, a symmetric ±U95(%di) interval can be estimated
from the largest half-interval using the same empirical percentiles. The PBMC distributions for
all of the CCQM-K80 materials were approximately symmetrical.
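The percentile calculation of Equation S8 and the fallback for asymmetric distributions can be sketched in Python as follows. The bootstrap draws here are simulated from a Normal distribution purely for illustration; in the study they would come from the PBMC analysis. Taking the median as the centre for the "largest half-interval" is an assumption, as are the function names.

```python
import random

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def u95_from_draws(draws):
    """Return (Equation S8 half-width, largest half-interval about the median)."""
    s = sorted(draws)
    p_lo, p_hi = percentile(s, 0.025), percentile(s, 0.975)
    symmetric = (p_hi - p_lo) / 2.0                 # Equation S8
    centre = percentile(s, 0.5)                     # assumed centre: the median
    largest_half = max(p_hi - centre, centre - p_lo)
    return symmetric, largest_half

# Hypothetical bootstrap sample of %d values for one material
rng = random.Random(1)
draws = [rng.gauss(0.5, 1.0) for _ in range(20000)]
u95_sym, u95_half = u95_from_draws(draws)
print(round(u95_sym, 2), round(u95_half, 2))
```

For a symmetric distribution the two values nearly coincide; for a skewed one the largest half-interval is the more conservative choice.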
If the joint distribution of the PBMC %di estimates for all of the materials submitted by a given
participating NMI is approximately symmetric, the relative degrees of equivalence for the
participating NMIs, %D, can be estimated as the mean of the %di for each material and the
U95(%D) can be estimated from the standard deviation of these %di and the pooled U95(%di). If
the distribution is significantly asymmetric, the ±U95(%D) interval can be estimated empirically
from the joint distribution. The direct and empirical U95(%D) estimates for all of the CCQM-K80
participants were essentially identical.
Table S1. Summary of measured creatinine responses (in arbitrary units) for each human serum material investigated. Column headings give the campaign (C1 = Campaign1, Unit1; C2 = Campaign2, Unit2), the aliquot (A1 to A3), and the replicate (R1, R2).

| NMI | Label | Code | C1 A1 R1 | C1 A1 R2 | C1 A2 R1 | C1 A2 R2 | C1 A3 R1 a | C1 A3 R2 a | C2 A1 R1 | C2 A1 R2 | C2 A2 R1 | C2 A2 R2 | C2 A3 R1 a | C2 A3 R2 a |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CENAM | DMR 263a | A-1 | 7.105 | 6.983 | 7.067 | 7.042 | | | 7.132 | 7.096 | 7.172 | 7.422 | | |
| KRISS | 111-01-01A | B-1 | 5.964 | 5.848 | 5.858 | 5.828 | | | 5.789 | 5.928 | 5.940 | 5.830 | | |
| KRISS | 111-01-03A | B-2 | 7.136 | 7.048 | 7.078 | 7.103 | | | 7.020 | 7.024 | 7.010 | 7.013 | | |
| KRISS | 111-01-04A | B-3 | 24.825 | 25.083 | 25.220 | 25.278 | | | 25.163 | 25.387 | 25.406 | 25.171 | | |
| KRISS | 111-01-02A | B-4 | 27.370 | 27.436 | 27.170 | 27.467 | | | 27.508 | 27.429 | 27.431 | 27.651 | | |
| LGC | ERM-DA252a | C-1 | 2.986 | 3.098 | 3.146 | 3.128 | | | 3.121 | 3.125 | 3.083 | 3.124 | | |
| LGC | ERM-DA251a | C-2 | 22.074 | 21.933 | 22.580 | 22.091 | 21.989 | 22.293 | 21.646 | 21.811 | 22.009 | 21.847 | 21.600 | 22.186 |
| LGC | ERM-DA250a | C-3 | 39.911 | 39.023 | 40.485 | 39.723 | 40.168 | 39.628 | 39.400 | 39.251 | 40.362 | 39.826 | 40.759 | 41.267 |
| LGC | ERM-DA253a | C-4 | 50.868 | 50.225 | 49.429 | 49.346 | 50.042 | 49.163 | 51.468 | 50.045 | 50.621 | 50.521 | 50.618 | 50.478 |
| NIM | Creatinine-1 | D-1 | 7.959 | 8.098 | 8.143 | 7.996 | | | 8.001 | 8.152 | 8.104 | 8.157 | | |
| NIM | Creatinine-2 | D-2 | 34.045 | 34.562 | 34.031 | 33.249 | | | 34.421 | 34.320 | 33.852 | 34.512 | | |
| NIST | SRM 909b I b | E-1 | 7.075 | 7.156 | 7.139 | 7.184 | | | 7.065 | 7.166 | 7.236 | 7.152 | | |
| NIST | SRM 967a I | E-2 | 8.347 | 8.116 | 8.270 | 8.161 | | | 8.265 | 8.218 | 8.293 | 8.294 | | |
| NIST | SRM 909b II b | E-3 | 34.158 | 34.256 | 34.065 | 34.283 | | | 34.335 | 33.759 | 33.712 | 33.927 | | |
| NIST | SRM 967a II | E-4 | 37.451 | 38.416 | 37.720 | 38.268 | | | 38.054 | 37.386 | 37.934 | 38.484 | | |
| PTB | RELA 1/05 KS-A | F-1 | 45.180 | 46.153 | 45.425 | 44.829 | | | 45.792 | 45.188 | 45.009 | 45.174 | | |
| PTB | RELA 1/05 KS-B | F-2 | 58.431 | 56.935 | 57.079 | 57.317 | | | 58.195 | 58.066 | 57.437 | 57.466 | | |

a Results for Aliquot3 were obtained using the “routine” sample preparation protocol used for all other materials; the Aliquot1 and Aliquot2 results for these three materials are for aliquots prepared using the certificate-specified minimum sample volume.
b Results adjusted for the measured fill-weights of each unit.
Table S2. Variance components, means and standard uncertainties together with the 95 % level of confidence intervals from both a frequentist-type expansion and a Bayesian estimation

| NMI | Label | Code | Nt | Nr | Na | Nc | σ̂r | σ̂a | σ̂c | R | u(R) | v | U95(R)f | U95(R)B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CENAM | DMR 263a | A-1 | 8 | 2 | 2 | 2 | 0.100 | 0.059 | 0.089 | 7.127 | 0.078 | 1 | 0.991 | 0.280 |
| KRISS | 111-01-01A | B-1 | 8 | 2 | 2 | 2 | 0.063 | 0 | 0 | 5.873 | 0.022 | 7 | 0.053 | 0.095 |
| KRISS | 111-01-03A | B-2 | 8 | 2 | 2 | 2 | 0.027 | 0 | 0.051 | 7.054 | 0.037 | 1 | 0.474 | 0.111 |
| KRISS | 111-01-04A | B-3 | 8 | 2 | 2 | 2 | 0.148 | 0.104 | 0.073 | 25.192 | 0.090 | 1 | 1.144 | 0.385 |
| KRISS | 111-01-02A | B-4 | 8 | 2 | 2 | 2 | 0.120 | 0 | 0.082 | 27.433 | 0.072 | 1 | 0.913 | 0.365 |
| LGC | ERM-DA252a | C-1 | 8 | 2 | 2 | 2 | 0.043 | 0.029 | 0 | 3.101 | 0.021 | 3 | 0.067 | 0.137 |
| LGC | ERM-DA251a | C-2 d | 8 | 2 | 2 | 2 | 0.198 | 0.134 | 0.199 | 21.999 | 0.171 | 1 | 2.171 | 0.990 |
| LGC | ERM-DA251a | C-2 | 12 | 2 | 3 | 2 | 0.230 | 0 | 0.198 | 22.005 | 0.155 | 1 | 1.969 | 0.805 |
| LGC | ERM-DA250a | C-3 d | 8 | 2 | 2 | 2 | 0.458 | 0.251 | 0 | 39.748 | 0.205 | 3 | 0.652 | 1.360 |
| LGC | ERM-DA250a | C-3 | 12 | 2 | 3 | 2 | 0.431 | 0.516 | 0 | 39.983 | 0.245 | 5 | 0.629 | 1.130 |
| LGC | ERM-DA253a | C-4 d | 8 | 2 | 2 | 2 | 0.554 | 0.437 | 0.266 | 50.315 | 0.348 | 1 | 4.429 | 1.635 |
| LGC | ERM-DA253a | C-4 | 12 | 2 | 3 | 2 | 0.520 | 0.248 | 0.488 | 50.235 | 0.390 | 1 | 4.951 | 1.045 |
| NIM | Creatinine-1 | D-1 | 8 | 2 | 2 | 2 | 0.079 | 0 | 0 | 8.076 | 0.028 | 7 | 0.066 | 0.120 |
| NIM | Creatinine-2 | D-2 | 8 | 2 | 2 | 2 | 0.407 | 0.166 | 0 | 34.124 | 0.166 | 3 | 0.529 | 0.900 |
| NIST | SRM 909b I | E-1 | 8 | 2 | 2 | 2 | 0.056 | 0 | 0 | 7.147 | 0.020 | 7 | 0.047 | 0.098 |
| NIST | SRM 967a I | E-2 | 8 | 2 | 2 | 2 | 0.076 | 0 | 0 | 8.245 | 0.027 | 7 | 0.064 | 0.132 |
| NIST | SRM 909b II | E-3 | 8 | 2 | 2 | 2 | 0.212 | 0 | 0.148 | 34.062 | 0.129 | 1 | 1.636 | 0.505 |
| NIST | SRM 967a II | E-4 | 8 | 2 | 2 | 2 | 0.420 | 0 | 0 | 37.964 | 0.148 | 7 | 0.351 | 0.675 |
| PTB | RELA 1/05 KS-A | F-1 | 8 | 2 | 2 | 2 | 0.434 | 0 | 0 | 45.344 | 0.154 | 7 | 0.363 | 0.895 |
| PTB | RELA 1/05 KS-B | F-2 | 8 | 2 | 2 | 2 | 0.538 | 0.112 | 0 | 57.616 | 0.198 | 3 | 0.631 | 1.025 |

a Numbers: Nr is the number of replicates per aliquot, Na is the number of aliquots per campaign, and Nc is the number of campaigns. Nt is the total number of measurements, which is equal to Nr × Na × Nc.
b Variance components: σ̂r, σ̂a, and σ̂c are the estimated between-replicate, between-aliquot, and between-campaign components of variance expressed as standard deviations.
c Means and uncertainties: R is the mean of the measured creatinine responses, u(R) the standard uncertainty for R, v the number of degrees of freedom associated with u(R), U95(R)f a frequentist 95 % confidence estimate for R, and U95(R)B a Bayesian 95 % confidence estimate for R.
d Estimated using only the Aliquot1 and Aliquot2 measurements.
Fig. S1. “Leave One Out” (LOO) Analysis in the Identification of Potentially Anomalous,
Strongly Influential Materials.
[Figure: two scatter-plot panels. Horizontal axis: "Exact" Uncertainty-Weighted Distance, εi (0 to 7). Vertical axis: LOO Uncertainty-Weighted Distance, εi (0 to 7). Materials E-1 and B-1 are labeled in both panels.]
The panel to the left presents results using the frequentist u∞(Ri)f, the panel to the right presents results using the Bayesian u∞(Ri)B. Each open circle represents estimates for a particular material; the cross represents the estimated 95 % level of confidence intervals on the estimates. The red lines bound the distances expected for materials compatible with the GDR function with a 95 % level of confidence.
References
1. SAS/STAT 9.2 User’s Guide (2008) SAS Institute Inc. Cary, NC USA.
2. Bates D, Maechler M (2009) lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-32 http://cran.us.r-project.org/web/packages/lme4/
3. JCGM 100:2008 (2008) Guide to the Expression of Uncertainty in Measurement (GUM). BIPM, Sèvres, France. http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf.
4. JCGM 101:2008 (2008) Evaluation of measurement data — Supplement 1 to the “Guide to the expression of uncertainty in measurement” — Propagation of distributions using a Monte Carlo method. BIPM, Sèvres, France. http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf.
5. Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 10:325-337
6. Picard RR, Cook RD (1984) Cross-validation of regression models. J Am Stat Assoc 79(387):575-583
7. Duewer DL, Kowalski BR, Fasching JL (1976) Improving reliability of factor-analysis of chemical data by utilizing measured analytical uncertainty. Anal Chem 48:2002-2010