Received: 13 November 2017 Revised: 15 December 2017 Accepted: 18 December 2017
DOI: 10.1002/rcm.8052
RESEARCH ARTICLE
Seeking excellence: An evaluation of 235 international laboratories conducting water isotope analyses by isotope‐ratio and laser‐absorption spectrometry
L.I. Wassenaar1 | S. Terzer‐Wassmuth1 | C. Douence1 | L. Araguas‐Araguas1 |
P.K. Aggarwal1 | T.B. Coplen2
1 International Atomic Energy Agency, Isotope
Hydrology Section, PO Box 100, A‐1400 Vienna, Austria
2 US Geological Survey, 431 National Center,
12201 Sunrise Valley Drive, Reston, VA 20192,
USA
Correspondence
L.I. Wassenaar, International Atomic Energy
Agency, Isotope Hydrology Section, PO Box
100, A‐1400 Vienna, Austria.
Email: [email protected]
Rapid Commun Mass Spectrom. 2018;32:393–406.
Rationale: Water stable isotope ratios (δ2H and δ18O values) are widely used tracers in
environmental studies; hence, accurate and precise assays are required for providing sound
scientific information. We tested the analytical performance of 235 international laboratories
conducting water isotope analyses using dual‐inlet and continuous‐flow isotope ratio mass
spectrometers and laser spectrometers through a water isotope inter‐comparison test.
Methods: Eight test water samples were distributed by the IAEA to international stable
isotope laboratories. These consisted of a core set of five samples spanning the common
δ‐range of natural waters, and three optional samples (highly depleted, enriched, and saline).
The fifth core sample contained unrevealed trace methanol to assess analyst vigilance to the
impact of organic contamination on water isotopic measurements made by all instrument
technologies.
Results: For the core and optional samples ~73 % of laboratories gave acceptable results
within 0.2 ‰ and 1.5 ‰ of the reference values for δ18O and δ2H, respectively; ~27 %
produced unacceptable results. Top performance for δ18O values was dominated by dual‐inlet
IRMS laboratories; top performance for δ2H values was led by laser spectrometer laboratories.
Continuous‐flow instruments yielded comparatively intermediate results. Trace methanol
contamination of water resulted in extreme outlier δ‐values for laser instruments, but also
affected reactor‐based continuous‐flow IRMS systems; however, dual‐inlet IRMS δ‐values
were unaffected.
Conclusions: Analysis of the laboratory results and their metadata suggested inaccurate or
imprecise performance stemmed mainly from skill‐ and knowledge‐based errors including:
calculation mistakes, inappropriate or compromised laboratory calibration standards, poorly
performing instrumentation, lack of vigilance to contamination, or inattention to unreasonable
isotopic outcomes. To counteract common errors, we recommend that laboratories include
1–2 'known' control standards in all autoruns; laser laboratories should screen each autorun
for spectral contamination; and all laboratories should evaluate whether derived d‐excess
values are realistic when both isotope ratios are measured. Combined, these data evaluation
strategies should immediately inform the laboratory about fundamental mistakes or
compromised samples.
1 | INTRODUCTION
The stable‐hydrogen and ‐oxygen isotope (δ2H and δ18O values)
composition of environmental waters is an important assay in
diverse hydrogeologic, meteorological, watershed, oceanographic and
ecological studies around the world.1-3 Accurate and precise isotopic
measurements of environmental waters are imperative in providing
rigorously sound scientific interpretations and hydrological assessments.
The instrumentation used for measuring the H and O stable isotopic
compositions of water has evolved over six decades, and is classified
into three types: (i) dual‐inlet isotope‐ratio mass spectrometry
(DI‐IRMS; 1950s‐present), (ii) continuous‐flow isotope‐ratio mass
Copyright © 2018 John Wiley & Sons, Ltd.
spectrometry (CF‐IRMS; 1990s‐present), and (iii) laser‐absorption
spectrometry (LAS; 2010s‐present). Increasingly, scientists are using
laser spectrometers due to lower capital cost and consumable
demands, ease of use, and ongoing improvements in analytical
precision.4
Isotope‐ratio mass spectrometers (DI‐ or CF‐IRMS instruments)
do not measure liquid water H and O isotopologues directly, nor
simultaneously; these assays are typically conducted for single
isotope ratios separately (2H/1H or 18O/16O) utilizing on‐ or off‐line
preparation devices to convert sample H2O into a pure analyte gas
(H2, CO2, CO) by gas–liquid equilibration 'activity' methods such as
CO2/H2–H2O equilibration (δ2H and δ18O values), thermochemical
decompositional methods such as zinc, chromium, manganese, and
uranium reduction (δ2H values), glassy carbon (δ2H and δ18O values)
or guanidine hydrochloride (δ18O values) (reviewed in 5,6). The sample
H2 or CO2/CO gas is introduced into the isotope‐ratio mass
spectrometer as a cryogenically purified sample gas or as a gas mix
entrained in helium carrier that is purified and separated by capillary
or packed column gas chromatography and the isotope ratios are
determined by CF‐IRMS. The longstanding CO2– and H2–H2O
equilibration method by DI‐IRMS is generally considered the
"gold standard" method for reliable high‐precision water isotopic
measurements, but is the costliest analytical option.
Unlike the IRMS systems, new lower cost technology laser
spectrometers measure all the key isotopologue species (1H216O,
1H2H16O, 1H218O, 1H217O) concentrations directly on injections of
(vaporized) H2O sample gas using infrared laser absorption
spectrometry, conducted either in static mode (Off‐Axis Integrated
Cavity Output Spectroscopy) or in flowing mode (Cavity Ring Down
Spectroscopy).4,7-9 Laser spectrometers produce accurate and precise
results provided that the water samples do not contain volatile organic
compounds (VOCs), hydrocarbons, or chemicals that can cause
detrimental spectral interferences.10-12 Furthermore, because water
vapor as an analyte gas is 'sticky', LAS instruments exhibit strong
between‐sample memory effects that must be corrected for.13
Regardless of which instrumentation technology laboratories
choose for water stable isotopic determinations, accurate and precise
measurements require a well‐maintained isotope‐ratio mass or laser
spectrometer and sample preparation peripheral(s), in addition to an
appropriate range of laboratory standards carefully calibrated to the
VSMOW‐SLAP primary reference scale.13,14
This IAEA Water Isotope Inter‐Comparison (WICO2016)15 tested
the performance of 235 international laboratories routinely conducting
water stable‐isotope measurements. Participating laboratories used a
diverse range of dual‐inlet and continuous‐flow IRMS instruments,
and LAS spectrometers and a wide range of sample preparation
peripherals and operational practices. A core set of five isotopically
contrasting water samples was distributed to international
laboratories,14 with three optional samples to accommodate those
laboratories working in cold regions, saline systems, or low latitude
or evaporative systems. Here we synthesize the results of the
WICO2016 inter‐comparison, with the aim of providing a review
of current analytical practices, identifying key problem areas, and
suggesting potential improvements to better enable laboratories to
improve their analytical accuracy and precision outcomes.
2 | EXPERIMENTAL
2.1 | Determining testing needs
Before WICO2016 implementation, a survey was conducted in
October 2015 asking laboratories what kind of water samples
would be appropriate for a contemporary water stable isotope inter‐
comparison. Of 219 laboratories from 63 countries that responded,
94 % recommended a core set (δ18O values from 0 to −20 ‰), 41 %
recommended an 18O‐enriched water (δ18O >3–5 ‰), 38 %
recommended a highly 18O‐depleted water (δ18O < −30 ‰), and 11 %
suggested sea or mineralized water with >30 g/L total dissolved solids.
Ten laboratories recommended specialty water samples such as atom
% enrichments (e.g. doubly labelled water), plant or biological water
extracts, alcohol distillates, or waters containing pollutants. Four
laboratories suggested samples for ultra‐high‐precision 17O‐excess
determination. These responses informed our sample mix that
accommodated a majority of the expressed interests. Online
registration for WICO2016 was conducted from December 2015 to
February 2016. A total of 266 laboratories from 69 countries
registered for participation in the testing, with the understanding that
individual laboratory evaluations are kept confidential, but that results
could be used anonymously in this synthesis report.
2.2 | Test samples: Origin and storage
Eight test water samples were prepared for WICO2016. All were sourced
from natural waters, but several were adjusted using aliquots of 99.9 %
D2O and 98 % H218O to achieve specific δ‐value targets, while
concurrently maintaining 'usual' ranges of deuterium‐excess (d‐excess)
values ranging from 0 to +10 ‰; the δ17O value was not considered
for this test. The WICO2016 test waters, their sources, and stable
isotopic data are tabulated in Table 1. The test samples were divided into
a core set of five samples, plus three optional samples. The core set
spanned the most common range of δ2H and δ18O values encountered
by isotope laboratories worldwide (δ18O values from ~0 to −22 ‰,
roughly spanning the range between the VSMOW and GISP reference
materials). The optional samples included a depleted Greenland firn
melt, a heavy‐isotope‐enriched sample, and a synthetic seawater
(Table 1). The synthetic seawater sample was made by dissolving 30.0 g
of aquarium‐grade Red Sea Salt™ (mainly NaCl) per liter of distilled water
(~0.7 molal NaCl). The seawater mix contained ~350 mg/L calcium,
~1000 mg/L magnesium and ~320 mg/L potassium.16
The fifth core sample was contaminated with trace methanol
(15.6 nmol/L CH3OH, 99.9 % purity) to determine whether LAS analysts
were vigilant in monitoring for and detecting spectral interferences, as
well as to assess the impact of trace VOC 'pollutants' on LAS and IRMS
isotope analyses.10,11 Accordingly, the contamination of the WICO5
sample was not revealed to the participants. Preliminary testing of
WICO5 at the IAEA (Vienna, Austria) using 2140 (Picarro, Santa Clara,
CA, USA), DLT‐100 (Los Gatos Research, San Jose, CA, USA) and
TIWA‐24d (Los Gatos Research) laser analyzers and laser spectral
evaluation software (Picarro ChemCorrect™ and Los Gatos Research
Spectral Contamination Identifier™) indicated that sample WICO5 was
lightly but sufficiently contaminated with methanol to cause highly
TABLE 1 WICO2016 samples and reference values for δ18OVSMOW and δ2HVSMOW. The δ18O values were determined by CO2‐H2O equilibration [a–d] using dual‐inlet isotope‐ratio mass spectrometry. The δ2H values were determined by H2‐H2O equilibration [a–c] using dual‐inlet isotope‐ratio mass spectrometry. Samples were normalized to the VSMOW‐SLAP scale using two‐point calibration with VSMOW/2 [a–d] and SLAP1/SLAP2 [a,b,d]/USGS49 [c] standard waters. Reference values are the power‐moderated mean ± robust standard deviation. The WICO5 results were determined on an uncontaminated subsample*

Test sample | Water source | Conductivity (µS/cm) | δ18OVSMOW (‰) | SD (‰) | N | δ2HVSMOW (‰) | SD (‰) | N | d‐excess (‰)
Core Set
WICO1 | Danube River water, Austria, filtered | 560 | −10.80 | 0.02 | 25 | −77.5 | 0.6 | 25 | 8.9±0.6
WICO2 | Neusiedler See (lake water), Austria, filtered | 1430 | −5.11 | 0.02 | 25 | −41.9 | 0.8 | 27 | −1.0±0.6
WICO3 | Bow River water [e], Canada, filtered | 235 | −22.01 | 0.04 | 25 | −168.5 | 0.8 | 26 | 7.6±0.8
WICO4 | Ground water mix, Egypt, Austria, filtered, distilled | 10 | −0.50 | 0.04 | 25 | +0.4 | 0.8 | 26 | 4.4±0.8
WICO5* | Vienna tap water and WICO6 mix, research grade methanol was added gravimetrically to produce a 0.05 % methanol/water volumetric ratio | 290 | −15.68 | 0.02 | 23 | −114.5 | 0.6 | 25 | 10.9±0.6
Optional Set
WICO6 | Greenland firn melt [f], unfiltered | 40 | −41.41 | 0.02 | 25 | −323.8 | 0.8 | 26 | 7.5±0.8
WICO7 | Vienna tap water mixed with 99.99 % D2O and 96.05 % H218O to ensure a usual d‐excess and isotopically positive result | 385 | +5.61 | 0.04 | 26 | +55.2 | 0.8 | 26 | 10.3±0.8
WICO8 | Synthetic seawater (30 g/L Red Sea salt), mixed with deionized water and WICO6 to produce negative values with a usual d‐excess, unfiltered | ~53000 | −3.45 | 0.06 | 28 | −17.5 | 0.8 | 27 | 10.1±0.8

N = total number of DI‐IRMS analyses. The derived d‐excess = δ2HVSMOW – 8 × δ18OVSMOW
[a] Isotope Hydrology Laboratory, IAEA, Vienna, Austria.
[b] GEOTOP, University of Quebec, Montreal, Canada (J.‐F. Hélie).
[c] U.S. Geological Survey, Reston, VA, USA (T.B. Coplen).
[d] University of Groningen, Groningen, The Netherlands (H.A.J. Meijer).
[e] Courtesy of B. Mayer (University of Calgary, Canada).
[f] Courtesy of T. Blunier (University of Copenhagen, Denmark).
*Participants received a contaminated WICO5 sample which contained 0.05 % methanol.
divergent δ2H and δ18O results compared with the uncontaminated
sample. Testing at IAEA further showed no O or H isotopic difference
between the contaminated and uncontaminated WICO5 when
analyzed by CO2– and H2–H2O equilibration and DI‐IRMS.
The WICO2016 test waters were stored in 20‐L stainless steel
drums at room temperature under 0.5 bar N2 gas pressure in order
to prevent evaporation. Each drum had a siphon and valve system that
allowed water to be withdrawn without opening the drum. Samples
were dispensed into 30‐mL amber glass bottles with conical insert PP
screw caps (Etivera, St. Margarethen an der Raab, Austria); each WICO
sample dispensing was conducted in one session. All 30‐mL test
bottles were labelled with the sample name and filling sequence
number. Afterwards, random testing of seven bottles of each WICO
sample by LAS and DI‐IRMS at the IAEA verified the isotopic
homogeneity of the WICO samples. One set of the final WICO
samples was measured by four reference laboratories.
2.3 | Establishing reference δ‐values
The reference δ18OVSMOW and δ2HVSMOW values of the WICO2016
samples were established by the consensus of expert laboratory
approach, according to ISO 13528:2015,17 with the condition that
each reference laboratory conducted analyses by DI‐IRMS using
certified isotopic reference materials and two‐point normalization.18
The expert consensus approach was required because primary
reference materials are limited and inappropriate for inter‐comparison
purposes. The δ18O and δ2H reference values of WICO2016 samples
were established by four expert laboratories (USGS, Reston, VA, USA;
GEOTOP, Montreal, Canada; CIR, Groningen, The Netherlands; and
the IAEA Isotope Hydrology Laboratory, Vienna, Austria). The δ18O
values were determined by CO2–H2O equilibration and DI‐IRMS at
the reference laboratories. The δ2H values were determined by H2–
H2O equilibration and DI‐IRMS at three of the reference laboratories.
All samples were normalized to the VSMOW‐SLAP scale by using
VSMOW/2 and SLAP (IAEA) or SLAP2 (GEOTOP, CIR, USGS) or
USGS4919 (USGS) reference waters measured along with the WICO
samples. Following normality and outlier testing, the reference values
were determined as the power‐moderated mean and robust standard
deviation, which took into account the uncertainties reported by the
reference laboratories.20 For the WICO5 sample, reference value
measurements were conducted on an uncontaminated sub‐sample.
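The two‐point normalization mentioned above can be illustrated with a minimal sketch. It maps a raw instrument δ‐value onto the reference scale by linear interpolation between two measured reference waters. The function name and the raw readings are hypothetical; the assigned δ18O values used for VSMOW2 (0 ‰) and SLAP2 (−55.5 ‰) are the internationally agreed values. This is not necessarily the exact procedure followed by any of the reference laboratories.

```python
def normalize_two_point(raw, raw_ref1, raw_ref2, true_ref1, true_ref2):
    """Linearly map a raw delta value onto the reference scale using two standards."""
    if raw_ref1 == raw_ref2:
        raise ValueError("the two reference waters must have different raw values")
    # Scale factor: assigned span divided by measured span
    slope = (true_ref2 - true_ref1) / (raw_ref2 - raw_ref1)
    return true_ref1 + slope * (raw - raw_ref1)


# Assigned delta-18O values on the VSMOW-SLAP scale (per mil)
VSMOW2_D18O = 0.0
SLAP2_D18O = -55.5

# Hypothetical raw readings from one autorun (per mil, instrument scale)
raw_vsmow2 = 0.30
raw_slap2 = -54.60
raw_sample = -10.50

normalized = normalize_two_point(
    raw_sample, raw_vsmow2, raw_slap2, VSMOW2_D18O, SLAP2_D18O
)
print(f"normalized d18O = {normalized:.2f} per mil")
```

By construction, the two reference waters map exactly onto their assigned values, and any sample between them is stretched or compressed by the same scale factor.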
The deuterium‐excess (d‐excess) for each test sample was
derived as d‐excess = δ2HVSMOW – 8 × δ18OVSMOW. The standard
uncertainties (ux) of the reference lab results were determined by:
$$u_x = \frac{1.25 \times \text{Robust Std. Dev.}}{\sqrt{\text{number of assays}}} \qquad (1)$$
In all cases, ux was ≤0.3 × the standard deviation for proficiency
assessment (SDPA); hence, the uncertainty of the assigned δ‐values
was considered negligible for the purpose of this inter‐comparison
testing, and did not require the alternative use of z'(prime)‐scores. All
reference values were rounded to two (δ18O values) or one (δ2H
values) decimal place, and they are summarized in Table 1. The
reference δ‐values of the WICO2016 samples were not revealed to
participants beforehand.
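As a worked check of the two calculations in this section, the sketch below derives the d‐excess and the standard uncertainty u_x for WICO1 from the Table 1 values, and tests the u_x ≤ 0.3 × SDPA criterion. The SDPA values (0.2 ‰ for δ18O, 1.5 ‰ for δ2H) are those stated in Section 2.5; the function and variable names are illustrative only.

```python
import math

# Standard deviation for proficiency assessment (per mil), from Section 2.5
SDPA = {"d18O": 0.2, "d2H": 1.5}


def d_excess(d2h, d18o):
    """Deuterium excess: d = delta2H - 8 * delta18O (per mil)."""
    return d2h - 8.0 * d18o


def standard_uncertainty(robust_sd, n_assays):
    """Equation (1): u_x = 1.25 * robust SD / sqrt(number of assays)."""
    return 1.25 * robust_sd / math.sqrt(n_assays)


# WICO1 reference data from Table 1
d18o, sd18o, n18o = -10.80, 0.02, 25
d2h, sd2h, n2h = -77.5, 0.6, 25

dx = d_excess(d2h, d18o)
u18o = standard_uncertainty(sd18o, n18o)
u2h = standard_uncertainty(sd2h, n2h)

# The assigned-value uncertainty is treated as negligible if u_x <= 0.3 * SDPA
negligible = u18o <= 0.3 * SDPA["d18O"] and u2h <= 0.3 * SDPA["d2H"]

print(f"d-excess = {dx:.1f}, u(d18O) = {u18o:.3f}, u(d2H) = {u2h:.2f}, negligible = {negligible}")
```

For WICO1 both uncertainties fall well below the 0.06 ‰ (δ18O) and 0.45 ‰ (δ2H) limits, consistent with the statement that z'‐scores were not needed.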
2.4 | Shipping and data collection
The WICO2016 samples were shipped to participating laboratories in
March 2016, and all laboratories agreed to submit results by June 30,
2016. The optional WICO samples were selected by a smaller cohort
of laboratories appropriate to their analytical interests. Participants
were invited to measure WICO samples at least twice using the
method(s), instrumentation, and calibration standards considered
routine for their laboratory and to normalize and report their results
relative to the VSMOW‐SLAP scale. Information returned to the IAEA
included the final δ2HVSMOW and/or δ18OVSMOW values for each test
water, basic statistics (n, uncertainty), and operational metadata on
preparation methods used (CO2/H2–H2O equilibration, reduction
methods, etc.), the instrumentation used for isotopic measurement
(IRMS, LAS, etc.) and laboratory practices (laboratory standards,
injections, data processing, etc.).
After the June deadline, 31 laboratories requested an extension.
Twenty laboratories submitted late results (not included in this
synthesis), and the remaining 11 laboratories did not submit results.
Some laboratories had multiple instruments (lasers and mass
spectrometers); therefore, each instrumental result was considered
an individual submission so that results from different types of
instruments in one laboratory were not being averaged. Not all
laboratories provided the requested metadata. In total, there were up
to 267 submissions (from 235 reporting laboratories) for the H and O
isotope evaluation synthesis. Confidential laboratory assessment
reports were provided to each participant in July 2016.
2.5 | Statistical tests
Determination of unacceptable outliers from the submitted results
was obtained by calculating the upper (HU) and lower (HL)
quartiles for each test water and reported isotope ratios (H or O)
for all the laboratories, as well as the corresponding interquartile
range (HU–HL). Values falling above HU + 1.5·(HU–HL) or below
HL − 1.5·(HU–HL), i.e. more than 1.5 times the interquartile range
beyond either quartile, were denoted as
unacceptable outliers. The median, quartile limits and outliers were
illustrated graphically by using Tukey box‐and‐whisker plots for
each sample and each isotope ratio.
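The outlier rule described above can be sketched as follows. The quartile convention is an assumption (the text does not say which interpolation method was used); here Python's `statistics.quantiles` with the "inclusive" method is chosen, and the submitted δ18O values are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers) using k times the interquartile range."""
    # Quartiles HL and HU; the 'inclusive' interpolation method is an assumption
    hl, _, hu = statistics.quantiles(values, n=4, method="inclusive")
    iqr = hu - hl
    lower = hl - k * iqr
    upper = hu + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical submitted d18O results (per mil) for one test water
reported = [-10.95, -10.85, -10.82, -10.80, -10.79, -10.78, -10.75, -10.70, -9.60]

lower, upper, outliers = tukey_outliers(reported)
print(f"fences: [{lower:.3f}, {upper:.3f}]  outliers: {outliers}")
```

Because the fences are built from quartiles rather than the mean and standard deviation, a few grossly discordant submissions do not widen the acceptance window.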
The standard deviation for proficiency assessment was the
maximum acceptable difference between the assigned reference
δ‐value and the laboratory‐submitted δ‐value for each sample. As
deemed appropriate for hydrological applications, the SDPA was set
to 0.2 ‰ for δ18O values and to 1.5 ‰ for δ2H values (compared with
2 ‰ for δ2H values in WICO2011); both were considered reasonable
criteria that encompassed a diverse range of contemporary
preparative technologies, analytical instrument types, and instrumental
specifications. We acknowledge that a more restrictive SDPA
(e.g. 0.05 ‰ for δ18O values) would be required for laboratories engaged,
for example, in high‐resolution paleoclimate reconstructions from ice
core isotopic data,21,22 but such stringent testing was beyond the
scope of WICO2016 and not as broadly applicable.
2.6 | z‐scores
An unacceptable z‐score for a laboratory‐reported result was a δ18O
or δ2H value that deviated from the reference δ‐value by 3 or more
times the SDPA. The z‐score for each submitted sample isotopic
(H or O) result was determined by:
$$z = \frac{P - A}{\mathrm{SDPA}} \qquad (2)$$
where P was the participant‐reported δ‐value for each WICO
sample, A was the assigned δ‐value, and the SDPA was as defined
above. The following interpretations were given to z‐score results:
|z| ≤ 2.00: Acceptable / Fit for Purpose
2.00 < |z| < 3.00: Questionable
|z| ≥ 3.00: Unacceptable
A sample z‐score of 0 implies a perfect match (no bias) from the
reference δ‐value. Individual z‐scores within each of the categories
should not be over‐interpreted; for example, a z‐score of 0.2 is not
statistically 'better' than 1.6; both are acceptable and fit‐for‐purpose
as defined.
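A minimal sketch of the z‐score calculation and its three interpretation classes is given below. The assigned value is the WICO1 δ18O reference from Table 1 and the SDPA values are those of Section 2.5; the reported values and the function names are hypothetical.

```python
# Standard deviation for proficiency assessment (per mil), from Section 2.5
SDPA = {"d18O": 0.2, "d2H": 1.5}


def z_score(reported, assigned, sdpa):
    """Equation (2): z = (P - A) / SDPA."""
    return (reported - assigned) / sdpa


def classify_z(z):
    """Map a z-score onto the three interpretation classes used in the text."""
    magnitude = abs(z)
    if magnitude <= 2.0:
        return "acceptable"
    if magnitude < 3.0:
        return "questionable"
    return "unacceptable"


assigned_d18o = -10.80  # WICO1 reference value (Table 1)

# Hypothetical laboratory submissions for WICO1 d18O (per mil)
for reported in (-10.70, -10.30, -10.10):
    z = z_score(reported, assigned_d18o, SDPA["d18O"])
    print(f"reported {reported:.2f}: z = {z:+.1f} ({classify_z(z)})")
```

The classification depends only on the magnitude of z, so positive and negative biases of the same size are treated identically.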
2.7 | En‐scores
A subset of the laboratories yielding acceptable z‐scores for core
samples WICO1–4 was further scrutinized by taking into account the
reported uncertainty for each submitted WICO sample. The En‐score
indicated whether a reported result was fit‐for‐purpose by taking
into account the uncertainty reported by the laboratory and the
uncertainty of the assigned reference value (Table 1). No attempt
was made to assess how uncertainty was established by each
laboratory due to the variety of ways in which laboratories
determined it (Table 2); hence, the reported laboratory uncertainty
was accepted at face value. An En‐score was calculated for each
sample and stable isotope ratio using:
$$E_n = \frac{x - L}{\sqrt{U_x^2 + U_L^2}} \qquad (3)$$
where x was the laboratory‐reported δ‐value for each sample, L was
the reference δ‐value, Ux was the uncertainty reported by the
laboratory, and UL was the uncertainty of the reference value. An
En‐score with |En| ≤ 1 was considered acceptable, whereas one with
|En| > 1 was unacceptable. En‐scores closer to 0 represent more accurate
and precise results.
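The En‐score can be sketched in the same way. The example uses the WICO1 δ2H reference value from Table 1 and, as an assumption, takes the tabulated robust standard deviation (0.6 ‰) as the reference uncertainty; the laboratory value and its two alternative reported uncertainties are hypothetical.

```python
import math


def en_score(reported, reference, u_reported, u_reference):
    """Equation (3): En = (x - L) / sqrt(Ux^2 + UL^2)."""
    return (reported - reference) / math.sqrt(u_reported**2 + u_reference**2)


def en_acceptable(en):
    """An En-score is acceptable when its magnitude does not exceed 1."""
    return abs(en) <= 1.0


reference_d2h = -77.5  # WICO1 reference value (Table 1)
u_reference = 0.6      # assumption: Table 1 robust SD used as reference uncertainty

# The same hypothetical result reported with two different uncertainties
en_wide = en_score(-76.8, reference_d2h, 0.8, u_reference)
en_narrow = en_score(-76.8, reference_d2h, 0.2, u_reference)

print(f"U = 0.8: En = {en_wide:.2f}, acceptable = {en_acceptable(en_wide)}")
print(f"U = 0.2: En = {en_narrow:.2f}, acceptable = {en_acceptable(en_narrow)}")
```

The same 0.7 ‰ offset passes with a realistic reported uncertainty and fails with an overly optimistic one, which is why the En‐score penalizes understated uncertainties.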
TABLE 2 Survey of participant laboratory practices: calibration standard sources, storage, calibration frequency, data normalization methods, and uncertainty methods. Results of the post‐WICO survey summarizing self‐identified reasons for poor performance

Source of laboratory calibration standards (n=266)
Made locally | 65 %
IAEA (VSMOW2/SLAP2) | 6 %
Instrument Manufacturer | 9 %
U.S. Geological Survey | 6 %
A colleague | 6 %
NIST | 4 %
Scientific Supplier | 4 %

Storage of laboratory calibration standards (n=235)
Small Glass Bottles (< 1 L) | 46 %
Flame Sealed Glass Ampules | 15 %
Large Glass Bottles (1‐10 L) | 14 %
Large Steel or Aluminium (>10 L) | 15 %
Small Plastic Bottles | 2 %
Large plastic (>10 L) | 3 %
Other… | 5 %

(Re)Calibration Frequency using VSMOW2/SLAP2 (n=196)
1 Year | 44 %
2 Years | 20 %
3 Years | 25 %
Never | 7 %
Don't know | 4 %

Data normalization method (n=239)
3‐point (or more) standard calibration | 51 %
2‐point standard calibration | 44 %
1‐point standard calibration | 4 %

Measurement uncertainty reporting method
1 Sigma SD of Control Standard | 76 %
1 Sigma SD of Repeated Samples | 6 %
Assign Blanket Uncertainty Value | 5 %
Full Error Propagation | 5 %
2 Sigma SD of Control Std | 2 %
Uncertainty not determined | 1 %
RMSE of Control Std over time | 1 %
SEM of Control Std over time | < 1 %
Worst 1 Sigma SD of repeats | < 1 %

Post WICO Survey of Problems (n=66) (multiple issues per lab sum >100 %)
Incorrect data normalization | 33 %
Insufficient lab std. δ‐scale coverage | 32 %
Instrumental Problems | 31 %
Problem not yet identified | 23 %
Compromised laboratory Standards | 15 %
TABLE 3 Scoring schema to assess overall laboratory accuracy for core samples WICO1–4. Scoring criteria were based on instrument technology specifications and expert expectations. Out of 12 points per isotope ratio, the scoring brackets of 12–10, 9–7, 6–4 and ≤3 were deemed as excellent, acceptable, questionable and unacceptable, respectively. Deviation in ‰ is the absolute δ‐difference between the reference and measured δ‐values. Score per isotope ratio is the sum of earned points

Points | Laser spectrometry | Continuous‐flow IRMS | Dual‐inlet IRMS
δ2HVSMOW | Deviation in ‰ | Deviation in ‰ | Deviation in ‰
3 | ≤0.5 | ≤0.5 | ≤1.0
2 | ≤1.0 | ≤1.0 | ≤1.5
1 | ≤2.0 | ≤2.0 | ≤3.0
0 | >2.0 | >2.0 | >3.0
δ18OVSMOW | Deviation in ‰ | Deviation in ‰ | Deviation in ‰
3 | ≤0.1 | ≤0.1 | ≤0.05
2 | ≤0.2 | ≤0.2 | ≤0.1
1 | ≤0.3 | ≤0.3 | ≤0.2
0 | >0.3 | >0.3 | >0.2

Scores: 12–10 = Excellent; 9–7 = Acceptable; 6–4 = Questionable; 0–3 = Unacceptable

2.8 | Cumulative rankings
We ranked the collective accuracy for both O and H isotope ratios
using point scores for the core WICO1–4 set (Table 3), as well as
cumulative offsets. The collective scores yielded an overall picture of
laboratory accuracy performance for a diverse range of sample isotopic
compositions, rather than that obtained by focusing on individual
WICO sample z‐score results.13 This point scoring furthermore took
into consideration instrumental specifications and expectations
(Table 3). More accurate collective values resulted in higher scores.
Out of 12 points per isotope ratio, the scoring brackets of 12‐10,
9‐7, 6‐4 and ≤3 were deemed as excellent, acceptable, questionable
and unacceptable, respectively. No attempt was made to incorporate
measurement uncertainty into the collective point scores. We ranked
laboratories by their cumulative offset from the reference δ‐values
for the WICO1–4 core set of samples by O and H isotope ratios and
by measurement technology.
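The point scoring of Table 3 can be sketched as a small lookup. The thresholds are copied from Table 3; the dictionary layout, the function names and the example deviations are illustrative assumptions rather than the authors' implementation.

```python
# Maximum absolute deviation (per mil) earning 3, 2 and 1 points, from Table 3
THRESHOLDS = {
    "d2H": {
        "LAS": (0.5, 1.0, 2.0),
        "CF-IRMS": (0.5, 1.0, 2.0),
        "DI-IRMS": (1.0, 1.5, 3.0),
    },
    "d18O": {
        "LAS": (0.1, 0.2, 0.3),
        "CF-IRMS": (0.1, 0.2, 0.3),
        "DI-IRMS": (0.05, 0.1, 0.2),
    },
}


def points(deviation, limits):
    """Return 3, 2, 1 or 0 points for one sample's deviation from the reference."""
    magnitude = abs(deviation)
    for earned, limit in zip((3, 2, 1), limits):
        if magnitude <= limit:
            return earned
    return 0


def rate(total):
    """Translate a 0-12 point total for one isotope ratio into a performance bracket."""
    if total >= 10:
        return "excellent"
    if total >= 7:
        return "acceptable"
    if total >= 4:
        return "questionable"
    return "unacceptable"


# Hypothetical d18O deviations (per mil) of one LAS laboratory for WICO1-4
deviations = [0.04, -0.15, 0.25, 0.08]
limits = THRESHOLDS["d18O"]["LAS"]
total = sum(points(d, limits) for d in deviations)
print(f"score = {total}/12 ({rate(total)})")
```

With four core samples and a maximum of 3 points each, the total per isotope ratio is bounded by 12, matching the brackets in Table 3.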
3 | RESULTS AND DISCUSSION
The participation rate in WICO2016 was exceptional: for the core set
(WICO1–5), 267 results were submitted for δ18O values of which 157
(59 %) were LAS analyses, 83 (31 %) were by CF‐IRMS and 27 (10 %)
were by DI‐IRMS. For δ2H values, there were 259 returns of which
158 (61 %) were LAS analyses, 72 (28 %) were by CF‐IRMS and 29
(11 %) were by DI‐IRMS. For the optional samples, there were 123
and 118 returns for WICO6, 147 and 141 for WICO7 and 96 and 91
returns for WICO8 for δ18O and δ2H values, respectively. These
returns represented a ~75 % increase in participation over
WICO2011,14 and a ~210 % increase over WICO2002. A large part of
the increase was due to the proliferation of LAS analyzers since around
2009; LAS comprised 0 % of participants in WICO2002, 45 % in
WICO2011, and 59 % in the current test. The number of IRMS
participants increased by 100 % over WICO2011 from 58 to 116
instruments. All (anonymized) δ‐value results for WICO2016 samples
were tabulated and are reported in Table S1 (supporting information).
By technology, most LAS instrument laboratories employed a
liquid autosampler and a water sample vaporizer; however, two
laboratories conducted manual injections and two employed custom‐
made vapor streams. For DI‐IRMS for δ18O values, all laboratories used
an isothermal CO2–H2O equilibration apparatus (1–12 h, 15–40°C)
with or without agitation. For DI‐IRMS for δ2H values, 68 % used
isothermal H2–H2O equilibration (1–12 h, 12–40°C) with or without
agitation, 4 % used Cr reduction and one used Zn reduction. For
CF‐IRMS for δ18O values, 70 % used static isothermal CO2–H2O
equilibration (3–36 h, 20–50°C) and headspace gas sampling, and 30 %
used direct water injections with high‐temperature carbon (HTC) reduction
to CO gas. For CF‐IRMS for δ2H values, 43 % used online HTC reduction
to H2 gas, 42 % used static H2–H2O equilibration and headspace sampling
(1–72 h, 20–50°C), and 15 % used online Cr reduction to H2 gas.
An overview of the operational practices of participating
laboratories is tabulated in Table 2. For daily‐use laboratory calibration
standards, most laboratories made their own in‐house calibration
standards (65 %) and stored them in glass bottles or sealed ampules
(90 %). The overall range of δ‐values for laboratory standards used in
the majority of international laboratories is plotted in Figure 1. There
were distinctive modalities in the laboratory calibration standards that
corresponded to typical 'bracketing ranges', particularly near the
VSMOW and SLAP endmember δ‐values. The majority of laboratory
daily‐use calibration standards had δ18O values between +5 and −30 ‰.
Seven laboratories (results not plotted) employed daily‐use standards
with exceptionally high δ2H values, ranging from +100 to +1200 ‰,
and two laboratories reported using δ18O calibration standards with
high values of +51 and +117 ‰. A small proportion of laboratories
reported running WICO test samples against primary standard
reference materials VSMOW2 and SLAP2 (6 %), a practice generally
discouraged in routine operations. Most laboratories followed a
reasonable VSMOW2/SLAP2 laboratory standard (re)calibration cycle
of between 1 and 3 years. Some laboratories indicated no laboratory
standard calibration efforts; these purchased or obtained pre‐calibrated
standards from the USGS, NIST or other outside sources.
FIGURE 1 Range and modalities of laboratory calibration standards for oxygen and hydrogen isotope ratios reported by 235 international laboratories for their routine operations [Color figure can be viewed at wileyonlinelibrary.com]
More laboratories used 3‐or‐more point data normalization (51 %)
than 2‐point data normalization (44 %), and only 4 % of laboratories
used (not recommended) single‐point data normalization.23
Measurement uncertainty was generally reported as the 1‐sigma
standard deviation of a control standard over an unspecified period
of time or number of analyses (76 %); however, the survey revealed
a wide diversity of methods for expressing measurement uncertainty.
3.1 | Unacceptable outliers
Outlier testing revealed that a sizeable number of laboratories
produced discordant results for one or more samples, as depicted by
Tukey box‐and‐whisker plots and a δ2H vs δ18O cross‐plot for each
of the WICO test waters (Figures 2 and 3; WICO5 excluded, see
below). Overall, the median values of the reported WICO results
agreed well with their assigned values (Figure 2, Table 1); however,
the isotopic data were non‐normally distributed (Q‐Q plots, not
shown) due to the effect of the outlier laboratories. With outliers
removed, the population median values closely matched the WICO
reference values, as expected. Results falling more than 1.5 times the
interquartile range beyond the quartiles (red circles in Figure 2) were
deemed unacceptable
outliers. For δ2H values, of 1036 results submitted for the core set,
101 were deemed unacceptable outliers (10 %), with similar
percentages for each of the core samples. This was slightly higher than
the δ2H outlier rate in WICO2011 (6 %). For the optional WICO δ2H
samples, the proportion of outliers was similar at 8 % (depleted), 13 % (enriched)
and 7 % (saline) for WICO6, WICO7 and WICO8. For the δ18O samples, of 1068
submitted results for the core set, 136 were outliers (12.7 %) with similar
percentages for all core samples, and slightly higher than in WICO2011
(10.6 %). For the optional WICO δ18O samples, the proportion of
outliers was considerably higher at 20 % (depleted sample), 17 %
(enriched sample) and 11 % (saline sample) for WICO6, WICO7 and WICO8.
FIGURE 2 Box‐and‐whisker plots of the WICO core and optional test waters for δ2H (top) and δ18O (bottom) values. Median and interquartile ranges are depicted, and the number of analyses per sample. Circles represent unacceptable outliers. Note: only the overall range and median are depicted for contaminated WICO5 [Color figure can be viewed at wileyonlinelibrary.com]
FIGURE 3 Cross‐plot of submitted δ2H versus δ18O for WICO2016 core, optional, and contaminated samples (left). Cross‐plot of d‐excess versus δ18O for WICO core, optional, and contaminated sample (right). Black dots denote the reference values [Color figure can be viewed at wileyonlinelibrary.com]
Considering only the unacceptable outliers by instrument technology,
60 % of the δ18O core set outliers were LAS analyses, 40 %
were CF‐IRMS analyses, and <2 % were DI‐IRMS analyses. For the
optional δ18O samples, the unacceptable outlier proportions were
similar, with the exception of saline sample WICO 8 where 64 % of
outliers were LAS analyses. For δ2H samples, 47 % of outliers were
LAS analyses in the core set, 41 % were DI‐IRMS analyses, and
12 % were CF‐IRMS analyses. For the optional δ2H samples, the
unacceptable outlier proportions were similar, with the exception of
the saline test sample, where a higher proportion (69 %) of outliers
was from LAS analyses.
One general observation (not depicted) was that roughly half of
laboratories that produced unacceptable outliers did so for the entire
set of WICO samples, regardless of which instrument technology was
used. This finding suggested that approximately 5–6 % of participating
laboratories gave unacceptably poor general performance for a variety
of possible reasons. Nevertheless, the largest proportion of outliers was
from LAS instrument laboratories, and considering their rapid growth
they probably represented a higher percentage of less‐experienced
laboratories. There were some striking contrasts in the core set of IRMS
outliers: DI‐IRMS laboratories held an unexpectedly large proportion of
unacceptable outliers for δ2H values (44 %) compared with δ18O values
(<2 %); conversely CF‐IRMS laboratories held a large proportion of
unacceptable outliers for δ18O values (40 %) compared with δ2H values
(12 %). Unlike WICO2011, there was no unacceptable outlier bias
towards positive or negative values for either H or O isotope ratios.
All unacceptable outlier results (e.g. |z‐scores| > 3) were removed for
subsequent z‐score plots and En performance evaluations and
depictions. The contaminated core WICO 5 sample results are
discussed separately, below.
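As a minimal sketch of the outlier screening described above, the z‐score of a reported δ‐value can be computed from the reference value and the standard deviation for proficiency assessment (SDPA; 0.2 ‰ for δ18O and 1.5 ‰ for δ2H in this test). The function names, the example values, and the "acceptable/questionable/unacceptable" bands at |z| of 2 and 3 are assumptions following the usual proficiency‐test convention, not details taken from this section.

```python
# SDPA values as used in this test (per mil).
SDPA = {"d18O": 0.2, "d2H": 1.5}


def z_score(reported, reference, sdpa):
    """z = (reported - reference) / SDPA."""
    return (reported - reference) / sdpa


def classify_z(z):
    """Assumed conventional bands: |z| <= 2, 2 < |z| < 3, |z| >= 3."""
    magnitude = abs(z)
    if magnitude <= 2.0:
        return "acceptable"
    if magnitude < 3.0:
        return "questionable"
    return "unacceptable"


# Hypothetical laboratory result: reference d18O of -10.00, reported -10.70.
z = z_score(-10.70, -10.00, SDPA["d18O"])
print(round(z, 2), classify_z(z))
```

Results classified as unacceptable in this way would be the ones excluded from the subsequent z‐score plots.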
3.2 | Assessment of z‐scores
A graphical depiction of δ2H and δ18O z‐scores for the non‐outlier
WICO2016 samples plotted by measurement technology in Youden
dual‐isotope cross‐plots is shown in Figure 4.24,25 Laboratory results
plotting closer to the origin on both z‐score axes exhibited more
accurate performance for both isotope ratios. Samples falling along a
slope of 0.5 (45 degrees) into the upper right or lower left quadrants
exhibited systematic errors for both isotope ratios (e.g. from both
ratios measured on the same injection). A lack of H and O z‐score
correlation or results that plotted in the upper left and lower right
quadrants suggested that H and O performance outcomes are
analytically disconnected, either by separate assays or higher analytical
biases on one or other of the isotope ratios.
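The reading of a Youden cross‐plot described above can be sketched in a few lines: distance from the origin summarizes joint accuracy, and the sign pattern of the two z‐scores indicates whether the errors point the same way. The function name and labels below are illustrative assumptions, not part of the WICO evaluation procedure.

```python
import math


def youden_summary(z_h, z_o):
    """Return (distance from origin, sign pattern) for a (z_d2H, z_d18O) pair."""
    distance = math.hypot(z_h, z_o)
    product = z_h * z_o
    if product > 0:
        pattern = "same sign"        # upper right or lower left quadrant
    elif product < 0:
        pattern = "opposite sign"    # upper left or lower right quadrant
    else:
        pattern = "on an axis"
    return distance, pattern


# Hypothetical laboratory with positive bias on both isotope ratios.
print(youden_summary(1.2, 0.9))
```

A "same sign" pattern repeated across many samples from one laboratory would be consistent with a shared systematic error, whereas "opposite sign" points suggest analytically disconnected H and O results.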
The vast majority of laboratories using DI‐IRMS instruments and
CO2– and H2–H2O equilibration produced fit‐for‐purpose z‐score
results for δ2H and δ18O values, with very few questionable results
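FIGURE 4 Youden z‐score cross‐plots of δ2H and δ18O values for all laboratories (outliers removed) for all the WICO test waters. See Figures S1–S3 (supporting information) for cross‐plots by instrument technology type and expanded WICO5 axes [Color figure can be viewed at wileyonlinelibrary.com]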
(Figure 4; expanded in Figure S1, supporting information). These plots
revealed that δ18O analyses performed by DI‐IRMS laboratories using
CO2‐H2O equilibration were exceptionally good performers compared
with those using other technologies (tightly along the y‐axis);
however, there was far more variance in the DI‐IRMS δ2H z‐scores
(x‐axis), but still within the acceptable range (Figure S1, supporting
information). Because dual‐inlet O and H analyses are usually
conducted separately, no correlation was found between the DI‐IRMS
δ18O and δ2H z‐scores (r2 <0.01). The saline WICO8 sample exhibited
slightly negative δ2H biases compared with the reference value, but
the results were mostly acceptable.
For CF‐IRMS instrument laboratories, the Youden plots exhibited
mostly acceptable z‐scores for both isotope ratios, but with
comparatively larger scatter for both δ18O and δ2H values than with
DI‐IRMS (Figure 4; expanded in Figure S2, supporting information).
The saline WICO8 sample had a tighter range of δ18O z‐scores, but
again a slightly negative spread for δ2H values, akin to that observed
in DI‐IRMS. Similarly, no trend was found for comparative δ18O and
δ2H z‐score results (r2 <0.05), probably because most CF‐IRMS O
and H isotope measurements are also conducted separately.
For LAS water isotope analyses, the Youden z‐score plots
exhibited predominantly fit‐for‐purpose results with a spread of
z‐scores similar to CF‐IRMS (Figure 4; expanded in Figure S3,
supporting information). LAS laboratories had comparatively more
questionable or unacceptable z‐score results for δ2H and δ18O values
than the DI‐ and CF‐IRMS laboratories. Because LAS instruments
measure H and O isotopologues directly on the same water injections,
there were correlations (r2 = 0.11–0.38) between the O and H isotope
z‐scores as depicted by dashed regression lines in Figure 4 (Figure S3,
supporting information), an observation also noted in WICO2011.
Thus, unlike DI‐ or CF‐IRMS instrumentation, biased results from
LAS are more likely to affect both H and O isotopes proportionately.
As with the IRMS, the saline WICO8 LAS results exhibited a
slightly negative bias for δ2H values, but mostly within fit‐for‐purpose
limits. It is worth noting the WICO8 reference values for δ18O and δ2H
were established using CO2– and H2–H2O equilibration 'activity'
methods, which produce differing δ‐values from water with increasing
salt amount (and type), especially above seawater salinities (>35 g/L).26
Conversely, LAS analyses are direct water isotopologue 'concentration'
measurements, and in WICO2016 these results were compared with
activity‐based reference δ‐values. Recently, studies have shown
that the comparative δ‐value differences between activity and
concentration methods for seawater samples are < +0.1 ‰ for δ18O
values and ~ +1.5 ‰ for δ2H values at 35 g/L, and that they may
require a correction factor.27,28 The WICO8 (30 g/L) results confirmed
these experimental findings; the WICO8 population median
differences between DI‐IRMS activity‐based and laser measurements
were +0.05 ‰ (within SDPA) for δ18O values, but were +1.0 ‰ for
δ2H values. These results affirm that saline water samples,
particularly those beyond 30 g/L, that are measured by
activity versus concentration methods will require an agreed
calibration to a common concentration or activity scale.27,29 This is
an area for more research, and possibly for future WICO testing.
3.3 | En‐Scores
For laboratories that reported uncertainties for the core WICO1–4
samples, the En‐scores (without outliers) were summarized in
histograms for δ2H and δ18O values (Figure 5), and categorized by both
isotope and instrument technology. The En‐scores for laboratories
were normally distributed around 0 to |1| (acceptable), with a few
FIGURE 5 En‐scores for δ18O (top) and δ2H (bottom) values classified by measurement technology. The grey area encompasses the range of acceptable performance
outliers. In all cases, laboratories with En‐scores > |1|, regardless of
whether LAS or IRMS, fell into two categories: those laboratories with
accurate δ‐results but unrealistically reported low uncertainties (e.g.
±0.01 ‰ uncertainty for LAS δ18O values), and those with biased
results and uncertainties sufficient to plot outside acceptable En limits.
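The two failure categories can be illustrated with a short sketch. It assumes the conventional proficiency‐testing definition of the En‐score, in which the offset from the reference value is divided by the root‐sum‐of‐squares of the laboratory and reference uncertainties; the exact formulation used for WICO2016 is defined elsewhere in the paper, and all numerical values below are hypothetical.

```python
import math


def en_score(reported, reference, u_lab, u_ref):
    """Assumed form: En = (x - X) / sqrt(u_lab^2 + u_ref^2)."""
    return (reported - reference) / math.sqrt(u_lab ** 2 + u_ref ** 2)


reference, u_ref = -10.00, 0.02          # hypothetical reference d18O

# Case 1: accurate result, but an unrealistically low reported uncertainty.
en_low_u = en_score(-9.95, reference, 0.01, u_ref)

# Case 2: the same result with a more realistic reported uncertainty.
en_realistic = en_score(-9.95, reference, 0.09, u_ref)

print(round(en_low_u, 2), round(en_realistic, 2))
```

Under these assumed numbers the same 0.05 ‰ offset fails the |En| ≤ 1 criterion only when the reported uncertainty is implausibly small.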
Median (max/min) reported measurement uncertainties for δ18O
values for the WICO1–4 core samples by all laboratories were ±0.09
(±0.8/±0.01) ‰, ±0.04 (±0.6/±0.01) ‰, and ±0.09 (±3.5/±0.01) ‰
for CF‐IRMS, DI‐IRMS and LAS, respectively. For δ2H values, the
median (max/min) measurement uncertainties were ±0.9 (±4.8/
±0.04) ‰, ±0.5 (±2.1/±0.1) ‰, and ±0.5 (±4.5/±0.1) ‰. Whereas
the median reported uncertainties appeared to be reasonable for
top performing laboratories, some laboratories reported uncertainties
that far exceeded the capabilities of the instrument technology, or
were unrealistically low when the uncertainty of the primary and
laboratory standards was factored in through basic error propagation
(e.g. one cannot report a lower uncertainty than that inherent in the
calibration standards used). For example, VSMOW2 has an uncertainty
of ±0.3 ‰ for δ2H values and the median LAS uncertainty for
WICO2016 was ±0.5 (Table 2 – as 1 sigma SD); hence, the
propagated uncertainty for VSMOW2 by LAS would be, on average,
±0.6 ‰ (e.g. $\sqrt{0.3^{2} + 0.5^{2}}$), which is 0.1 ‰ higher than the
median population uncertainty reported for LAS. Conversely, and in
particular for LAS analyses, some laboratories reported unrealistically
poor (high) uncertainties for each isotope ratio (e.g. ±4.5 ‰ for LAS
δ2H values) which suggested either that unacceptably poorly
performing instrumentation was used, or that insufficient care and
attention were taken to properly establish analytical uncertainty. In
short, as noted in all the previous WICO tests, the reporting of
analytical uncertainty for water stable isotope measurements
remains a perpetually inconsistent practice, and at the very least
the uncertainty should be determined by basic error propagation
methods using primary and secondary calibration standards and
control replicates along with clearly stated methodologies.
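The basic error propagation referred to above can be sketched as a root‐sum‐of‐squares of independent 1‐sigma components. The example reuses the two values quoted in the text (±0.3 ‰ for VSMOW2 δ2H and ±0.5 ‰ for the median LAS measurement uncertainty); the function name is illustrative, and treating the components as independent is an assumption.

```python
import math


def combined_uncertainty(*components):
    """Root-sum-of-squares of independent 1-sigma uncertainty components."""
    return math.sqrt(sum(u * u for u in components))


u_standard = 0.3   # VSMOW2 d2H uncertainty quoted in the text (per mil)
u_measure = 0.5    # median LAS d2H uncertainty quoted in the text (per mil)

u_total = combined_uncertainty(u_standard, u_measure)
print(f"{u_total:.2f}")  # 0.58, i.e. about 0.6 per mil
```

Additional terms, such as the uncertainty of secondary laboratory standards or the scatter of control replicates, could be passed as further arguments in the same way.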
3.4 | Collective accuracy scoring and cumulative offset rankings
Collective accuracy scores for WICO1–4 core samples spanning the
most commonly measured isotopic range are plotted in Figure 6. The
overall top performers for δ18O values were DI‐IRMS laboratories:
93% of laboratories produced excellent to acceptable collective results,
followed by CF‐IRMS (74 %) and LAS (70 %). The outcome for δ2H
values was not as encouraging; top performers were the DI‐IRMS and
LAS laboratories (62 % and 61 % excellent to acceptable collective
scores, respectively), but fewer than half (49 %) of the CF‐IRMS
laboratories gave acceptable to excellent δ2H outcomes.
Collective accuracy performance also varied by isotope ratio and
by instrumentation type. Surprisingly, the highest proportion of poor
performance for δ2H values was measured by CF‐IRMS (51 %). In
general, with the exception of DI‐IRMS for δ18O values, and
considering all WICO core samples, between 26 and 38 % of
laboratories were unable to produce acceptable to excellent
collective results for the core suite of WICO test samples.
Considering the ranking of WICO1–4 outcomes by their
cumulative offsets from the reference values (expanded in Figures S5
and S6, supporting information) for δ18O and δ2H values, a similar
picture emerged. Approximately 73 % of the δ18O WICO submissions
gave cumulative offsets lower than 0.8 ‰ (4 × 0.2 SDPA), whereas
27 % had higher offsets. The average population
FIGURE 6 Combined cumulative performance scores for the WICO1–4 core samples using the scoring criteria in Table 3 [Color figure can be viewed at wileyonlinelibrary.com]
cumulative offsets by technology performance were 0.3 ‰, 0.8 ‰
and 1.2 ‰ for DI‐IRMS, LAS and CF‐IRMS, respectively, again indicating
that DI‐IRMS laboratories were top performers for δ18O values.
Similarly, for δ2H values, ~73 % had cumulative offsets lower than
6 ‰ (4 × 1.5 SDPA), and ~27 % were higher. However, the top performers
for δ2H values, having the lowest cumulative offsets, were
dominated by LAS instruments, followed by DI‐ and then CF‐IRMS
instruments, with average offsets from the reference values of +0.5
‰, +0.6 ‰ and +0.9 ‰, respectively. In fact, 8 of the 10 best δ2H
ranked results were LAS laboratories (Figure S6, supporting
information).
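A cumulative offset of the kind used for these rankings can be sketched as follows. The sketch assumes the cumulative offset is the sum of absolute differences between reported and reference values over the four core samples, compared against 4 × SDPA; the reference and reported values shown are hypothetical placeholders, not the WICO reference values.

```python
def cumulative_offset(reported, reference):
    """Sum of absolute offsets from the reference values (per mil)."""
    if len(reported) != len(reference):
        raise ValueError("reported and reference must have the same length")
    return sum(abs(r - ref) for r, ref in zip(reported, reference))


SDPA_D18O = 0.2                      # per mil, as stated in the text
threshold = 4 * SDPA_D18O            # 0.8 per mil for four core samples

# Hypothetical d18O values for four core samples.
reference = [-2.0, -8.0, -15.0, -22.0]
reported = [-1.9, -8.15, -15.05, -21.8]

offset = cumulative_offset(reported, reference)
print(round(offset, 2), offset < threshold)
```

Sorting laboratories by this quantity would give a ranking of the kind shown in the supporting figures.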
These collective scores and cumulative offset rankings suggest
that DI‐IRMS δ18O analysis by CO2–H2O equilibration remains the
gold standard for top oxygen isotope ratio analytical performance,
and to a lesser extent H2–H2O equilibration for δ2H analyses probably
from a combination of the higher‐level technical expertise required to
operate DI‐IRMS instruments and a decades‐long methodological
consistency in gas‐water isotope equilibration protocols.30 On the
other hand, the top δ2H results were produced mostly by LAS
laboratories (Figure S6, supporting information). The top‐tier LAS
performers for δ2H results were evenly split between CRDS and
OA‐ICOS instruments. Overall, the worst performance for both O
and H isotope ratios was from CF‐IRMS laboratories. This suggests
that one strategy for obtaining best isotopic outcomes would be to
measure δ18O values by DI‐IRMS and δ2H values by LAS.
By contrast, for seemingly straightforward LAS instrumentation
there was remarkable (and unnecessary) inconsistency in the wide
range of basic operational and data processing approaches used by
these laboratories, as illustrated in Figure 7. It was unclear whether this
diversity of approaches was due to considerable new‐user inexperience,
or from attempts to improve LAS results by experimentation with
operational adjustments. The number of injections and ignores per
FIGURE 7 Summary of reported operational practices for participating LAS spectrometer laboratories
vial for LAS varied widely by laboratory, although most followed manufacturer
suggestions of 6 or 8 injections and ignored the first 3–4 injections
to help reduce memory effects. More problematic, however, was
that >55 % of LAS laboratories took no action to correct for
between‐sample memory or drift, and only 18 % of laboratories
checked samples for spectral contamination (see below), or for H2O
amount dependencies due to underperforming syringes. It seems likely
that LAS performance could be easily improved by laboratories following
the manufacturer's recommended operational guidelines and also by
adopting well‐established data corrections for LAS memory and
drift and systematic template approaches.13,31
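The "injections and ignores" practice mentioned above can be sketched as a per‐vial average over the retained injections only. This is an illustration of the simplest approach, with hypothetical injection values and function name; it does not implement the between‐sample memory or drift corrections described in the cited literature.

```python
from statistics import mean, stdev


def vial_result(injections, n_ignore=3):
    """Mean and SD of the injections kept after discarding the first n_ignore."""
    kept = injections[n_ignore:]
    if len(kept) < 2:
        raise ValueError("need at least two retained injections")
    return mean(kept), stdev(kept)


# Hypothetical d18O readings for 8 injections from one vial; the early
# injections still carry memory of the previous (more enriched) sample.
injections = [-9.20, -9.60, -9.85, -9.98, -10.01, -10.02, -9.99, -10.00]

value, spread = vial_result(injections, n_ignore=4)
print(round(value, 3), round(spread, 3))
```

Comparing the result for different values of `n_ignore` is one simple way to see whether residual memory is still affecting the retained injections.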
3.5 | Contaminated water sample WICO5
The results for WICO5 are plotted in Tukey box‐and‐whisker plots
and in dual‐isotope ratio space in Figures 2 and 3, and as Youden
z‐scores plots in Figure 4. These data revealed some remarkable
outcomes. First, the range of reported δ2H and δ18O values for WICO5
spanned an astonishing 146 ‰ and 55 ‰, respectively (Figures 3
and 4), with highly abnormal d‐excess values ranging from −80 to
+70 ‰. Examination of the z‐scores plots (Figure 4) showed that
outlier δ‐values for WICO5 were almost all attributable to LAS analyses,
clearly a result of well‐known spectral interferences due to the trace
content of methanol. The z‐scores furthermore clustered by LAS
manufacturer and by instrument model. For example, Los Gatos and
the Picarro 2100 series instruments resulted in clustered positively
biased isotopic results. Conversely, older Picarro 1100‐2110 series
instruments resulted in clustered negatively biased results. A few
lasers (inexplicably) produced acceptable δ‐values for WICO5 without
any laboratory‐reported contamination. Only 18 % of laboratories
said they screened for spectral interferences (Figure 7); however, only
18 LAS laboratories (12 %) explicitly reported back that WICO5 was
contaminated, despite the extraordinary d‐excess values. Of all the
laboratories that reported organic contamination, two took additional
steps to correct their results. In one case, a Picarro laboratory used a
micro‐combustion module interface to combust the water and
organic contaminant; however, this approach produced a biased result
for WICO5. In the second case, an experienced Los Gatos user recognized
the spectral pattern as an alcohol, and constructed an alcohol
concentration correction algorithm which produced an accurate final
result for WICO5. These findings for WICO5 clearly demonstrate that
known (and unknown) trace VOC contamination has a very serious
impact on LAS‐based water H and O isotope analyses.11,12,32 For most
natural waters submitted to laboratories it will be impossible to predict
(or screen) whether interfering trace VOCs are present, or not. Two
strategies are recommended for LAS users to better screen for VOC‐
compromised samples. First, every autorun should be scrutinized using
the manufacturer‐provided spectral contamination software in routine
operation – this effort takes only a few minutes. This will be particularly
important for waters that are suspected to contain VOCs (e.g.
hydrocarbon‐gas rich waters, landfill leachates, soil and plant waters).
Samples identified as spectrally compromised should be repeated to verify
contamination (false positives are possible), or not reported.32 Second,
laboratories should derive the d‐excess of the measured water samples
as a potential proxy contamination indicator in case of failure of the
spectral software to identify compromised samples. Samples having
d‐excess values falling outside the expected (location specific) or the
common range of approximately −10 to +25 ‰ (e.g. Figure 3) for
non‐evaporated environmental waters may be flagged as 'suspect'
and should be repeated or verified using IRMS methods, if possible.
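The second strategy can be sketched as a simple screening step. It uses the standard definition of deuterium excess, d = δ2H − 8 × δ18O, and the approximate common range of −10 to +25 ‰ quoted above as default limits; the function names and sample values are hypothetical, and location‐specific limits would replace the defaults where known.

```python
def d_excess(d2h, d18o):
    """Deuterium excess in per mil: d = d2H - 8 * d18O."""
    return d2h - 8.0 * d18o


def is_suspect(d2h, d18o, low=-10.0, high=25.0):
    """Flag a sample whose d-excess falls outside the expected range."""
    d = d_excess(d2h, d18o)
    return d < low or d > high


# Hypothetical results: (sample id, d2H, d18O) in per mil.
samples = [("A", -70.0, -10.0), ("B", -20.0, -10.0), ("C", -95.0, -9.5)]

for name, d2h, d18o in samples:
    flag = "suspect" if is_suspect(d2h, d18o) else "ok"
    print(name, d_excess(d2h, d18o), flag)
```

Flagged samples would then be repeated or verified by IRMS, as recommended in the text.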
For CF‐IRMS, the outcomes for WICO5 were surprisingly poor, as
evidenced by a significant number of unacceptable H and O results
(Figures 3 and 4; Figure S2, supporting information). Whereas for
H2–H2O equilibration (n = 53) nearly all the δ2H results were acceptable
(mean z‐score of −0.4), both the HTC reduction (n = 28) and the Cr
reduction (n = 11) methods yielded many biased δ2H results, having
mean z‐scores of −1.38 ± 5.1 and 2.62 ± 9.1, respectively. For CF‐IRMS
δ18O values by CO2–H2O equilibration (n = 53), most results were
acceptable (mean z‐score of −0.2), but online HTC reduction
(n = 22) caused most of the unacceptable results and was negatively
biased (mean z‐score of −1.85 ± 3.8). For DI‐IRMS (Figure 4;
Figure S1, supporting information), there was no evidence that
methanol contamination affected either H or O isotope results, either
in the participants' reported results or in separate testing at the IAEA.
In summary, trace VOC (e.g. methanol) contamination of water
seriously affected LAS measurements due to spectral interferences,
but also adversely affected the CF‐IRMS decompositional methods
(Cr, HTC reduction). In the latter case, the methanol is thermally
decomposed to H2 and CO with the water sample and, because
solvents generally have negative δ2H values, appeared to cause
negative biases. The CO2– or H2–H2O equilibration methods were
largely unaffected by the trace methanol; in the case of CF‐IRMS this
is probably due to GC separation of the sample H2 or CO2 analyte
gas from methanol, and in the case of DI‐IRMS to −80°C to −100°C
cold trapping of water and methanol vapor prior to expansion of the
sample gas into the sample bellows.
3.6 | d‐Excess results
One surprising result of the WICO2016 test was that the derived
d‐excess values of laboratory‐reported O and H results spanned nearly
the entire natural range for each of the test samples (Figure 2).
Expanded by instrument type (Figure S4, supporting information), the
best performing instruments for d‐excess for the WICO1–4 core
samples were DI‐IRMS, having the lowest range of reported d‐excess
values (SD ±1.6 ‰), followed by LAS (SD ±2.4 ‰) and CF‐IRMS (SD
±3.2 ‰). We hypothesized that LAS might yield improved d‐excess
determinations since measurements of O and H are made on the same
sample, but this was not the case; the spread of reported d‐excess
values was almost as large as that for CF‐IRMS instruments. However,
population median d‐excess offsets from the reference d‐excess values
were lowest for LAS, then CF‐IRMS and DI‐IRMS (−0.1, −0.2 and −0.3
‰). These data also suggest that the widespread use and interpretation
of d‐excess in hydrologic studies should be treated with caution unless it
can be demonstrated that laboratory performance for O isotope ratios
and H isotope ratios in particular is exceptionally accurate and precise.
3.7 | Laboratory performance over time
We recalculated the z‐scores of laboratories that took part in
WICO2011 (ensuring that the same instruments were used) to ensure
consistency with WICO2016 methodology. The results showed that
for repeat IRMS laboratories (n = 34) 29.4 % performed about the same
(z‐scores within ±1 of WICO2016), 29.4 % improved (better z‐score, <1)
and 41.2 % performed worse (z‐score >±2). For LAS laboratories (n = 30)
the results were similar: 13.3 % performed similarly, 33.3 % improved
and 53.3 % performed worse. In short, about half of the laboratories
that participated in WICO2011 (n = 30) did worse in WICO2016 than
in the previous test, but it should be noted that recurring laboratories
represent only ~11 % of the total WICO2016 participant pool.
3.8 | Factors causing underperformance
We further examined the metadata that many of the participating
laboratories provided (e.g. instrumental and operational practices,
standard δ‐ranges, etc.) and compared these data with their
performance outcomes to assess whether there were key factors that
contributed to excellent or poor performance outcomes. In other
words, did top performing laboratories exhibit practices that were
quantifiably different (better) than those of laboratories that
performed poorly? However, we were unable to distinguish any
operational factors from these metadata that could be clearly related
to good or poor performance. As one example, we compared the
collective scores and metadata of the best performing LAS instrument
laboratories (e.g. excellent point scores) with those of the worst
performing LAS laboratories (e.g. unacceptable point scores). However,
the reported metadata of both populations was indistinguishable; all
used a similar number of sample injections and ignores, and claimed
similar data corrections. As another example, we examined laboratories
(all technologies) that reported using VSMOW2 and SLAP2 and two‐
point normalization to see whether δ‐scaling issues were a factor,
hypothesizing that laboratories using primary reference materials
would give more accurate outcomes for all WICO samples. However,
laboratories that used VSMOW2 and SLAP2 as calibration standards
gave a proportionately similar number of excellent and unacceptable
outcomes to the overall population. Similarly, when we used the
reported results of WICO3 and WICO4 and rescaled the reported
results to obtain WICO2 and WICO1 results, a similar outcome was
found. Clearly, reference δ‐value scaling factors did not appear to be a
significant performance contributor that could easily be distinguished.
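For readers unfamiliar with the two‐point normalization mentioned above, a minimal sketch follows. It linearly maps measured δ‐values onto the VSMOW‐SLAP scale using the assigned values of VSMOW2 (0 ‰) and SLAP2 (−55.5 ‰ for δ18O; −427.5 ‰ for δ2H). The measured raw values and function name are hypothetical, and laboratories commonly normalize with in‐house standards calibrated against these primaries rather than with the primaries themselves.

```python
def two_point_normalize(raw, raw_std1, raw_std2, true_std1, true_std2):
    """Linear stretch/shift so that both standards return their assigned values."""
    slope = (true_std2 - true_std1) / (raw_std2 - raw_std1)
    return true_std1 + (raw - raw_std1) * slope


# Assigned d18O values on the VSMOW-SLAP scale (per mil).
VSMOW2_D18O = 0.0
SLAP2_D18O = -55.5

# Hypothetical raw instrument readings for the two standards and one sample.
raw_vsmow2, raw_slap2, raw_sample = 0.30, -54.60, -27.15

normalized = two_point_normalize(raw_sample, raw_vsmow2, raw_slap2,
                                 VSMOW2_D18O, SLAP2_D18O)
print(round(normalized, 2))
```

The same function applies to δ2H with the corresponding assigned values.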
The fact we were unable to definitively identify any specific good or
bad performance indicators based on reported results and operational
metadata led us to conclude that poor laboratory performance probably
resulted from unquantifiable 'known unknowns'. These are mistakes
and human errors that are common but would be impossible to quantify
or identify without conducting individual onsite assessments,33 and
that may not be apparent to the laboratory. These include knowledge‐
based or skill‐based factors such as operator experience, basic data
processing mistakes, measurement protocol violations, compromised
or evaporated laboratory standards, or poorly functioning analytical
instruments and peripherals. Recent studies have suggested that
human errors are not insignificant in contributing to underperforming
geochemical analysis outcomes.33 The premise that human, technical,
and instrumental errors are the main drivers for poor water isotope
performance was supported by a post‐WICO2016 survey of the
participating laboratories (Table 2). Two months after reporting test results
to laboratories, we asked participants which factors they had identified
that resulted in either overall or specific sample underperformance. Of
the 98 responses received, most laboratories indicated that there were
multiple knowledge‐ or skill‐based factors that led to poor outcomes.
The top 3 factors identified (equally ~30 % each) were mistakes in data
normalization and processing, use of laboratory standards that did not
span the WICO sample δ‐range (although scaling issues did not seem
significant in our general reassessment), and poor instrumental
performance. Other lesser factors identified included compromised or
evaporated laboratory standards (15 %). Surprisingly, 23 % of
underperforming laboratories were not able to identify clear reasons
for poor outcomes several months after participating in the WICO test.
4 | SUMMARY
The WICO2016 inter‐comparison test of 235 laboratories conducting
δ2H and δ18O measurements revealed that ~73 % of laboratories
reported acceptable core sample results, within 0.2 ‰ and 1.5 ‰
of reference δ18O and δ2H values, whereas ~27 % produced
unacceptable results. The top performance for δ18O values was
dominated by dual‐inlet IRMS laboratories; whereas the top δ2H
performance was dominated by LAS laboratories. CF‐IRMS
instruments yielded intermediate results to the LAS and DI‐IRMS
instrument laboratories. The methanol‐contaminated WICO sample
resulted in extremely biased δ‐values for LAS instruments, but VOC
contamination also affected HTC and Cr CF‐IRMS systems. DI‐IRMS
instrument results were unaffected by the methanol contamination.
Overall, from our analysis of submitted laboratory metadata and
test sample performance outcomes, it appeared that poor performance
in WICO mainly resulted from skill‐ and knowledge‐based errors;
these included calculation mistakes, inappropriate or compromised
laboratory calibration standards, use of poorly performing
instrumentation, and lack of vigilance to sample contamination and
unreasonable isotopic outcomes. To counteract these types of errors,
we recommend that stable isotope laboratories include one or two
control standards of known δ‐values (not used in data normalization)
in each autorun; LAS laboratories should screen all their autoruns for
potential spectral contamination; and all laboratories should evaluate
whether the derived d‐excess values (if measuring both isotope ratios)
are realistic. Combined, these simple strategies could help to more
quickly inform the analyst about mistakes, and alert the analyst to
compromised water samples.
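The first recommendation, including control standards of known δ‐values in each autorun, can be sketched as a simple per‐run check. The tolerances below reuse the 0.2 ‰ and 1.5 ‰ acceptability limits of this test as an assumption; a laboratory would normally set its own limits, and the record layout, standard names and values are hypothetical.

```python
def check_controls(controls, tol_d18o=0.2, tol_d2h=1.5):
    """Return (control name, isotope) pairs that deviate beyond tolerance."""
    issues = []
    for c in controls:
        if abs(c["d18O_meas"] - c["d18O_known"]) > tol_d18o:
            issues.append((c["name"], "d18O"))
        if abs(c["d2H_meas"] - c["d2H_known"]) > tol_d2h:
            issues.append((c["name"], "d2H"))
    return issues


# Hypothetical autorun with two control standards not used in normalization.
autorun_controls = [
    {"name": "CTRL-1", "d18O_known": -7.50, "d18O_meas": -7.45,
     "d2H_known": -50.0, "d2H_meas": -50.8},
    {"name": "CTRL-2", "d18O_known": -15.00, "d18O_meas": -15.35,
     "d2H_known": -110.0, "d2H_meas": -112.1},
]

print(check_controls(autorun_controls))
```

An empty list indicates that all controls fell within tolerance; any entries point the analyst to the autorun and isotope ratio that need attention.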
ACKNOWLEDGEMENTS
WICO2016 would not have been possible without the enthusiastic
cooperation of scientists and staff of the international stable isotope
laboratories that participated. Thomas Blunier (Denmark) kindly
supplied a stock of Greenland firn meltwater for WICO6. Bernhard
Mayer (Canada) supplied Bow River water for WICO3. We thank
J.‐F. Hélie and H.A.J. Meijer whose facilities served as reference
laboratories for the WICO test samples. We thank M. Urresti for
assistance with WICO sample preparation and T. Chavez for
assistance with organizing international sample shipments. We thank
three anonymous reviewers for constructive comments. This work
was funded by the International Atomic Energy Agency. IAEA staff
(LIW, STW, CD, LAA, PKA) conceived and implemented WICO2016,
conducted sample preparation, analysis, surveys, sample shipping,
data collection and statistical analysis, processed laboratory reports,
and wrote this manuscript. Laboratory identities are kept confidential
(LIW). TBC provided USGS reference values, provided VSMOW
to two reference laboratories, and corroborated our use of methanol
for WICO5.
ORCID
L.I. Wassenaar http://orcid.org/0000-0001-5532-0771
REFERENCES
1. Mook WG, Gat JR, Meijer HAJ, Rozanski K, Froehlich K. Environmental Isotopes in the Hydrological Cycle: Principles and Applications. IAEA/UNESCO; 2001.
2. Kendall C, McDonnell JJ. Isotope Tracers in Catchment Hydrology. Elsevier; 2012.
3. Clark ID. Groundwater Geochemistry and Isotopes. Boca Raton: CRC Press; 2015.
4. Schauer AJ, Schoenemann SW, Steig EJ. Routine high‐precision analysis of triple water‐isotope ratios using cavity ring‐down spectroscopy. Rapid Commun Mass Spectrom. 2016;30(18):2059‐2069.
5. de Groot PA. Handbook of Stable Isotope Analytical Techniques. Vol 1. Elsevier; 2004.
6. de Groot PA. Handbook of Stable Isotope Analytical Techniques. Vol 2. Elsevier; 2009.
7. Kerstel ET, Van Trigt R, Reuss J, Meijer HAJ. Simultaneous determination of the 2H/1H, 17O/16O, and 18O/16O isotope abundance ratios in water by means of laser spectrometry. Anal Chem. 1999;71(23):5297‐5303.
8. Lis G, Wassenaar LI, Hendry MJ. High‐precision laser spectroscopy D/H and 18O/16O measurements of microliter natural water samples. Anal Chem. 2008;80(1):287‐293. https://doi.org/10.1021/ac701716q
9. Berman ES, Levin NE, Landais A, Li S, Owano T. Measurement of δ18O, δ17O, and 17O‐excess in water by off‐axis integrated cavity output spectroscopy and isotope ratio mass spectrometry. Anal Chem. 2013;85(21):10392‐10398. https://doi.org/10.1021/ac402366t
10. Brand WA, Geilmann H, Crosson ER, Rella CW. Cavity ring‐down spectroscopy versus high‐temperature conversion isotope ratio mass spectrometry; a case study on δ2H and δ18O of pure water samples and alcohol/water mixtures. Rapid Commun Mass Spectrom. 2009;23(12):1879‐1884. https://doi.org/10.1002/rcm.4083
11. West AG, Goldsmith GR, Brooks PD, Dawson TE. Discrepancies between isotope ratio infrared spectroscopy and isotope ratio mass spectrometry for the stable isotope analysis of plant and soil waters. Rapid Commun Mass Spectrom. 2010;24(14):1948‐1954.
12. Hendry MJ, Richman B, Wassenaar LI. Correcting for methane interferences on d2H and d18O measurements in pore water using H2O(liquid)‐H2O(vapor) equilibration laser spectroscopy. Anal Chem. 2011;83(14):5789‐5796. https://doi.org/10.1021/ac201341p
13. Wassenaar LI, Coplen TB, Aggarwal PK. Approaches for achieving long‐term accuracy and precision of δ18O and δ2H for waters analyzed using laser absorption spectrometers. Environ Sci Technol. 2014;48(2):1123‐1131. https://doi.org/10.1021/es403354n
14. Wassenaar LI, Ahmad M, Aggarwal P, et al. Worldwide proficiency test for routine analysis of δ2H and δ18O in water by isotope‐ratio mass spectrometry and laser absorption spectroscopy. Rapid Commun Mass Spectrom. 2012;26(15):1641‐1648. https://doi.org/10.1002/rcm.6270
15. Available: http://www‐naweb.iaea.org/napc/ih/index.html.
16. www.redseafish.com.
17. ISO/IEC. Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparison; 2015.
18. Skrzypek G. Normalization procedures and reference material selection in stable HCNOS isotope analyses: an overview. Anal Bioanal Chem. 2013;405(9):2815‐2823. https://doi.org/10.1007/s00216‐012‐6517‐2
19. Lorenz JM, Qi H, Coplen TB. Antarctic Ice‐Core Water (USGS49) – A new isotopic reference material for δ2H and δ18O measurements of water. Geostand Geoanal Res. 2017;41:63‐68. https://doi.org/10.1111/ggr.12135
20. Pomme S, Keightley J. Determination of a reference value and its uncertainty through a power‐moderated mean. Metrologia. 2015;52(3):S200‐S212. https://doi.org/10.1088/0026‐1394/52/3/S200
21. Dansgaard W, Johnsen SJ, Clausen HB, et al. Evidence for general instability of past climate from a 250‐Kyr ice‐core record. Nature. 1993;364(6434):218‐220. https://doi.org/10.1038/364218a0
22. Gkinis V, Popp TJ, Johnsen SJ, Blunier TA. Continuous stream flash evaporator for the calibration of an IR cavity ring‐down spectrometer for the isotopic analysis of water. Isotopes Environ Health Stud. 2010;46(4):463‐475. https://doi.org/10.1080/10256016.2010.538052
23. Paul D, Skrzypek G, Forizs I. Normalization of measured stable isotopic compositions to isotope reference scales – a review. Rapid Commun Mass Spectrom. 2007;21(18):3006‐3014. https://doi.org/10.1002/rcm.3185
24. Shirono K, Iwase K, Okazaki H, et al. A study on the utilization of the Youden plot to evaluate proficiency test results. Accredit Qual Assurance. 2013;18(3):161‐174. https://doi.org/10.1007/s00769‐013‐0978‐7
25. Youden WJ. Graphical diagnosis of interlaboratory test results. Ind Qual Control. 1959;15:24‐28.
26. Sofer Z, Gat JR. Activities and concentrations of 18O in concentrated aqueous salt solutions – Analytical and geophysical implications. Earth Planet Sci Lett. 1972;15(3):232‐238. https://doi.org/10.1016/0012‐821x(72)90168‐9
27. Benetti M, Sveinbjörnsdóttir AE, Ólafsdóttir R, et al. Inter‐comparison of salt effect correction for δ18O and δ2H measurements in seawater by CRDS and IRMS using the gas‐H2O equilibration method. Mar Chem. 2017;194(Supplement C):114‐123. https://doi.org/10.1016/j.marchem.2017.05.010
28. Skrzypek G, Ford D. Stable isotope analysis of saline water samples on a cavity ring‐down spectroscopy instrument. Environ Sci Technol. 2014;48(5):2827‐2834. https://doi.org/10.1021/es4049412
29. Koehler G, Wassenaar LI, Hendry J. Measurement of stable isotope activities in saline aqueous solutions using optical spectroscopy methods. Isotopes Environ Health Stud. 2013;49(3):378‐386. https://doi.org/10.1080/10256016.2013.815183
30. Epstein S, Mayeda T. Variation of 18O content of waters from natural sources. Geochim Cosmochim Acta. 1953;4:213‐224.
31. van Geldern R, Barth JAC. Optimization of instrument setup and post‐run corrections for oxygen and hydrogen stable isotope measurements of water by isotope ratio infrared spectroscopy (IRIS). Limnol Oceanogr: Methods. 2012;10:1024‐1036. https://doi.org/10.4319/lom.2012.10.1024
32. West A, Goldsmith G, Matimati I, Dawson T. Spectral analysis software improves confidence in plant and soil water stable isotope analyses performed by isotope ratio infrared spectroscopy (IRIS). Rapid Commun Mass Spectrom. 2011;25(16):2268‐2274.
33. Kuselman I, Kardash E, Bashkansky E, et al. House‐of‐security approach to measurement in analytical chemistry: quantification of human error using expert judgments. Accredit Qual Assurance. 2013;18(6):459‐467. https://doi.org/10.1007/s00769‐013‐1020‐9
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the
supporting information tab for this article.
How to cite this article: Wassenaar LI, Terzer‐Wassmuth S, Douence C, Araguas‐Araguas L, Aggarwal PK, Coplen TB. Seeking excellence: An evaluation of 235 international laboratories conducting water isotope analyses by isotope‐ratio and laser‐absorption spectrometry. Rapid Commun Mass Spectrom. 2018;32:393‐406. https://doi.org/10.1002/rcm.8052