
Received: 13 November 2017 Revised: 15 December 2017 Accepted: 18 December 2017

DOI: 10.1002/rcm.8052

RESEARCH ARTICLE

Seeking excellence: An evaluation of 235 international laboratories conducting water isotope analyses by isotope‐ratio and laser‐absorption spectrometry

L.I. Wassenaar1 | S. Terzer‐Wassmuth1 | C. Douence1 | L. Araguas‐Araguas1 | P.K. Aggarwal1 | T.B. Coplen2

1 International Atomic Energy Agency, Isotope Hydrology Section, PO Box 100, A‐1400 Vienna, Austria

2 US Geological Survey, 431 National Center, 12201 Sunrise Valley Drive, Reston, VA 20192, USA

Correspondence

L.I. Wassenaar, International Atomic Energy Agency, Isotope Hydrology Section, PO Box 100, A‐1400 Vienna, Austria.

Email: [email protected]

Rapid Commun Mass Spectrom. 2018;32:393–406.

Rationale: Water stable isotope ratios (δ2H and δ18O values) are widely used tracers in

environmental studies; hence, accurate and precise assays are required for providing sound

scientific information. We tested the analytical performance of 235 international laboratories

conducting water isotope analyses using dual‐inlet and continuous‐flow isotope ratio mass

spectrometers and laser spectrometers through a water isotope inter‐comparison test.

Methods: Eight test water samples were distributed by the IAEA to international stable

isotope laboratories. These consisted of a core set of five samples spanning the common

δ‐range of natural waters, and three optional samples (highly depleted, enriched, and saline).

The fifth core sample contained unrevealed trace methanol to assess analyst vigilance to the

impact of organic contamination on water isotopic measurements made by all instrument

technologies.

Results: For the core and optional samples ~73 % of laboratories gave acceptable results

within 0.2 ‰ and 1.5 ‰ of the reference values for δ18O and δ2H, respectively; ~27 %

produced unacceptable results. Top performance for δ18O values was dominated by dual‐inlet

IRMS laboratories; top performance for δ2H values was led by laser spectrometer laboratories.

Continuous‐flow instruments yielded comparatively intermediate results. Trace methanol

contamination of water resulted in extreme outlier δ‐values for laser instruments, but also

affected reactor‐based continuous‐flow IRMS systems; however, dual‐inlet IRMS δ‐values

were unaffected.

Conclusions: Analysis of the laboratory results and their metadata suggested inaccurate or

imprecise performance stemmed mainly from skill‐ and knowledge‐based errors including:

calculation mistakes, inappropriate or compromised laboratory calibration standards, poorly

performing instrumentation, lack of vigilance to contamination, or inattention to unreasonable

isotopic outcomes. To counteract common errors, we recommend that laboratories include

1–2 'known' control standards in all autoruns; laser laboratories should screen each autorun

for spectral contamination; and all laboratories should evaluate whether derived d‐excess

values are realistic when both isotope ratios are measured. Combined, these data evaluation

strategies should immediately inform the laboratory about fundamental mistakes or

compromised samples.

1 | INTRODUCTION

The stable‐hydrogen and ‐oxygen isotope (δ2H and δ18O values)

composition of environmental waters is an important assay in

diverse hydrogeologic, meteorological, watershed, oceanographic and

ecological studies around the world.1-3 Accurate and precise isotopic

measurements of environmental waters are imperative in providing

rigorously sound scientific interpretations and hydrological assessments.

The instrumentation used for measuring the H and O stable isotopic compositions of water has evolved over six decades and is classified into three types: (i) dual‐inlet isotope‐ratio mass spectrometry (DI‐IRMS; 1950s–present), (ii) continuous‐flow isotope‐ratio mass spectrometry (CF‐IRMS; 1990s–present), and (iii) laser‐absorption spectrometry (LAS; 2010s–present). Increasingly, scientists are using laser spectrometers because of their lower capital cost and consumable demands, ease of use, and ongoing improvements in analytical precision.4

Copyright © 2018 John Wiley & Sons, Ltd.

Isotope‐ratio mass spectrometers (DI‐ or CF‐IRMS instruments) do not measure liquid water H and O isotopologues directly or simultaneously; these assays are typically conducted separately for each isotope ratio (2H/1H or 18O/16O), using on‐ or off‐line preparation devices to convert sample H2O into a pure analyte gas (H2, CO2, or CO). Conversion methods include gas–liquid equilibration ('activity') methods such as CO2/H2–H2O equilibration (δ2H and δ18O values); thermochemical decomposition methods such as zinc, chromium, manganese, and uranium reduction (δ2H values); glassy carbon (δ2H and δ18O values); and guanidine hydrochloride (δ18O values) (reviewed in 5,6). The sample H2 or CO2/CO gas is introduced into the isotope‐ratio mass spectrometer either as a cryogenically purified sample gas or as a gas mix entrained in a helium carrier that is purified and separated by capillary or packed‐column gas chromatography, in which case the isotope ratios are determined by CF‐IRMS. The longstanding CO2– and H2–H2O equilibration method with DI‐IRMS is generally considered the "gold standard" for reliable high‐precision water isotopic measurements, but it is the costliest analytical option.

Unlike IRMS systems, newer, lower‐cost laser spectrometers measure the concentrations of all the key isotopologue species (1H216O, 1H2H16O, 1H218O, 1H217O) directly on injections of vaporized H2O sample gas using infrared laser‐absorption spectrometry, conducted either in static mode (Off‐Axis Integrated Cavity Output Spectroscopy) or in flowing mode (Cavity Ring‐Down Spectroscopy).4,7-9 Laser spectrometers produce accurate and precise results provided that the water samples do not contain volatile organic compounds (VOCs), hydrocarbons, or other chemicals that can cause detrimental spectral interferences.10-12 Furthermore, because water vapor as an analyte gas is 'sticky', LAS instruments exhibit strong between‐sample memory effects that must be corrected for.13

Regardless of which instrumentation technology laboratories

choose for water stable isotopic determinations, accurate and precise

measurements require a well‐maintained isotope‐ratio mass or laser

spectrometer and sample preparation peripheral(s), in addition to an

appropriate range of laboratory standards carefully calibrated to the

VSMOW‐SLAP primary reference scale.13,14

This IAEA Water Isotope Inter‐Comparison (WICO2016)15 tested the performance of 235 international laboratories routinely conducting water stable‐isotope measurements. Participating laboratories used a diverse range of dual‐inlet and continuous‐flow IRMS instruments and laser spectrometers, together with a wide range of sample preparation peripherals and operational practices. A core set of five isotopically contrasting water samples was distributed to international laboratories,14 with three optional samples to accommodate laboratories working in cold regions, saline systems, or low‐latitude or evaporative systems. Here we synthesize the results of the WICO2016 inter‐comparison, with the aim of reviewing current analytical practices, identifying key problem areas, and suggesting improvements that can help laboratories raise the accuracy and precision of their analyses.

2 | EXPERIMENTAL

2.1 | Determining testing needs

Before WICO2016 implementation, a survey was conducted in

October 2015 asking laboratories what kind of water samples

would be appropriate for a contemporary water stable isotope inter‐

comparison. Of 219 laboratories from 63 countries that responded,

94 % recommended a core set (δ18O values from 0 to −20 ‰), 41 %

recommended an 18O‐enriched water (δ18O >3–5 ‰), 38 %

recommended a highly 18O‐depleted water (δ18O < −30 ‰), and 11 %

suggested sea or mineralized water with >30 g/L total dissolved solids.

Ten laboratories recommended specialty water samples such as atom

% enrichments (e.g. doubly labelled water), plant or biological water

extracts, alcohol distillates, or waters containing pollutants. Four

laboratories suggested samples for ultra‐high‐precision 17O‐excess

determination. These responses informed our sample mix that

accommodated a majority of the expressed interests. Online

registration for WICO2016 was conducted from December 2015 to

February 2016. A total of 266 laboratories from 69 countries

registered for participation in the testing, with the understanding that

individual laboratory evaluations are kept confidential, but that results

could be used anonymously in this synthesis report.

2.2 | Test samples: Origin and storage

Eight test water samples were prepared for WICO2016. All were sourced from natural waters, but several were adjusted using aliquots of 99.9 % D2O and 98 % H218O to achieve specific δ‐value targets while maintaining 'usual' deuterium‐excess (d‐excess) values of 0 to +10 ‰; the δ17O value was not considered for this test. The WICO2016 test waters, their sources, and stable isotopic data are tabulated in Table 1. The test samples were divided into a core set of five samples plus three optional samples. The core set spanned the most common range of δ2H and δ18O values encountered by isotope laboratories worldwide (δ18O values from ~0 to −22 ‰, roughly the range between the VSMOW and GISP reference materials). The optional samples included a depleted Greenland firn melt, a heavy‐isotope‐enriched sample, and a synthetic seawater (Table 1). The synthetic seawater sample was made by dissolving 30.0 g of aquarium‐grade Red Sea Salt™ (mainly NaCl) per liter of distilled water (~0.7 molal NaCl). The seawater mix contained ~350 mg/L calcium, ~1000 mg/L magnesium and ~320 mg/L potassium.16

The fifth core sample was contaminated with trace methanol (15.6 mmol/L CH3OH, 99.9 % purity) to determine whether LAS analysts were vigilant in monitoring for and detecting spectral interferences, and to assess the impact of trace VOC 'pollutants' on LAS and IRMS isotope analyses.10,11 Accordingly, the contamination of the WICO5 sample was not revealed to the participants. Preliminary testing of WICO5 at the IAEA (Vienna, Austria) using 2140 (Picarro, Santa Clara, CA, USA), DLT‐100 (Los Gatos Research, San Jose, CA, USA) and TIWA‐24d (Los Gatos Research) laser analyzers and laser spectral evaluation software (Picarro ChemCorrect™ and Los Gatos Research Spectral Contamination Identifier™) indicated that sample WICO5 was lightly but sufficiently contaminated with methanol to cause highly divergent δ2H and δ18O results compared with the uncontaminated sample. Testing at the IAEA further showed no O or H isotopic difference between the contaminated and uncontaminated WICO5 when analyzed by CO2– and H2–H2O equilibration and DI‐IRMS.

TABLE 1 WICO2016 samples and reference values for δ18OVSMOW and δ2HVSMOW. The δ18O values were determined by CO2–H2O equilibration (laboratories a–d) using dual‐inlet isotope‐ratio mass spectrometry. The δ2H values were determined by H2–H2O equilibration (laboratories a–c) using dual‐inlet isotope‐ratio mass spectrometry. Samples were normalized to the VSMOW‐SLAP scale by two‐point calibration with VSMOW/VSMOW2 (a–d) and SLAP/SLAP2 (a, b, d) or USGS49 (c) standard waters. Reference values are the power‐moderated mean ± robust standard deviation. The WICO5 results were determined on an uncontaminated subsample*

Test sample | Water source | Conductivity (µS/cm) | δ18OVSMOW (‰) | SD (‰) | N | δ2HVSMOW (‰) | SD (‰) | N | d‐excess (‰)

Core set
WICO1 | Danube River water, Austria, filtered | 560 | −10.80 | 0.02 | 25 | −77.5 | 0.6 | 25 | 8.9 ± 0.6
WICO2 | Neusiedler See (lake water), Austria, filtered | 1430 | −5.11 | 0.02 | 25 | −41.9 | 0.8 | 27 | −1.0 ± 0.6
WICO3 | Bow River water (e), Canada, filtered | 235 | −22.01 | 0.04 | 25 | −168.5 | 0.8 | 26 | 7.6 ± 0.8
WICO4 | Ground water mix, Egypt, Austria, filtered, distilled | 10 | −0.50 | 0.04 | 25 | +0.4 | 0.8 | 26 | 4.4 ± 0.8
WICO5* | Vienna tap water and WICO6 mix; research‐grade methanol was added gravimetrically to produce a 0.05 % methanol/water volumetric ratio | 290 | −15.68 | 0.02 | 23 | −114.5 | 0.6 | 25 | 10.9 ± 0.6

Optional set
WICO6 | Greenland firn melt (f), unfiltered | 40 | −41.41 | 0.02 | 25 | −323.8 | 0.8 | 26 | 7.5 ± 0.8
WICO7 | Vienna tap water mixed with 99.99 % D2O and 96.05 % H218O to ensure a usual d‐excess and isotopically positive result | 385 | +5.61 | 0.04 | 26 | +55.2 | 0.8 | 26 | 10.3 ± 0.8
WICO8 | Synthetic seawater (30 g/L Red Sea salt), mixed with deionized water and WICO6 to produce negative values with a usual d‐excess, unfiltered | ~53000 | −3.45 | 0.06 | 28 | −17.5 | 0.8 | 27 | 10.1 ± 0.8

N = total number of DI‐IRMS analyses. The derived d‐excess = δ2HVSMOW − 8 × δ18OVSMOW.
a Isotope Hydrology Laboratory, IAEA, Vienna, Austria.
b GEOTOP, University of Quebec, Montreal, Canada (J.‐F. Hélie).
c U.S. Geological Survey, Reston, VA, USA (T.B. Coplen).
d University of Groningen, Groningen, The Netherlands (H.A.J. Meijer).
e Courtesy of B. Mayer (University of Calgary, Canada).
f Courtesy of T. Blunier (University of Copenhagen, Denmark).
*Participants received a contaminated WICO5 sample which contained 0.05 % methanol.
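Two Table 1 entries can be re‐derived with simple arithmetic, which is also a quick way for a laboratory to sanity‐check its own reports: the d‐excess column, from d‐excess = δ2H − 8 × δ18O, and the molar concentration implied by the 0.05 % methanol addition to WICO5. The sketch below assumes that 0.05 % means 0.05 g per 100 mL (0.5 g/L) and a CH3OH molar mass of 32.04 g/mol; read instead as a volume ratio with a methanol density of about 0.79 g/mL, the same 0.05 % gives roughly 12 mmol/L, so either reading lands in the millimolar range.

```python
# Reference values from Table 1: (delta18O, delta2H), both in permil vs VSMOW.
REFERENCE = {
    "WICO1": (-10.80, -77.5),
    "WICO2": (-5.11, -41.9),
    "WICO3": (-22.01, -168.5),
    "WICO4": (-0.50, 0.4),
    "WICO5": (-15.68, -114.5),
    "WICO6": (-41.41, -323.8),
    "WICO7": (5.61, 55.2),
    "WICO8": (-3.45, -17.5),
}


def d_excess(d18o: float, d2h: float) -> float:
    """Deuterium excess in permil: d = delta2H - 8 * delta18O."""
    return d2h - 8.0 * d18o


def methanol_mmol_per_litre(percent: float, molar_mass: float = 32.04) -> float:
    """Convert a mass-per-volume percentage (g per 100 mL) to mmol/L.

    Assumption: the percentage is read as g per 100 mL, so 0.05 % -> 0.5 g/L.
    """
    grams_per_litre = percent * 10.0
    return grams_per_litre / molar_mass * 1000.0


for name, (d18o, d2h) in REFERENCE.items():
    print(f"{name}: d-excess = {d_excess(d18o, d2h):+.1f} permil")
print(f"0.05 % methanol is about {methanol_mmol_per_litre(0.05):.1f} mmol/L")
```

A derived d‐excess far outside the usual range for a natural water is a cheap first flag that one of the two δ‐values is wrong.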

The WICO2016 test waters were stored in 20‐L stainless steel drums at room temperature under 0.5 bar N2 gas pressure to prevent evaporation. Each drum had a siphon and valve system that allowed water to be withdrawn without opening the drum. Samples were dispensed into 30‐mL amber glass bottles with conical‐insert PP screw caps (Etivera, St. Margarethen an der Raab, Austria); each WICO sample was dispensed in a single session. All 30‐mL test bottles were labelled with the sample name and filling sequence number. Afterwards, random testing of seven bottles of each WICO sample by LAS and DI‐IRMS at the IAEA verified the isotopic homogeneity of the WICO samples. One set of the final WICO samples was measured by four reference laboratories.

2.3 | Establishing reference δ‐values

The reference δ18OVSMOW and δ2HVSMOW values of the WICO2016 samples were established by the consensus‐of‐expert‐laboratories approach, according to ISO 13528:2015,17 with the condition that each reference laboratory conducted analyses by DI‐IRMS using certified isotopic reference materials and two‐point normalization.18 The expert consensus approach was required because primary reference materials are available only in limited quantities and are inappropriate for inter‐comparison purposes. The δ18O and δ2H reference values of the WICO2016 samples were established by four expert laboratories (USGS, Reston, VA, USA; GEOTOP, Montreal, Canada; CIR, Groningen, The Netherlands; and the IAEA Isotope Hydrology Laboratory, Vienna, Austria). The δ18O values were determined by CO2–H2O equilibration and DI‐IRMS at the reference laboratories. The δ2H values were determined by H2–H2O equilibration and DI‐IRMS at three of the reference laboratories. All samples were normalized to the VSMOW‐SLAP scale by using VSMOW or VSMOW2 together with SLAP (IAEA), SLAP2 (GEOTOP, CIR, USGS) or USGS49,19 (USGS) reference waters measured along with the WICO samples. Following normality and outlier testing, the reference values were determined as the power‐moderated mean and robust standard deviation, which took into account the uncertainties reported by the reference laboratories.20 For the WICO5 sample, reference value measurements were conducted on an uncontaminated sub‐sample.

The deuterium‐excess (d‐excess) for each test sample was derived as d‐excess = δ2HVSMOW − 8 × δ18OVSMOW. The standard uncertainties (ux) of the reference laboratory results were determined by:

$$u_x = \frac{1.25 \times \text{Robust Std. Dev.}}{\sqrt{\text{number of assays}}} \qquad (1)$$

In all cases, ux was ≤0.3 × the standard deviation for proficiency

assessment (SDPA); hence, the uncertainty of the assigned δ‐values

was considered negligible for the purpose of this inter‐comparison

testing, and did not require the alternative use of z'(prime)‐scores. All

reference values were rounded to two (δ18O values) or one (δ2H

values) decimal place, and they are summarized in Table 1. The

reference δ‐values of the WICO2016 samples were not revealed to

participants beforehand.
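Equation (1) and the ux ≤ 0.3 × SDPA criterion can be checked for one test water. This is a minimal sketch, not the reference laboratories' software: it takes the WICO1 robust standard deviations and N from Table 1 and the SDPA values (0.2 ‰ for δ18O, 1.5 ‰ for δ2H) given in Section 2.5.

```python
import math


def reference_uncertainty(robust_sd: float, n_assays: int) -> float:
    """Eq. (1): u_x = 1.25 * robust SD / sqrt(number of assays)."""
    return 1.25 * robust_sd / math.sqrt(n_assays)


def negligible(u_x: float, sdpa: float) -> bool:
    """Criterion quoted in the text: u_x <= 0.3 * SDPA."""
    return u_x <= 0.3 * sdpa


# WICO1 from Table 1: (robust SD, N, SDPA) per isotope ratio, all in permil.
cases = {"d18O": (0.02, 25, 0.2), "d2H": (0.6, 25, 1.5)}
for isotope, (sd, n, sdpa) in cases.items():
    u = reference_uncertainty(sd, n)
    print(f"{isotope}: u_x = {u:.3f}, 0.3*SDPA = {0.3 * sdpa:.2f}, negligible: {negligible(u, sdpa)}")
```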

2.4 | Shipping and data collection

The WICO2016 samples were shipped to participating laboratories in

March 2016, and all laboratories agreed to submit results by June 30,

2016. The optional WICO samples were selected by a smaller cohort

of laboratories appropriate to their analytical interests. Participants

were invited to measure WICO samples at least twice using the

method(s), instrumentation, and calibration standards considered

routine for their laboratory and to normalize and report their results

relative to the VSMOW‐SLAP scale. Information returned to the IAEA

included the final δ2HVSMOW and/or δ18OVSMOW values for each test

water, basic statistics (n, uncertainty), and operational metadata on

preparation methods used (CO2/H2–H2O equilibration, reduction

methods, etc.), the instrumentation used for isotopic measurement

(IRMS, LAS, etc.) and laboratory practices (laboratory standards,

injections, data processing, etc.).

After the June deadline, 31 laboratories requested an extension.

Twenty laboratories submitted late results (not included in this

synthesis), and the remaining 11 laboratories did not submit results.

Some laboratories had multiple instruments (lasers and mass

spectrometers); therefore, each instrumental result was considered

an individual submission so that results from different types of

instruments in one laboratory were not being averaged. Not all

laboratories provided the requested metadata. In total, there were up

to 267 submissions (from 235 reporting laboratories) for the H and O

isotope evaluation synthesis. Confidential laboratory assessment

reports were provided to each participant in July 2016.

2.5 | Statistical tests

Unacceptable outliers among the submitted results were identified by calculating the upper (HU) and lower (HL) quartiles of all laboratories' results for each test water and isotope ratio (H or O), together with the corresponding interquartile range (HU − HL). Values above HU + 1.5·(HU − HL) or below HL − 1.5·(HU − HL) were denoted as unacceptable outliers. The median, quartile limits and outliers were illustrated graphically by using Tukey box‐and‐whisker plots for each sample and each isotope ratio.
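The 1.5 × IQR fence rule above can be sketched in a few lines. This is a minimal illustration rather than the IAEA's processing code: quartiles come from the standard‐library `statistics.quantiles` with `method="inclusive"`, which is an assumption because the quartile convention used for WICO2016 is not stated, and the reported values are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Flag values outside the Tukey fences HL - k*IQR and HU + k*IQR.

    Quartiles use statistics.quantiles(method="inclusive"); this quartile
    convention is an assumption, not necessarily the one used for WICO2016.
    """
    h_l, _median, h_u = statistics.quantiles(values, n=4, method="inclusive")
    iqr = h_u - h_l
    lower, upper = h_l - k * iqr, h_u + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical delta18O reports (permil) for one test water.
reports = [-10.82, -10.79, -10.81, -10.78, -10.80, -10.83, -10.77, -11.40, -10.20]
lower, upper, flagged = tukey_outliers(reports)
print(f"fences: {lower:.2f} to {upper:.2f} permil; outliers: {flagged}")
```

Because the fences are built from quartiles rather than the mean and standard deviation, a few wild results do not widen the acceptance window, which is why the rule suits the non‐normal distributions described in Section 3.1.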

The standard deviation for proficiency assessment was the maximum acceptable difference between the assigned reference δ‐value and the laboratory‐submitted δ‐value for each sample. As deemed appropriate for hydrological applications, the SDPA was set to 0.2 ‰ for δ18O values and to 1.5 ‰ for δ2H values (compared with 2 ‰ for δ2H values in WICO2011); both were considered reasonable criteria that encompassed a diverse range of contemporary preparative technologies, analytical instrument types, and instrumental specifications. We acknowledge that a more restrictive SDPA (e.g. 0.05 ‰ for δ18O values) would be required for laboratories engaged in, for example, high‐resolution paleoclimate reconstructions from ice core isotopic data,21,22 but such stringent testing was beyond the scope of WICO2016 and not as broadly applicable.

2.6 | z‐scores

An unacceptable z‐score for a laboratory‐reported result was a δ18O

or δ2H value that deviated from the reference δ‐value by more than

4 times the SDPA. The z‐score for each submitted sample isotopic

(H or O) result was determined by:

$$z = \frac{P - A}{\mathrm{SDPA}} \qquad (2)$$

where P was the participant‐reported δ‐value for each WICO

sample, A was the assigned δ‐value, and the SDPA was as defined

above. The following interpretations were given to z‐score results:

|z| ≤ 2.00: Acceptable (fit for purpose)

2.00 < |z| < 3.00: Questionable

|z| ≥ 3.00: Unacceptable

A sample z‐score of 0 implies a perfect match (no bias) from the

reference δ‐value. Individual z‐scores within each of the categories

should not be over‐interpreted; for example, a z‐score of 0.2 is not

statistically 'better' than 1.6; both are acceptable and fit‐for‐purpose

as defined.
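Equation (2) and the interpretation bands can be expressed as two small functions. This is a minimal sketch: the SDPA values are those of Section 2.5, the WICO1 reference values come from Table 1, and the laboratory reports are hypothetical.

```python
SDPA = {"d18O": 0.2, "d2H": 1.5}  # permil, Section 2.5


def z_score(reported: float, assigned: float, sdpa: float) -> float:
    """Eq. (2): z = (P - A) / SDPA."""
    return (reported - assigned) / sdpa


def interpret(z: float) -> str:
    """Map a z-score to the categories listed in Section 2.6."""
    if abs(z) <= 2.0:
        return "acceptable"
    if abs(z) < 3.0:
        return "questionable"
    return "unacceptable"


# Hypothetical reports for WICO1 (reference -10.80 and -77.5 permil, Table 1).
for isotope, reported, assigned in [
    ("d18O", -10.85, -10.80),
    ("d18O", -11.30, -10.80),
    ("d2H", -83.0, -77.5),
]:
    z = z_score(reported, assigned, SDPA[isotope])
    print(f"{isotope}: z = {z:+.2f} -> {interpret(z)}")
```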

2.7 | En‐scores

A subset of the laboratories yielding acceptable z‐scores for core samples WICO1–4 was further scrutinized by taking into account the reported uncertainty for each submitted WICO sample. The En‐score indicated whether a reported result was fit‐for‐purpose by taking into account both the uncertainty reported by the laboratory and the uncertainty of the assigned reference value (Table 1). No attempt was made to assess how uncertainty was established by each laboratory, owing to the variety of ways in which laboratories determined it (Table 2); hence, the reported laboratory uncertainty was accepted at face value. An En‐score was calculated for each sample and stable isotope ratio using:

$$E_n = \frac{x - L}{\sqrt{U_x^2 + U_L^2}} \qquad (3)$$

where x was the laboratory‐reported δ‐value for each sample, L was the reference δ‐value, Ux was the uncertainty reported by the laboratory, and UL was the uncertainty of the reference value. An |En| value ≤ 1 was considered acceptable, whereas an |En| value > 1 was unacceptable. En‐scores closer to 0 represent more accurate and precise results.
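Equation (3) shows why a reported uncertainty matters as much as the reported value. In this minimal sketch, UL is taken as the Table 1 standard deviation for WICO1 δ2H (0.6 ‰), which is an assumption, and the laboratory value and uncertainties are hypothetical.

```python
import math


def en_score(x: float, ref: float, u_lab: float, u_ref: float) -> float:
    """Eq. (3): En = (x - L) / sqrt(Ux**2 + UL**2)."""
    return (x - ref) / math.sqrt(u_lab ** 2 + u_ref ** 2)


def en_acceptable(en: float) -> bool:
    """|En| <= 1 is acceptable, |En| > 1 is unacceptable."""
    return abs(en) <= 1.0


# WICO1 delta2H: reference -77.5 permil, U_L assumed to be 0.6 permil.
# Same hypothetical result (-78.2 permil) reported with two different Ux.
for u_lab in (0.5, 0.1):
    en = en_score(-78.2, -77.5, u_lab, 0.6)
    print(f"Ux = {u_lab}: En = {en:+.2f}, acceptable: {en_acceptable(en)}")
```

The same 0.7 ‰ offset passes with a realistic uncertainty (En ≈ −0.90) but fails when the uncertainty is understated (En ≈ −1.15), which is the behaviour the En test is designed to expose.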

TABLE 2 Survey of participant laboratory practices: calibration standard sources, storage, calibration frequency, data normalization methods, and uncertainty methods, with results of the post‐WICO survey summarizing self‐identified reasons for poor performance

Source of laboratory calibration standards (n = 266)
Made locally: 65 %
IAEA (VSMOW2/SLAP2): 6 %
Instrument manufacturer: 9 %
U.S. Geological Survey: 6 %
A colleague: 6 %
NIST: 4 %
Scientific supplier: 4 %

Storage of laboratory calibration standards (n = 235)
Small glass bottles (<1 L): 46 %
Flame‐sealed glass ampules: 15 %
Large glass bottles (1–10 L): 14 %
Large steel or aluminium (>10 L): 15 %
Small plastic bottles: 2 %
Large plastic (>10 L): 3 %
Other: 5 %

(Re)calibration frequency using VSMOW2/SLAP2 (n = 196)
1 year: 44 %
2 years: 20 %
3 years: 25 %
Never: 7 %
Don't know: 4 %

Data normalization method (n = 239)
3‐point (or more) standard calibration: 51 %
2‐point standard calibration: 44 %
1‐point standard calibration: 4 %

Measurement uncertainty reporting method
1 sigma SD of control standard: 76 %
1 sigma SD of repeated samples: 6 %
Assign blanket uncertainty value: 5 %
Full error propagation: 5 %
2 sigma SD of control standard: 2 %
Uncertainty not determined: 1 %
RMSE of control standard over time: 1 %
SEM of control standard over time: <1 %
Worst 1 sigma SD of repeats: <1 %

Post‐WICO survey of problems (n = 66; multiple issues per laboratory, so the sum is >100 %)
Incorrect data normalization: 33 %
Insufficient lab standard δ‐scale coverage: 32 %
Instrumental problems: 31 %
Problem not yet identified: 23 %
Compromised laboratory standards: 15 %

TABLE 3 Scoring schema to assess overall laboratory accuracy for core samples WICO1–4. Scoring criteria were based on instrument technology specifications and expert expectations. Out of 12 points per isotope ratio, the scoring brackets of 12–10, 9–7, 6–4 and ≤3 were deemed excellent, acceptable, questionable and unacceptable, respectively. Deviation in ‰ is the absolute δ‐difference between the reference and measured δ‐values. Score per isotope ratio is the sum of earned points.

Points | Laser spectrometry, deviation (‰) | Continuous‐flow IRMS, deviation (‰) | Dual‐inlet IRMS, deviation (‰)

δ2HVSMOW
3 | ≤0.5 | ≤0.5 | ≤1.0
2 | ≤1.0 | ≤1.0 | ≤1.5
1 | ≤2.0 | ≤2.0 | ≤3.0
0 | >2.0 | >2.0 | >3.0

δ18OVSMOW
3 | ≤0.1 | ≤0.1 | ≤0.05
2 | ≤0.2 | ≤0.2 | ≤0.1
1 | ≤0.3 | ≤0.3 | ≤0.2
0 | >0.3 | >0.3 | >0.2

Scores: 12–10 = Excellent; 9–7 = Acceptable; 6–4 = Questionable; 0–3 = Unacceptable

2.8 | Cumulative rankings

We ranked the collective accuracy for both O and H isotope ratios using point scores for the core WICO1–4 set (Table 3), as well as cumulative offsets. The collective scores yielded an overall picture of laboratory accuracy for a diverse range of sample isotopic compositions, rather than the picture obtained by focusing on individual WICO sample z‐score results.13 This point scoring furthermore took into consideration instrumental specifications and expectations (Table 3). More accurate collective values resulted in higher scores. Out of 12 points per isotope ratio (four core samples with a maximum of 3 points each), the scoring brackets of 12–10, 9–7, 6–4 and ≤3 were deemed excellent, acceptable, questionable and unacceptable, respectively. No attempt was made to incorporate measurement uncertainty into the collective point scores. We also ranked laboratories by their cumulative offset from the reference δ‐values for the WICO1–4 core set of samples, by O and H isotope ratio and by measurement technology.
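The point scheme of Table 3 can be applied mechanically to a laboratory's four core‐sample deviations. This is a minimal sketch: the thresholds are transcribed from Table 3, the technology labels are hypothetical dictionary keys, and the example deviations are invented for illustration.

```python
# Deviation limits (permil) earning 3, 2 and 1 points, transcribed from Table 3.
THRESHOLDS = {
    ("d2H", "LAS"): (0.5, 1.0, 2.0),
    ("d2H", "CF-IRMS"): (0.5, 1.0, 2.0),
    ("d2H", "DI-IRMS"): (1.0, 1.5, 3.0),
    ("d18O", "LAS"): (0.1, 0.2, 0.3),
    ("d18O", "CF-IRMS"): (0.1, 0.2, 0.3),
    ("d18O", "DI-IRMS"): (0.05, 0.1, 0.2),
}


def points(deviation: float, isotope: str, technology: str) -> int:
    """Points for one sample, based on the absolute deviation from the reference."""
    t3, t2, t1 = THRESHOLDS[(isotope, technology)]
    d = abs(deviation)
    if d <= t3:
        return 3
    if d <= t2:
        return 2
    if d <= t1:
        return 1
    return 0


def rating(total: int) -> str:
    """Scoring brackets out of 12 points per isotope ratio."""
    if total >= 10:
        return "excellent"
    if total >= 7:
        return "acceptable"
    if total >= 4:
        return "questionable"
    return "unacceptable"


def score_lab(deviations, isotope: str, technology: str):
    """Sum the points for the WICO1-4 deviations and return (total, rating)."""
    total = sum(points(d, isotope, technology) for d in deviations)
    return total, rating(total)


# Hypothetical LAS laboratory, delta18O deviations for WICO1-4 (permil).
print(score_lab([0.05, -0.12, 0.08, 0.25], "d18O", "LAS"))
```

Scoring absolute deviations per sample, with technology‐specific limits, rewards consistency across the whole δ‐range rather than a single lucky result.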

3 | RESULTS AND DISCUSSION

The participation rate in WICO2016 was exceptional: for the core set

(WICO1–5), 267 results were submitted for δ18O values of which 157

(59 %) were LAS analyses, 83 (31 %) were by CF‐IRMS and 27 (10 %)

were by DI‐IRMS. For δ2H values, there were 259 returns of which

158 (61 %) were LAS analyses, 72 (28 %) were by CF‐IRMS and 29

(11 %) were by DI‐IRMS. For the optional samples, there were 123

and 118 returns for WICO6, 147 and 141 for WICO7 and 96 and 91

returns for WICO8 for δ18O and δ2H values, respectively. These

returns represented a ~75 % increase in participation over WICO2011,14 and a ~210 % increase over WICO2002. A large part of

the increase was due to the proliferation of LAS analyzers since around

2009; LAS comprised 0 % of participants in WICO2002, 45 % in

WICO2011, and 59 % in the current test. The number of IRMS

participants increased by 100 % over WICO2011 from 58 to 116

instruments. All (anonymized) δ‐value results for WICO2016 samples

were tabulated and are reported in Table S1 (supporting information).

By technology, most LAS instrument laboratories employed a

liquid autosampler and a water sample vaporizer; however, two

laboratories conducted manual injections and two employed custom‐

made vapor streams. For DI‐IRMS for δ18O values, all laboratories used

an isothermal CO2–H2O equilibration apparatus (1–12 h, 15–40°C)

with or without agitation. For DI‐IRMS for δ2H values, 68 % used

isothermal H2–H2O equilibration (1–12 h, 12–40°C) with or without

agitation, 4 % used Cr reduction and one used Zn reduction. For

CF‐IRMS for δ18O values, 70 % used static isothermal CO2–H2O

equilibration (3–36 h, 20–50°C) and headspace gas sampling, and 30 %

used direct water injections with high‐temperature carbon (HTC) reduction

to CO gas. For CF‐IRMS for δ2H values, 43 % used online HTC reduction

to H2 gas, 42 % used static H2–H2O equilibration and headspace sampling

(1–72 h, 20–50°C), and 15 % used online Cr reduction to H2 gas.

An overview of the operational practices of participating laboratories is tabulated in Table 2. For daily‐use laboratory calibration standards, most laboratories made their own in‐house calibration standards (65 %) and stored them in glass bottles or sealed ampules (90 %). The overall range of δ‐values for laboratory standards used in the majority of international laboratories is plotted in Figure 1. There were distinctive modalities in the laboratory calibration standards that corresponded to typical 'bracketing ranges', particularly near the VSMOW and SLAP endmember δ‐values. The majority of laboratory daily‐use calibration standards had δ18O values between +5 and −30 ‰. Seven laboratories (results not plotted) employed daily‐use standards with exceptionally high δ2H values, ranging from +100 to +1200 ‰, and two laboratories reported using δ18O calibration standards with high values of +51 and +117 ‰. A small proportion of laboratories (6 %) reported running WICO test samples against the primary reference materials VSMOW2 and SLAP2, a practice generally discouraged in routine operations. Most laboratories followed a reasonable VSMOW2/SLAP2 laboratory standard (re)calibration cycle of 1 to 3 years. Some laboratories indicated no laboratory standard calibration efforts; these purchased or obtained pre‐calibrated standards from the USGS, NIST or other outside sources.

FIGURE 1 Range and modalities of laboratory calibration standards for oxygen and hydrogen isotope ratios reported by 235 international laboratories for their routine operations [Color figure can be viewed at wileyonlinelibrary.com]

More laboratories used 3‐or‐more point data normalization (51 %)

than 2‐point data normalization (44 %), and only 4 % of laboratories

used (not recommended) single‐point data normalization.23

Measurement uncertainty was generally reported as the 1‐sigma

standard deviation of a control standard over an unspecified period

of time or number of analyses (76 %); however, the survey revealed

a wide diversity of methods for expressing measurement uncertainty.
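The text does not detail the normalization arithmetic. As a minimal sketch of the usual approach (not any participant's software), a straight line is fitted between the measured and assigned values of the calibration standards: with two standards this reduces to the classic two‐point stretch‐and‐shift, and with three or more it is an ordinary least‐squares fit. The assigned values below are the VSMOW2 and SLAP2 δ18O values (0 and −55.50 ‰); the measured values are hypothetical.

```python
def fit_normalization(measured, assigned):
    """Least-squares line mapping measured to assigned delta values.

    Two standards give the exact two-point normalization; three or more
    give an ordinary least-squares fit. Returns (slope, intercept).
    """
    n = len(measured)
    if n < 2 or n != len(assigned):
        raise ValueError("need at least two paired calibration standards")
    mx = sum(measured) / n
    my = sum(assigned) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    if sxx == 0:
        raise ValueError("calibration standards must span a delta range")
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, assigned))
    slope = sxy / sxx
    return slope, my - slope * mx


def normalize(value, slope, intercept):
    """Apply the calibration line to a raw measured delta value."""
    return slope * value + intercept


# Hypothetical raw delta18O results for VSMOW2 and SLAP2 (permil).
slope, intercept = fit_normalization([0.30, -54.90], [0.0, -55.50])
print(f"slope = {slope:.5f}, intercept = {intercept:+.4f}")
print(f"sample at -10.50 raw -> {normalize(-10.50, slope, intercept):.2f} permil")
```

With only one standard, a line cannot be determined, so a single‐point correction can shift values but cannot correct scale expansion or compression; that is one reason the function insists on at least two standards spanning a δ‐range.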

3.1 | Unacceptable outliers

Outlier testing revealed that a sizeable number of laboratories

produced discordant results for one or more samples, as depicted by

Tukey box‐and‐whisker plots and a δ2H vs δ18O cross‐plot for each

of the WICO test waters (Figures 2 and 3; WICO5 excluded, see

below). Overall, the median values of the reported WICO results

agreed well with their assigned values (Figure 2, Table 1); however,

the isotopic data were non‐normally distributed (Q‐Q plots, not

shown) due to the effect of the outlier laboratories. With outliers

removed, the population median values closely matched the WICO

reference values, as expected. Results falling beyond 1.5 times the interquartile range (red circles in Figure 2) were deemed unacceptable outliers. For δ2H values, of 1036 results submitted for the core set, 101 were deemed unacceptable outliers (10 %), with similar percentages for each of the core samples. This was slightly higher than the δ2H outlier rate in WICO2011 (6 %). For the optional δ2H samples, the proportion of outliers was similar at 8 % (depleted), 13 % (enriched) and 7 % (saline) for WICO6, 7 and 8, respectively. For δ18O values, of 1068 results submitted for the core set, 136 were outliers (12.7 %), with similar percentages for all core samples, slightly higher than in WICO2011 (10.6 %). For the optional δ18O samples, the proportion of outliers was considerably higher at 20 % (depleted sample), 17 % (enriched sample) and 11 % (saline sample) for WICO6, 7 and 8, respectively.

FIGURE 2 Box‐and‐whisker plots of the WICO core and optional test waters for δ2H (top) and δ18O (bottom) values. Median and interquartile ranges are depicted, together with the number of analyses per sample. Circles represent unacceptable outliers. Note: only the overall range and median are depicted for contaminated WICO5 [Color figure can be viewed at wileyonlinelibrary.com]

FIGURE 3 Cross‐plot of submitted δ2H versus δ18O values for WICO2016 core, optional, and contaminated samples (left). Cross‐plot of d‐excess versus δ18O for WICO core, optional, and contaminated samples (right). Black dots denote the reference values [Color figure can be viewed at wileyonlinelibrary.com]


Considering only the unacceptable outliers by instrument technology, 60 % of the δ18O core set outliers were LAS analyses, 40 %

were CF‐IRMS analyses, and <2 % were DI‐IRMS analyses. For the

optional δ18O samples, the unacceptable outlier proportions were

similar, with the exception of saline sample WICO 8 where 64 % of

outliers were LAS analyses. For δ2H samples, 47 % of outliers were

LAS analyses in the core set, 41 % were DI‐IRMS analyses, and

12 % were CF‐IRMS analyses. For the optional δ2H samples, the

unacceptable outlier proportions were similar, with the exception of

the saline test sample where the higher proportions (69 %) of outliers

were from LAS analyses.

One general observation (not depicted) was that roughly half of

laboratories that produced unacceptable outliers did so for the entire

set of WICO samples, regardless of which instrument technology was

used. This finding suggested that approximately 5–6 % of participating laboratories gave unacceptably poor general performance for a variety of possible reasons. Nevertheless, the largest proportion of outliers came from LAS instrument laboratories, which, given the rapid growth of this technology, probably included a higher percentage of less‐experienced laboratories. There were some striking contrasts in the core set of IRMS

outliers: DI‐IRMS laboratories held an unexpectedly large proportion of

unacceptable outliers for δ2H values (44 %) compared with δ18O values

(<2 %); conversely CF‐IRMS laboratories held a large proportion of

unacceptable outliers for δ18O values (40 %) compared with δ2H values

(12 %). Unlike WICO2011, there was no unacceptable outlier bias

towards positive or negative values for either H or O isotope ratios.

All unacceptable outlier results (e.g. z‐scores > |3|) were removed for

subsequent z‐score plots and En performance evaluations and

depictions. The contaminated core WICO 5 sample results are

discussed separately, below.

3.2 | Assessment of z‐scores

A graphical depiction of δ2H and δ18O z‐scores for the non‐outlier WICO2016 samples, plotted by measurement technology in Youden dual‐isotope cross‐plots,24,25 is shown in Figure 4. Laboratory results plotting closer to the origin on both z‐score axes exhibited more accurate performance for both isotope ratios. Results falling along the 1:1 (45°) line into the upper right or lower left quadrants exhibited systematic errors common to both isotope ratios (e.g. from both ratios being measured on the same injection). A lack of correlation between H and O z‐scores, or results that plotted in the upper left and lower right quadrants, suggested that H and O performance outcomes were analytically disconnected, either because the two ratios were measured in separate assays or because one isotope ratio carried a larger analytical bias than the other.
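The quadrant reasoning above can be made concrete with a short Python sketch. It assumes that each z‐score is computed as (x − X)/SDPA, using the 0.2 ‰ (δ18O) and 1.5 ‰ (δ2H) SDPA values quoted in this article, and that the conventional ISO 13528 bands (|z| ≤ 2 acceptable, 2 < |z| < 3 questionable, |z| ≥ 3 unacceptable) apply; the article's own definitions, given in its methods, take precedence. The function names and the laboratory result are hypothetical.

```python
# SDPA values (‰) quoted in the text; using them as z-score denominators is an assumption.
SDPA = {"d18O": 0.2, "d2H": 1.5}


def z_score(measured, reference, sdpa):
    """Assumed form of the z-score: z = (x - X) / SDPA."""
    return (measured - reference) / sdpa


def z_class(z):
    """Assumed ISO 13528-style bands: |z| <= 2, 2 < |z| < 3, |z| >= 3."""
    if abs(z) <= 2:
        return "acceptable"
    if abs(z) < 3:
        return "questionable"
    return "unacceptable"


def youden_quadrant(z_h, z_o):
    """Interpret the sign pattern of a (δ2H, δ18O) z-score pair on a Youden plot."""
    if z_h == 0 or z_o == 0:
        return "on an axis"
    if (z_h > 0) == (z_o > 0):
        # Upper right or lower left: both ratios biased in the same direction.
        return "same sign: possible systematic error common to H and O"
    # Upper left or lower right: the two ratios are biased in opposite directions.
    return "opposite sign: H and O errors appear disconnected"


# Hypothetical laboratory result for one test water (values in ‰).
z_h = z_score(-53.0, -50.0, SDPA["d2H"])   # -2.0
z_o = z_score(-7.50, -7.30, SDPA["d18O"])  # about -1.0
print(z_class(z_h), z_class(z_o), youden_quadrant(z_h, z_o))
```

The sign test is symmetric in the two axes, so it does not depend on which isotope ratio is plotted horizontally.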

The vast majority of laboratories using DI‐IRMS instruments and CO2– and H2–H2O equilibration produced fit‐for‐purpose z‐score results for δ2H and δ18O values, with very few questionable results (Figure 4; expanded in Figure S1, supporting information). These plots revealed that δ18O analyses performed by DI‐IRMS laboratories using CO2–H2O equilibration performed exceptionally well compared with those using other technologies (narrow spread along the y‐axis); however, there was far more variance in the DI‐IRMS δ2H z‐scores (x‐axis), although these remained within the acceptable range (Figure S1, supporting information). Because dual‐inlet O and H analyses are usually conducted separately, no correlation was found between the DI‐IRMS δ18O and δ2H z‐scores (r2 < 0.01). The saline WICO8 sample exhibited slightly negative δ2H biases compared with the reference value, but the results were mostly acceptable.

For CF‐IRMS instrument laboratories, the Youden plots exhibited mostly acceptable z‐scores for both isotope ratios, but with comparatively larger scatter for both δ18O and δ2H values than with DI‐IRMS (Figure 4; expanded in Figure S2, supporting information). The saline WICO8 sample had a tighter range of δ18O z‐scores, but again a slightly negative spread for δ2H values, akin to that observed with DI‐IRMS. Similarly, no trend was found between δ18O and δ2H z‐scores (r2 < 0.05), probably because most CF‐IRMS O and H isotope measurements are also conducted separately.

For LAS water isotope analyses, the Youden z‐score plots exhibited predominantly fit‐for‐purpose results, with a spread of z‐scores similar to that of CF‐IRMS (Figure 4; expanded in Figure S3, supporting information). LAS laboratories had comparatively more questionable or unacceptable z‐score results for δ2H and δ18O values than the DI‐ and CF‐IRMS laboratories. Because LAS instruments measure H and O isotopologues directly on the same water injections, there were correlations (r2 = 0.11–0.38) between the O and H isotope z‐scores, as depicted by the dashed regression lines in Figure 4 (Figure S3, supporting information), an observation also noted in WICO2011. Thus, unlike results from DI‐ or CF‐IRMS instrumentation, biased results from LAS are more likely to affect both H and O isotope ratios proportionately.

FIGURE 4 Youden z‐score cross‐plots of δ2H and δ18O values for all laboratories (outliers removed) for all the WICO test waters. See Figures S1–S3 (supporting information) for cross‐plots by instrument technology type and expanded WICO5 axes
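The r2 values quoted above measure how strongly paired δ2H and δ18O z‐scores co‐vary. A minimal sketch of that calculation (the squared Pearson correlation coefficient) is shown below; the z‐score pairs are hypothetical and stand in for laboratories that measure both ratios on the same injection.

```python
def r_squared(xs, ys):
    """Coefficient of determination (squared Pearson r) for paired z-scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical (δ2H z, δ18O z) pairs in which a shared bias pushes both scores together.
z_h = [0.4, -1.2, 1.8, -0.3, 0.9, -1.6]
z_o = [0.1, -0.8, 1.1, 0.4, 0.2, -0.9]
print(round(r_squared(z_h, z_o), 2))
```

An r2 near zero, as reported for the IRMS laboratories, indicates that the H and O errors are essentially independent.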



As with the IRMS results, the saline WICO8 LAS results exhibited a slightly negative bias for δ2H values, but were mostly within fit‐for‐purpose limits. It is worth noting that the WICO8 reference values for δ18O and δ2H

were established using CO2– and H2–H2O equilibration 'activity'

methods, which produce differing δ‐values from water with increasing

salt amount (and type), especially above seawater salinities (>35 g/L).26

Conversely, LAS analyses are direct water isotopologue 'concentration'

measurements, and in WICO2016 these results were compared with

activity‐based reference δ‐values. Recently, studies have shown

that the comparative δ‐value differences between activity and

concentration methods for seawater samples are < +0.1 ‰ for δ18O

values and ~ +1.5 ‰ for δ2H values at 35 g/L, and that they may

require a correction factor.27,28 The WICO8 (30 g/L) results confirmed

these experimental findings; the WICO8 population median

differences between DI‐IRMS activity‐based and laser measurements

were +0.05 ‰ (within SDPA) for δ18O values, but were +1.0 ‰ for

δ2H values. These results affirm that saline water samples, particularly those above 30 g/L, measured by activity‐ versus concentration‐based methods will require an agreed calibration to a common concentration or activity scale.27,29 This is an area for further research, and possibly for future WICO testing.
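The WICO8 comparison above rests on a difference of population medians. The sketch below shows that arithmetic; it assumes a sign convention of concentration‐based (laser) minus activity‐based (DI‐IRMS) medians, and the δ2H values are hypothetical. Such an offset only summarizes one test water and is not a substitute for the agreed salinity‐dependent calibration called for above.

```python
import statistics


def median_offset(activity_results, concentration_results):
    """Median of concentration-based results minus median of activity-based results (‰).

    The sign convention (concentration minus activity) is an assumption.
    """
    return statistics.median(concentration_results) - statistics.median(activity_results)


# Hypothetical δ2H results (‰) for one saline test water.
di_irms = [-4.1, -3.8, -4.4, -3.9, -4.0]  # activity-based (gas-water equilibration)
laser = [-3.0, -3.3, -2.7, -3.1, -2.9]    # concentration-based (LAS)
print(f"median offset: {median_offset(di_irms, laser):+.1f} ‰")
```

Medians are used rather than means so that the remaining questionable results in each population have little influence on the offset.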

3.3 | En‐Scores

For laboratories that reported uncertainties for the core WICO1–4

samples, the En‐scores (without outliers) were summarized in

histograms for δ2H and δ18O values (Figure 5), and categorized by both

isotope and instrument technology. The En‐scores for laboratories were normally distributed around 0, mostly within the acceptable range of ±1, with a few outliers. In all cases, laboratories with En‐scores > |1|, whether LAS or IRMS, fell into two categories: those with accurate δ‐values but unrealistically low reported uncertainties (e.g. ±0.01 ‰ uncertainty for LAS δ18O values), and those with biased results whose reported uncertainties were small enough for them to plot outside the acceptable En limits.

FIGURE 5 En‐scores for δ18O (top) and δ2H (bottom) values classified by measurement technology. The grey area encompasses the range of acceptable performance

Median (max/min) reported measurement uncertainties for δ18O values for the WICO1–4 core samples by all laboratories were ±0.09 (±0.8/±0.01) ‰, ±0.04 (±0.6/±0.01) ‰, and ±0.09 (±3.5/±0.01) ‰ for CF‐IRMS, DI‐IRMS and LAS, respectively. For δ2H values, the corresponding median (max/min) measurement uncertainties were ±0.9 (±4.8/±0.04) ‰, ±0.5 (±2.1/±0.1) ‰, and ±0.5 (±4.5/±0.1) ‰. Whereas the median reported uncertainties appeared to be reasonable for top performing laboratories, some laboratories reported uncertainties implying a precision beyond the capability of the instrument technology, or that were unrealistically low once the uncertainty of the primary and laboratory standards was factored in through basic error propagation (i.e. one cannot report a lower uncertainty than that inherent in the calibration standards used). For example, VSMOW2 has an uncertainty of ±0.3 ‰ for δ2H values and the median LAS uncertainty for WICO2016 was ±0.5 ‰ (Table 2; 1σ SD); hence, the propagated uncertainty of a LAS δ2H result calibrated against VSMOW2 would be, on average, ±0.6 ‰ ($\sqrt{0.3^2 + 0.5^2} \approx 0.58$ ‰), which is 0.1 ‰ higher than the median population uncertainty reported for LAS. Conversely, and in particular for LAS analyses, some laboratories reported unrealistically poor (high) uncertainties for each isotope ratio (e.g. ±4.5 ‰ for LAS δ2H values), which suggested either that unacceptably poorly performing instrumentation was used, or that insufficient care and attention were taken to establish analytical uncertainty properly. In short, as noted in all the previous WICO tests, the reporting of analytical uncertainty for water stable isotope measurements remains a perpetually inconsistent practice; at the very least, the uncertainty should be determined by basic error propagation using primary and secondary calibration standards and control replicates, along with clearly stated methodologies.
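The two calculations discussed in this section, root‐sum‐square propagation of independent standard uncertainties and the En‐score, can be sketched as follows. The propagation reproduces the VSMOW2/LAS example from the text. The En form (x − X)/√(U_lab² + U_ref²), with |En| ≤ 1 taken as acceptable, is the conventional ISO 13528 definition and is assumed here; the δ18O result and the 0.02 ‰ reference uncertainty are hypothetical. The propagation example uses 1σ standard uncertainties, whereas En is normally evaluated with expanded uncertainties on a common basis.

```python
from math import sqrt


def combined_uncertainty(*components):
    """Root-sum-square of independent standard uncertainties (same units, ‰)."""
    return sqrt(sum(u * u for u in components))


def en_score(measured, reference, u_lab, u_ref):
    """En = (x - X) / sqrt(U_lab**2 + U_ref**2); |En| <= 1 is taken as acceptable.

    U_lab and U_ref are assumed to be expanded uncertainties on the same basis.
    """
    return (measured - reference) / sqrt(u_lab ** 2 + u_ref ** 2)


# Worked example from the text: VSMOW2 δ2H uncertainty (0.3 ‰) combined with
# the median LAS measurement uncertainty (0.5 ‰).
u_las = combined_uncertainty(0.3, 0.5)
print(f"propagated δ2H uncertainty: ±{u_las:.2f} ‰")  # ±0.58 ‰, i.e. about ±0.6 ‰

# Hypothetical δ18O result that is accurate to within 0.05 ‰.
print(round(en_score(-7.25, -7.30, u_lab=0.01, u_ref=0.02), 2))  # unrealistically low u_lab: fails |En| <= 1
print(round(en_score(-7.25, -7.30, u_lab=0.10, u_ref=0.02), 2))  # realistic u_lab: passes
```

The second pair of lines illustrates the first failure category described above: the same accurate result fails or passes depending only on whether the reported uncertainty is realistic.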

3.4 | Collective accuracy scoring and cumulative offset rankings

Collective accuracy scores for WICO1–4 core samples spanning the

most commonly measured isotopic range are plotted in Figure 6. The

overall top performers for δ18O values were DI‐IRMS laboratories:

93 % of laboratories produced excellent to acceptable collective results,

followed by CF‐IRMS (74 %) and LAS (70 %). The outcome for δ2H

values was not as encouraging; top performers were the DI‐IRMS and

LAS laboratories (62 % and 61 % excellent to acceptable collective

scores, respectively), but fewer than half (49 %) of the CF‐IRMS

laboratories gave acceptable to excellent δ2H outcomes.

Collective accuracy performance also varied by isotope ratio and

by instrumentation type. Surprisingly, the highest proportion of poor performance for δ2H values came from CF‐IRMS laboratories (51 %). In

general, with the exception of DI‐IRMS for δ18O values, and

considering all WICO core samples, between 26 and 38 % of

laboratories were unable to produce acceptable to excellent

collective results for the core suite of WICO test samples.

Considering the ranking of WICO1–4 outcomes by their

cumulative offsets from the reference values (expanded in Figures S5

and S6, supporting information) for δ18O and δ2H values, a similar

picture emerged. Approximately 73 % of the δ18O WICO submissions gave cumulative offsets lower than 0.8 ‰ (4 × 0.2 ‰ SDPA), whereas ~27 % had higher offsets. The average population cumulative offsets by technology were 0.3 ‰, 0.8 ‰ and 1.2 ‰ for DI‐IRMS, LAS and CF‐IRMS, respectively, again indicating that DI‐IRMS laboratories were top performers for δ18O values. Similarly, for δ2H values, ~73 % had cumulative offsets lower than 6 ‰ (4 × 1.5 ‰ SDPA), and ~27 % were higher. However, the top performers for δ2H values, having the lowest cumulative offsets, were dominated by LAS instruments, followed by DI‐ and then CF‐IRMS instruments, with average offsets from the reference values of +0.5 ‰, +0.6 ‰ and +0.9 ‰, respectively. In fact, 8 of the 10 best‐ranked δ2H results came from LAS laboratories (Figure S6, supporting information).

FIGURE 6 Combined cumulative performance scores for the WICO1–4 core samples using the scoring criteria in Table 3

These collective scores and cumulative offset rankings suggest that DI‐IRMS δ18O analysis by CO2–H2O equilibration remains the gold standard for top oxygen isotope ratio analytical performance, as, to a lesser extent, does H2–H2O equilibration for δ2H analyses. This probably reflects a combination of the higher‐level technical expertise required to operate DI‐IRMS instruments and a decades‐long methodological consistency in gas–water isotope equilibration protocols.30 On the other hand, the top δ2H results were produced mostly by LAS laboratories (Figure S6, supporting information), and the top‐tier LAS performers for δ2H were evenly split between CRDS and OA‐ICOS instruments. Overall, the worst performance for both O and H isotope ratios was from CF‐IRMS laboratories. This suggests that one strategy for obtaining the best isotopic outcomes would be to measure δ18O values by DI‐IRMS and δ2H values by LAS.

By contrast, for seemingly straightforward LAS instrumentation there was remarkable (and unnecessary) inconsistency in the wide range of basic operational and data processing approaches used by these laboratories, as illustrated in Figure 7. It was unclear whether this diversity of approaches was due to considerable new‐user inexperience or to attempts to improve LAS results by experimenting with operational adjustments. The number of injections and ignores per vial for LAS varied widely by laboratory, although most followed manufacturer suggestions of 6 or 8 injections and ignored the first 3–4 injections to help reduce memory effects. More problematic, however, was that >55 % of LAS laboratories took no action to correct for between‐sample memory or drift, and only 18 % of laboratories checked samples for spectral contamination (see below) or for H2O amount dependencies due to underperforming syringes. It seems likely that LAS performance could be easily improved by laboratories following the manufacturers' recommended operating guidelines and by adopting well‐established data corrections for LAS memory and drift, together with systematic run‐template approaches.13,31

FIGURE 7 Summary of reported operational practices for participating LAS spectrometer laboratories
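The cumulative‐offset ranking described earlier in this section can be sketched as follows. It assumes that a laboratory's cumulative offset is the sum of the absolute differences between its reported and reference values over the four core samples, compared against 4 × SDPA (0.8 ‰ for δ18O) as in the text; the reference values and laboratory results are hypothetical, not the WICO2016 values.

```python
def cumulative_offset(reported, reference):
    """Sum of absolute offsets (‰) from the reference values across the core samples."""
    if len(reported) != len(reference):
        raise ValueError("reported and reference must cover the same samples")
    return sum(abs(r - x) for r, x in zip(reported, reference))


def rank_laboratories(results, reference, sdpa):
    """Rank labs by cumulative offset and flag whether each is within n_samples * SDPA."""
    limit = len(reference) * sdpa
    scored = sorted(
        (cumulative_offset(values, reference), lab) for lab, values in results.items()
    )
    return [(lab, round(total, 3), total <= limit) for total, lab in scored]


# Hypothetical reference and reported δ18O values (‰) for four core samples.
reference = [-0.5, -7.3, -18.9, -30.1]
results = {
    "lab A": [-0.45, -7.32, -18.85, -30.12],
    "lab B": [-0.20, -7.00, -18.60, -29.80],
    "lab C": [-0.55, -7.40, -19.10, -30.30],
}
for lab, total, ok in rank_laboratories(results, reference, sdpa=0.2):
    print(f"{lab}: {total:.2f} ‰ {'within' if ok else 'above'} {len(reference)} x SDPA")
```

Summing absolute offsets prevents positive and negative biases on different samples from cancelling, so a laboratory cannot rank well by being wrong in both directions.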

3.5 | Contaminated water sample WICO5

The results for WICO5 are plotted in Tukey box‐and‐whisker plots

and in dual‐isotope ratio space in Figures 2 and 3, and as Youden

z‐scores plots in Figure 4. These data revealed some remarkable

outcomes. First, the range of reported δ2H and δ18O values for WICO5

spanned an astonishing 146 ‰ and 55 ‰, respectively (Figures 3

and 4), with highly abnormal d‐excess values ranging from −80 to

+70 ‰. Examination of the z‐scores plots (Figure 4) showed that

outlier δ‐values for WICO5 were almost all attributable to LAS analyses,

clearly a result of well‐known spectral interferences due to the trace

content of methanol. The z‐scores furthermore clustered by LAS

manufacturer and by instrument model. For example, Los Gatos and Picarro 2100 series instruments produced clusters of positively biased isotopic results, whereas older Picarro 1100‐2110 series instruments produced clusters of negatively biased results. A few laser instruments (inexplicably) produced acceptable δ‐values for WICO5 without

any laboratory‐reported contamination. Only 18 % of laboratories

said they screened for spectral interferences (Figure 7); however, only

18 LAS laboratories (12 %) explicitly reported back that WICO5 was

contaminated, despite the extraordinary d‐excess values. Of all the

laboratories that reported organic contamination, two took additional

steps to correct their results. In one case, a Picarro laboratory used a

micro‐combustion module interface to combust the water and

organic contaminant; however, this approach produced a biased result

for WICO5. In the second case, an experienced Los Gatos user recognized the spectral pattern as an alcohol and constructed an alcohol

concentration correction algorithm which produced an accurate final

result for WICO5. These findings for WICO5 clearly demonstrate that

known (and unknown) trace VOC contamination has a very serious

impact on LAS‐based water H and O isotope analyses.11,12,32 For most

natural waters submitted to laboratories it will be impossible to predict

(or screen) whether interfering trace VOCs are present, or not. Two

strategies are recommended for LAS users to better screen for VOC‐

compromised samples. First, every autorun should be scrutinized using the manufacturer‐provided spectral contamination software in routine operation; this effort takes only a few minutes. This will be particularly important for waters that are suspected to contain VOCs (e.g. hydrocarbon‐gas‐rich waters, landfill leachates, soil and plant waters). Samples identified as spectrally compromised should be repeated to verify contamination (false positives are possible), or not reported.32 Second,

laboratories should derive the d‐excess of the measured water samples

as a potential proxy contamination indicator in case of failure of the

spectral software to identify compromised samples. Samples having

d‐excess values falling outside the expected (location specific) or the

common range of approximately −10 to +25 ‰ (e.g. Figure 3) for

non‐evaporated environmental waters may be flagged as 'suspect'

and should be repeated or verified using IRMS methods, if possible.
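The second screening strategy can be automated in a few lines. The sketch below computes d = δ2H − 8·δ18O for each result and flags values outside the approximately −10 to +25 ‰ window quoted above; the window is a parameter so that a location‐specific range can be substituted, and the autorun results are hypothetical.

```python
def d_excess(d2h, d18o):
    """Deuterium excess, d = δ2H - 8·δ18O (‰)."""
    return d2h - 8.0 * d18o


def is_suspect(d2h, d18o, low=-10.0, high=25.0):
    """Flag a result whose d-excess falls outside [low, high] ‰.

    The default window is the common range quoted in the text for
    non-evaporated environmental waters; replace it with a
    location-specific range where one is known.
    """
    d = d_excess(d2h, d18o)
    return not (low <= d <= high)


# Hypothetical autorun results: (sample id, δ2H ‰, δ18O ‰).
autorun = [("S1", -50.0, -7.3), ("S2", -62.5, -9.1), ("S3", 20.0, -7.3)]
for sample, d2h, d18o in autorun:
    flag = "SUSPECT: repeat or verify by IRMS" if is_suspect(d2h, d18o) else "ok"
    print(f"{sample}: d = {d_excess(d2h, d18o):+.1f} ‰ {flag}")
```

This check complements rather than replaces the manufacturer's spectral software: it catches gross isotopic distortions even when the spectral diagnostics do not.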

For CF‐IRMS, the outcomes for WICO5 were surprisingly poor, as evidenced by a significant number of unacceptable H and O results (Figures 3 and 4; Figure S2, supporting information). Whereas for H2–H2O equilibration (n = 53) nearly all the δ2H results were acceptable (mean z‐score of −0.4), both the HTC reduction (n = 28) and the Cr reduction (n = 11) methods yielded many biased δ2H results, with mean z‐scores of −1.38 ± 5.1 and 2.62 ± 9.1, respectively. For CF‐IRMS δ18O values by CO2–H2O equilibration (n = 53), most results were acceptable (mean z‐score of −0.2), but online HTC reduction (n = 22) accounted for most of the unacceptable results and was negatively biased (mean z‐score of −1.85 ± 3.8). For DI‐IRMS (Figure 4; Figure S1, supporting information), there was no evidence that methanol contamination affected either H or O isotope results, either in the participants' reported results or in separate testing at the IAEA.


In summary, trace VOC (e.g. methanol) contamination of water

seriously affected LAS measurements due to spectral interferences,

but also adversely affected the CF‐IRMS decompositional methods

(Cr, HTC reduction). In the latter case, the methanol is thermally decomposed to H2 and CO along with the water sample and, because solvents generally have negative δ2H values, this appeared to cause negative biases. The CO2– or H2–H2O equilibration methods were

largely unaffected by the trace methanol; in the case of CF‐IRMS this

is probably due to GC separation of the sample H2 or CO2 analyte

gas from methanol, and in the case of DI‐IRMS to −80°C to −100°C

cold trapping of water and methanol vapor prior to expansion of the

sample gas into the sample bellows.

3.6 | d‐Excess results

One surprising result of the WICO2016 test was that the derived d‐excess values of laboratory‐reported O and H results spanned nearly the entire natural range for each of the test samples (Figure 3). Expanded by instrument type (Figure S4, supporting information), the best performing instruments for d‐excess for the WICO1–4 core samples were DI‐IRMS, having the lowest range of reported d‐excess values (SD ±1.6 ‰), followed by LAS (SD ±2.4 ‰) and CF‐IRMS (SD ±3.2 ‰). We hypothesized that LAS might yield improved d‐excess determinations since O and H are measured on the same sample, but this was not the case; the spread of reported d‐excess values was almost as large as that for CF‐IRMS instruments. However, population median d‐excess offsets from the reference d‐excess values were lowest for LAS, then CF‐IRMS and DI‐IRMS (−0.1, −0.2 and −0.3 ‰, respectively). These data also suggest that the widespread use and interpretation of d‐excess in hydrologic studies should be treated with caution unless it can be demonstrated that laboratory performance for O and, in particular, H isotope ratios is exceptionally accurate and precise.
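The wide d‐excess spread follows from error propagation: because d = δ2H − 8·δ18O, any δ18O error is amplified eightfold. The sketch below evaluates var(d) = σ_H² + 64σ_O² − 16ρσ_Hσ_O, treating the SDPA values quoted in this article as if they were standard deviations (an assumption) and using a hypothetical error correlation ρ = 0.5 to illustrate why same‐injection LAS measurements might have been expected to give tighter d‐excess values.

```python
from math import sqrt


def d_excess_sd(sd_d2h, sd_d18o, rho=0.0):
    """Standard deviation of d = δ2H - 8·δ18O from first-order error propagation.

    var(d) = sd_d2h**2 + 64*sd_d18o**2 - 16*rho*sd_d2h*sd_d18o, where rho is the
    correlation between the δ2H and δ18O errors (-1 <= rho <= 1).
    """
    var = sd_d2h ** 2 + 64.0 * sd_d18o ** 2 - 16.0 * rho * sd_d2h * sd_d18o
    return sqrt(var)


# SDPA values from the text (1.5 ‰ for δ2H, 0.2 ‰ for δ18O) used as if they were
# standard deviations: independent errors versus positively correlated errors.
for rho in (0.0, 0.5):
    print(f"rho = {rho}: sd(d-excess) = ±{d_excess_sd(1.5, 0.2, rho):.2f} ‰")
```

With these inputs the standard deviation of d‐excess falls from about ±2.19 ‰ for independent errors to about ±1.55 ‰ for ρ = 0.5; the observed LAS spread (SD ±2.4 ‰) indicates that any such benefit was outweighed by other error sources.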

3.7 | Laboratory performance over time

We recalculated the z‐scores of laboratories that took part in WICO2011 (confirming that the same instruments were used) for consistency with the WICO2016 methodology. The results showed that for repeat IRMS laboratories (n = 34), 29.4 % performed about the same (z‐scores within ±1 of WICO2016), 29.4 % improved (better z‐score, <1) and 41.2 % performed worse (z‐score >±2). For LAS laboratories (n = 30) the results were similar: 13.3 % performed similarly, 33.3 % improved and 53.3 % performed worse. In short, about half of the laboratories that participated in WICO2011 (30 of 64) did worse in WICO2016 than in the previous test, but it should be noted that recurring laboratories represent only ~11 % of the total WICO2016 participant pool.

3.8 | Factors causing underperformance

We further examined the metadata that many of the participating

laboratories provided (e.g. instrumental and operational practices,

standard δ‐ranges, etc.) and compared these data with their

performance outcomes to assess whether there were key factors that

contributed to excellent or poor performance outcomes. In other

words, did top performing laboratories exhibit practices that were

quantifiably different (better) than those of laboratories that

performed poorly? However, we were unable to distinguish any

operational factors from these metadata that could be clearly related

to good or poor performance. As one example, we compared the

collective scores and metadata of the best performing LAS instrument

laboratories (e.g. excellent point scores) with those of the worst

performing LAS laboratories (e.g. unacceptable point scores). However,

the reported metadata of both populations was indistinguishable; all

used a similar number of sample injections and ignores, and claimed

similar data corrections. As another example, we examined laboratories

(all technologies) that reported using VSMOW2 and SLAP2 and two‐

point normalization to see whether δ‐scaling issues were a factor,

hypothesizing that laboratories using primary reference materials

would give more accurate outcomes for all WICO samples. However,

laboratories that used VSMOW2 and SLAP2 as calibration standards

gave a proportionately similar number of excellent and unacceptable

outcomes to the overall population. Similarly, when we used the

reported results of WICO3 and WICO4 and rescaled the reported

results to obtain WICO2 and WICO1 results, a similar outcome was

found. Reference δ‐value scaling therefore did not appear to be a significant performance contributor that could easily be distinguished.
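The rescaling check described above is a linear two‐point normalization, the same operation used to anchor results with VSMOW2 and SLAP2. A minimal sketch follows, assuming a purely linear calibration between two anchors; the accepted δ18O values of VSMOW2 (0 ‰) and SLAP2 (−55.5 ‰) define the scale, while the raw instrument values are hypothetical.

```python
def two_point_normalize(measured, meas_a, meas_b, ref_a, ref_b):
    """Map a measured δ-value onto the reference scale by linear two-point normalization.

    meas_a, meas_b: raw instrument values of the two anchor standards.
    ref_a, ref_b:   accepted values of the same anchors on the reference scale.
    """
    if meas_a == meas_b:
        raise ValueError("the two anchor measurements must differ")
    slope = (ref_b - ref_a) / (meas_b - meas_a)
    return ref_a + (measured - meas_a) * slope


# Accepted δ18O values of the primary anchors (‰, VSMOW-SLAP scale).
VSMOW2, SLAP2 = 0.0, -55.5
# Hypothetical raw instrument values for the anchors and for one sample.
raw_vsmow2, raw_slap2, raw_sample = 0.30, -54.90, -7.00
print(round(two_point_normalize(raw_sample, raw_vsmow2, raw_slap2, VSMOW2, SLAP2), 2))
```

Using the two anchors that bracket the samples corrects both an offset and a scale (stretch) error; rescaling with two test waters, as in the check above, applies exactly the same arithmetic with different anchors.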

The fact that we were unable to definitively identify any specific good or bad performance indicators based on reported results and operational metadata led us to conclude that poor laboratory performance probably resulted from unquantifiable 'known unknowns'. These are mistakes and human errors that are common but would be impossible to quantify or identify without conducting individual onsite assessments,33 and that may not be apparent to the laboratory. These include knowledge‐based or skill‐based factors such as operator experience, basic data processing mistakes, measurement protocol violations, compromised or evaporated laboratory standards, or poorly functioning analytical instruments and peripherals. Recent studies have suggested that human errors make a non‐negligible contribution to underperforming geochemical analyses.33 The premise that human, technical, and instrumental errors are the main drivers of poor water isotope performance was supported by a post‐WICO2016 survey of the participating laboratories (Table 2). Two months after reporting test results to laboratories, we asked participants which factors they had identified that resulted in either overall or specific sample underperformance. Of the 98 responses received, most laboratories indicated that multiple knowledge‐ or skill‐based factors had led to poor outcomes. The top three factors identified (each ~30 %) were mistakes in data normalization and processing, use of laboratory standards that did not span the WICO sample δ‐range (although scaling issues did not seem significant in our general reassessment), and poor instrumental performance. Other lesser factors identified included compromised or evaporated laboratory standards (15 %). Surprisingly, 23 % of underperforming laboratories were not able to identify clear reasons for poor outcomes several months after participating in the WICO test.

4 | SUMMARY

The WICO2016 inter‐comparison test of 235 laboratories conducting δ2H and δ18O measurements revealed that ~73 % of laboratories reported acceptable core sample results, within 0.2 ‰ and 1.5 ‰ of the reference δ18O and δ2H values, respectively, whereas ~27 % produced unacceptable results. The top performance for δ18O values was dominated by dual‐inlet IRMS laboratories, whereas the top δ2H performance was dominated by LAS laboratories. CF‐IRMS laboratories yielded results intermediate between those of the LAS and DI‐IRMS laboratories. The methanol‐contaminated WICO sample

resulted in extremely biased δ‐values for LAS instruments, but VOC

contamination also affected HTC and Cr CF‐IRMS systems. DI‐IRMS

instrument results were unaffected by the methanol contamination.

Overall, from our analysis of submitted laboratory metadata and

test sample performance outcomes, it appeared that poor performance

in WICO mainly resulted from skill‐ and knowledge‐based errors;

these included calculation mistakes, inappropriate or compromised

laboratory calibration standards, use of poorly performing

instrumentation, and lack of vigilance to sample contamination and

unreasonable isotopic outcomes. To counteract these types of errors,

we recommend that stable isotope laboratories include one or two

control standards of known δ‐values (not used in data normalization)

in each autorun; LAS laboratories should screen all their autoruns for

potential spectral contamination; and all laboratories should evaluate

whether the derived d‐excess values (if measuring both isotope ratios)

are realistic. Combined, these simple strategies could help to more

quickly inform the analyst about mistakes, and alert the analyst to

compromised water samples.
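The first recommendation can be implemented as a simple end‐of‐run check. The sketch below compares control standards (not used in normalization) with their known δ‐values and reports any whose offset exceeds a tolerance; setting the tolerances to the SDPA values quoted in this article, and all of the control values shown, are assumptions for illustration.

```python
def check_controls(controls, tolerances):
    """Return the control standards whose offset from the known value exceeds tolerance.

    controls:   list of (name, isotope, measured, known) tuples, δ-values in ‰.
    tolerances: maximum acceptable |measured - known| per isotope, in ‰.
    """
    failures = []
    for name, isotope, measured, known in controls:
        offset = measured - known
        if abs(offset) > tolerances[isotope]:
            failures.append((name, isotope, round(offset, 2)))
    return failures


# Tolerances set to the SDPA values quoted in the text (an assumption).
TOLERANCES = {"d18O": 0.2, "d2H": 1.5}
# Hypothetical control results from one autorun (controls not used for normalization).
autorun_controls = [
    ("control 1", "d18O", -10.52, -10.45),
    ("control 1", "d2H", -74.1, -72.2),
    ("control 2", "d18O", -2.31, -2.28),
]
for failure in check_controls(autorun_controls, TOLERANCES):
    print("check autorun:", failure)
```

A failed control does not identify the cause, but it stops a normalization or calculation mistake from propagating silently into reported results.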

ACKNOWLEDGEMENTS

WICO2016 would not have been possible without the enthusiastic

cooperation of scientists and staff of the international stable isotope

laboratories that participated. Thomas Blunier (Denmark) kindly

supplied a stock of Greenland firn meltwater for WICO6. Bernhard

Mayer (Canada) supplied Bow River water for WICO3. We thank

J.‐F. Hélie and H.A.J. Meijer whose facilities served as reference

laboratories for the WICO test samples. We thank M. Urresti for

assistance with WICO sample preparation and T. Chavez for

assistance with organizing international sample shipments. We thank

three anonymous reviewers for constructive comments. This work

was funded by the International Atomic Energy Agency. IAEA staff

(LIW, STW, CD, LAA, PKA) conceived and implemented WICO2016,

conducted sample preparation, analysis, surveys, sample shipping,

data collection and statistical analysis, processed laboratory reports,

and wrote this manuscript. Laboratory identities are kept confidential

(LIW). TBC provided USGS reference values, provided VSMOW

to two reference laboratories, and corroborated our use of methanol

for WICO5.

ORCID

L.I. Wassenaar http://orcid.org/0000-0001-5532-0771

REFERENCES

1. Mook WG, Gat JR, Meijer HAJ, Rozanski K, Froehlich K. Environmental Isotopes in the Hydrological Cycle: Principles and Applications. IAEA/UNESCO; 2001.

2. Kendall C, McDonnell JJ. Isotope Tracers in Catchment Hydrology. Elsevier; 2012.

3. Clark ID. Groundwater Geochemistry and Isotopes. Boca Raton: CRC Press; 2015.

4. Schauer AJ, Schoenemann SW, Steig EJ. Routine high‐precision analysis of triple water‐isotope ratios using cavity ring‐down spectroscopy. Rapid Commun Mass Spectrom. 2016;30(18):2059‐2069.

5. de Groot PA. Handbook of Stable Isotope Analytical Techniques. Vol 1. Elsevier; 2004.

6. de Groot PA. Handbook of Stable Isotope Analytical Techniques. Vol 2. Elsevier; 2009.

7. Kerstel ET, van Trigt R, Reuss J, Meijer HAJ. Simultaneous determination of the 2H/1H, 17O/16O, and 18O/16O isotope abundance ratios in water by means of laser spectrometry. Anal Chem. 1999;71(23):5297‐5303.

8. Lis G, Wassenaar LI, Hendry MJ. High‐precision laser spectroscopy D/H and 18O/16O measurements of microliter natural water samples. Anal Chem. 2008;80(1):287‐293. https://doi.org/10.1021/ac701716q

9. Berman ES, Levin NE, Landais A, Li S, Owano T. Measurement of δ18O, δ17O, and 17O‐excess in water by off‐axis integrated cavity output spectroscopy and isotope ratio mass spectrometry. Anal Chem. 2013;85(21):10392‐10398. https://doi.org/10.1021/ac402366t

10. Brand WA, Geilmann H, Crosson ER, Rella CW. Cavity ring‐down spectroscopy versus high‐temperature conversion isotope ratio mass spectrometry; a case study on δ2H and δ18O of pure water samples and alcohol/water mixtures. Rapid Commun Mass Spectrom. 2009;23(12):1879‐1884. https://doi.org/10.1002/rcm.4083

11. West AG, Goldsmith GR, Brooks PD, Dawson TE. Discrepancies between isotope ratio infrared spectroscopy and isotope ratio mass spectrometry for the stable isotope analysis of plant and soil waters. Rapid Commun Mass Spectrom. 2010;24(14):1948‐1954.

12. Hendry MJ, Richman B, Wassenaar LI. Correcting for methane interferences on δ2H and δ18O measurements in pore water using H2O(liquid)–H2O(vapor) equilibration laser spectroscopy. Anal Chem. 2011;83(14):5789‐5796. https://doi.org/10.1021/ac201341p

13. Wassenaar LI, Coplen TB, Aggarwal PK. Approaches for achieving long‐term accuracy and precision of δ18O and δ2H for waters analyzed using laser absorption spectrometers. Environ Sci Technol. 2014;48(2):1123‐1131. https://doi.org/10.1021/es403354n

14. Wassenaar LI, Ahmad M, Aggarwal P, et al. Worldwide proficiency test for routine analysis of δ2H and δ18O in water by isotope‐ratio mass spectrometry and laser absorption spectroscopy. Rapid Commun Mass Spectrom. 2012;26(15):1641‐1648. https://doi.org/10.1002/rcm.6270

15. Available: http://www-naweb.iaea.org/napc/ih/index.html.

16. http://www.redseafish.com.

17. ISO/IEC. Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparison; 2015.

18. Skrzypek G. Normalization procedures and reference material selection in stable HCNOS isotope analyses: an overview. Anal Bioanal Chem. 2013;405(9):2815‐2823. https://doi.org/10.1007/s00216-012-6517-2

19. Lorenz JM, Qi H, Coplen TB. Antarctic Ice‐Core Water (USGS49) – A new isotopic reference material for δ2H and δ18O measurements of water. Geostand Geoanal Res. 2017;41:63‐68. https://doi.org/10.1111/ggr.12135

20. Pomme S, Keightley J. Determination of a reference value and its uncertainty through a power‐moderated mean. Metrologia. 2015;52(3):S200‐S212. https://doi.org/10.1088/0026-1394/52/3/S200

21. Dansgaard W, Johnsen SJ, Clausen HB, et al. Evidence for general instability of past climate from a 250‐kyr ice‐core record. Nature. 1993;364(6434):218‐220. https://doi.org/10.1038/364218a0

22. Gkinis V, Popp TJ, Johnsen SJ, Blunier TA. Continuous stream flash evaporator for the calibration of an IR cavity ring‐down spectrometer for the isotopic analysis of water. Isotopes Environ Health Stud. 2010;46(4):463‐475. https://doi.org/10.1080/10256016.2010.538052

23. Paul D, Skrzypek G, Forizs I. Normalization of measured stable isotopic compositions to isotope reference scales – a review. Rapid Commun Mass Spectrom. 2007;21(18):3006‐3014. https://doi.org/10.1002/rcm.3185

24. Shirono K, Iwase K, Okazaki H, et al. A study on the utilization of the Youden plot to evaluate proficiency test results. Accredit Qual Assurance. 2013;18(3):161‐174. https://doi.org/10.1007/s00769-013-0978-7

25. Youden WJ. Graphical diagnosis of interlaboratory test results. Ind Qual Control. 1959;15:24‐28.

26. Sofer Z, Gat JR. Activities and concentrations of 18O in concentrated aqueous salt solutions – Analytical and geophysical implications. Earth Planet Sci Lett. 1972;15(3):232‐238. https://doi.org/10.1016/0012-821x(72)90168-9

27. Benetti M, Sveinbjörnsdóttir AE, Ólafsdóttir R, et al. Inter‐comparison of salt effect correction for δ18O and δ2H measurements in seawater by CRDS and IRMS using the gas‐H2O equilibration method. Mar Chem. 2017;194(Supplement C):114‐123. https://doi.org/10.1016/j.marchem.2017.05.010

28. Skrzypek G, Ford D. Stable isotope analysis of saline water samples on a cavity ring‐down spectroscopy instrument. Environ Sci Technol. 2014;48(5):2827‐2834. https://doi.org/10.1021/es4049412

29. Koehler G, Wassenaar LI, Hendry J. Measurement of stable isotope activities in saline aqueous solutions using optical spectroscopy methods. Isotopes Environ Health Stud. 2013;49(3):378‐386. https://doi.org/10.1080/10256016.2013.815183

30. Epstein S, Mayeda T. Variation of 18O content of waters from natural sources. Geochim Cosmochim Acta. 1953;4:213‐224.

31. van Geldern R, Barth JAC. Optimization of instrument setup and post‐run corrections for oxygen and hydrogen stable isotope measurements of water by isotope ratio infrared spectroscopy (IRIS). Limnol Oceanogr: Methods. 2012;10:1024‐1036. https://doi.org/10.4319/lom.2012.10.1024

32. West A, Goldsmith G, Matimati I, Dawson T. Spectral analysis software improves confidence in plant and soil water stable isotope analyses performed by isotope ratio infrared spectroscopy (IRIS). Rapid Commun Mass Spectrom. 2011;25(16):2268‐2274.

33. Kuselman I, Kardash E, Bashkansky E, et al. House‐of‐security approach to measurement in analytical chemistry: quantification of human error using expert judgments. Accredit Qual Assurance. 2013;18(6):459‐467. https://doi.org/10.1007/s00769-013-1020-9

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Wassenaar LI, Terzer‐Wassmuth S, Douence C, Araguas‐Araguas L, Aggarwal PK, Coplen TB. Seeking excellence: An evaluation of 235 international laboratories conducting water isotope analyses by isotope‐ratio and laser‐absorption spectrometry. Rapid Commun Mass Spectrom. 2018;32:393‐406. https://doi.org/10.1002/rcm.8052
