ARTICLE IN PRESS
1352-2310/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.atmosenv.2005.02.058
*Corresponding author. Tel.: 703 845 2096; fax: 703 845 6977.
E-mail address: [email protected] (S. Warner).
Atmospheric Environment 39 (2005) 4425–4437
www.elsevier.com/locate/atmosenv
Comparisons of transport and dispersion model predictions of the European tracer experiment: area- and population-based
user-oriented measures of effectiveness
Steve Warner*, Nathan Platt, James F. Heagy
Institute for Defense Analyses, 4850 Mark Center Drive, Alexandria, Virginia 22311-1882, USA
Received 22 November 2004; accepted 18 February 2005
Abstract
In October 1994, a tracer gas was released from a location in northwestern France and tracked at 168 sampling
locations in 17 countries across Europe. This release, known as the European Tracer Experiment (ETEX), resulted in
the collection of a wealth of data. This paper applies a previously described user-oriented measure of effectiveness
(MOE) methodology to evaluate the predictions of 46 models against the long-range ETEX observations. The paper
extends previous work by computing MOE values that are based on "true" areas (e.g., in square kilometers) and on
actual European population distributions. In this way, assessments of model predictions of ETEX are placed in a
possible operational context. The predictive performance of the models was assessed with nominal, area-, dosage-, and
population-based MOE values and ranked using a few notional user-based scoring functions. This study finds that the
rankings of some models—in particular, several of the model predictions that were ranked in the top 10—are relatively
insensitive to the particular MOE technique used. This robust behavior, with respect to analysis assumptions, is
regarded as an important feature of model performance.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: Model intercomparison; Transport and dispersion; Measure of effectiveness; European tracer experiment (ETEX)
1. Introduction
In October 1994, the inert, environmentally safe tracer
gas perfluoro-methyl-cyclohexane (PMCH) was released
over a 12-h period from a location in northwestern
France and tracked at 168 sampling locations in 17
countries across Europe extending over a thousand
kilometers (Graziani et al., 1998). The authors have
obtained from the Joint Research Centre of the
European Commission (Ispra, Italy) 46 sets of transport
and dispersion predictions associated with models from
17 countries as well as the observed PMCH sampling
data associated with the October 1994 European Tracer
Experiment (ETEX) release (Mosca et al., 1998a).
A previously developed user-oriented two-dimen-
sional measure of effectiveness (MOE) was recently
(Warner et al., 2004a) used to evaluate the predictions of
these 46 models against the long-range ETEX observa-
tions (Warner et al., 2004b). The two-dimensional MOE
allows for the evaluation of transport and dispersion
model predictions in terms of false negative (under-
prediction) and false positive (overprediction) regions.
We define the false negative region (AFN) where hazard
is observed but not predicted and the false positive
region (AFP) where hazard is predicted but not observed.
The x-axis of this two-dimensional metric corresponds
to the ratio of overlap region (AOV) to observed region
(AOB) and the y-axis corresponds to the ratio of overlap
region to predicted region (APR). These mathematical
definitions can be algebraically rearranged and we then
recognize that the x-axis corresponds to 1 minus the
false negative fraction and the y-axis corresponds to
1 minus the false positive fraction.
MOE = (x, y) = (A_OV/A_OB, A_OV/A_PR)
             = (A_OV/(A_OV + A_FN), A_OV/(A_OV + A_FP))
             = (1 − A_FN/A_OB, 1 − A_FP/A_PR).   (1)
Importantly, this MOE considers the direction of the
plume or location of the hazard when evaluating a
model’s performance.
The computation of the MOE does not necessarily
require estimated areas and hence, area interpolation. In
a previous study (Warner et al., 2004b; Platt, 2004),
no area interpolations were used to compute the MOE
values; rather, computations of the components of the
MOE were based solely on direct comparisons of
predictions and field trial observations paired in space
and time. We refer to MOE values computed in this way
as nominal. This previous effort assessed model perfor-
mance by ranking the computed MOE values for the 46
sets of model predictions by a few notional scoring
functions that were previously developed (Warner et al.,
2003). One question that arose during this previous
study was whether the rankings of the 46 models would
be particularly sensitive to using area-based MOE
values, computed after applying area interpolation,
as opposed to direct sampler observation–prediction
comparisons.
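As an illustrative sketch (not the authors' code), the nominal MOE computation on paired observations and predictions can be expressed as follows; the function name, the toy data, and the simple ≥-threshold hazard definition at each paired sampler/period are assumptions for illustration.

```python
# Sketch of the nominal MOE: predictions and observations are paired
# sampler-by-sampler and period-by-period, with no area interpolation.
# Names (moe_nominal, obs, pred) are illustrative, not from the paper.

def moe_nominal(observed, predicted, threshold):
    """Return (x, y) = (AOV/(AOV+AFN), AOV/(AOV+AFP)) counted over paired samples."""
    a_ov = a_fn = a_fp = 0
    for obs_val, pred_val in zip(observed, predicted):
        obs_hot = obs_val >= threshold
        pred_hot = pred_val >= threshold
        if obs_hot and pred_hot:
            a_ov += 1          # hazard both observed and predicted (overlap)
        elif obs_hot:
            a_fn += 1          # false negative: observed but not predicted
        elif pred_hot:
            a_fp += 1          # false positive: predicted but not observed
    x = a_ov / (a_ov + a_fn) if a_ov + a_fn else 1.0
    y = a_ov / (a_ov + a_fp) if a_ov + a_fp else 1.0
    return x, y

# Example: six paired sampler readings (ng/m^3) against the 0.1 ng/m^3 threshold
obs  = [0.00, 0.15, 0.30, 0.05, 0.20, 0.00]
pred = [0.12, 0.20, 0.02, 0.00, 0.25, 0.00]
print(moe_nominal(obs, pred, 0.1))  # (2/3, 2/3) for these toy data
```

A perfect prediction pairs every observed exceedance with a predicted one and adds no spurious exceedances, giving MOE = (1, 1).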
The research reported here extends the previous work
by computing MOE values that are based on "true"
areas (e.g., in square kilometers), on estimated dosages,
and on actual European population distributions. First,
MOE values based on actual areas (e.g., square kilo-
meters) are created for both assessments of concentra-
tions and dosages. Then, actual European population
distributions are included to place the MOE in its
ultimate operational context—fraction of the population
falsely warned and fraction of the population inadver-
tently exposed. At this point, the 46 sets of model
predictions can be compared and assessed within this
more operational context.
Previously, area-based MOE values have been com-
puted in two cases. First, as part of an examination
of model predictive performance of the short-range (<800 m) Prairie Grass field experiment, two interpolation techniques were explored and discussed (Warner et al., 2001a). In that case, a relatively densely sampled
space was available—545 samplers in a 1 km2 region. It
was found that for this relatively densely sampled
experiment, results based on the nominal and area-
based MOE were similar. Next, area-based MOE values
have been used to compare sets of model predictions of
simulated releases. In that case, area-based MOE values
were estimated by considering very dense grids of model
predictions (Warner et al., 2001b). ETEX did not have a
dense sampling network. However, our interests in
comparing and assessing model predictions of ETEX
are associated with rather large-scale, lower-resolution
features. That is, we focused on determining which
models best capture the general shape of the plume and
its time evolution as opposed to, for example, details of
the concentration time history in a specific European
valley or urban area—which would not be reasonably
possible, in general, with such relatively sparse sampling
(168 samplers across Europe).
2. Brief description of ETEX
The tracer gas PMCH was released over a 12-h period
on 24 October 1994 from a location 35 km west of
Rennes in Brittany, France. Samplers were located at
168 locations across 17 European countries and air
samples were collected every 3 h for a period of 90 h after
the initial release. Background measurements suggested
that a level of 0.01 ng m⁻³ should be used as the
minimum for all statistical comparisons (Graziani
et al., 1998). Details associated with the conduct of this
experiment, the sampling network, and the time evolu-
tion of the tracer cloud can be found in Girardi et al.
(1998) and van Dop and Nodop (1998). Two years after
the ETEX releases, a modeling exercise known as the
Atmospheric Transport Model Evaluation Study
(ATMES) II (Mosca et al., 1998a) was con-
ducted. ETEX-ATMES II predictions associated with
46 model configurations were provided to the authors by
the Joint Research Centre (Ispra, Italy) of the European
Commission. Table 1, extracted from Mosca et al.
(1998a), provides some details associated with these
models.
3. Nominal MOE computations
The components of the MOE—AFN, AFP, and AOV—
were computed directly from the predictions and field
trial observations paired in space and time as in Warner
et al. (2004b) and Warner et al. (2003). Procedures
associated with the computation of nominal MOE
values and recent examples can be found in Warner
et al. (2004a, b, and c). For this analysis, observed and
predicted average concentrations were compared for the
30, 3-h time periods of PMCH monitoring during
Table 1
ATMES II participants for which the authors obtained predictions
Model Acronym Participant Nationality
101 IMP Institute of Meteorology and Physics, University of Wien Austria
102 BMRC Bureau of Meteorology Research Centre Australia
103 NIMH-BG National Institute of Meteorology and Hydrology Bulgaria
104 NIMH-BG National Institute of Meteorology and Hydrology Bulgaria
105 CMC Canadian Meteorology Centre Canada
106 DWD German Weather Service Germany
107 DWD German Weather Service Germany
108 NERI Nat. Environment Research Inst./Risoe Nat. Lab./Univ. of Cologne Germany/Denmark
109 NERI Nat. Environment Research Inst./Risoe Nat. Lab./Univ. of Cologne Germany/Denmark
110 DMI Danish Meteorological Institute Denmark
111 IPSN French Institute for Nuclear Protection and Safety France
112 EDF French Electricity France
113 ANPA National Agency for Environment Italy
114 CNR National Research Council Italy
115 JAERI Japan Atomic Research Institute Japan
116 MRI Meteorological Research Institute Japan
117 NIMH-R National Institute of Meteorology and Hydrology Romania
118 FOA Defense Research Establishment Sweden
119 MetOff Meteorological Office United Kingdom
120 NOAA National Oceanic and Atmospheric Administration United States
121 SCIPUFF ARAP Group of Titan Research and Technology United States
122 KMI Royal Institute of Meteorology of Belgium Belgium
123 Meteo Meteo France France
127 LLNL Lawrence Livermore National Laboratories United States
128 SMHI Swedish Meteorological and Hydrological Institute Sweden
129 SAIC Science Applications International Corporation United States
130 IMS Swiss Meteorological Institute Switzerland
131 DNMI Norwegian Meteorological Institute Norway
132 SRS Westinghouse Savannah River Laboratory United States
133 JMA Japan Meteorological Agency Japan
134 JMA Japan Meteorological Agency Japan
135 MSC-E Meteorological Synthesizing Centre—East Russia
201 BMRC Bureau of Meteorology Research Centre Australia
202 CMC Canadian Meteorological Centre Canada
203 DWD German Weather Service Germany
204 NERI Nat. Environment Research Inst./Risoe Nat. Lab./Univ. of Cologne Germany/Denmark
205 DMI Danish Meteorological Institute Denmark
206 Meteo Meteo France France
207 MRI Meteorological Research Institute Japan
208 SMHI Swedish Meteorological and Hydrological Institute Sweden
209 MetOff Meteorological Office United Kingdom
210 MetOff Meteorological Office United Kingdom
211 NOAA National Oceanic and Atmospheric Administration United States
212 NIMH-R National Institute of Meteorology and Hydrology Romania
213 DNMI Norwegian Meteorological Institute Norway
214 MSC-E Meteorological Synthesizing Centre—East Russia
The model predictions denoted with a number between 101 and 135, the "100 series," used European Centre for Medium Range
Weather Forecasts (ECMWF) analyzed meteorological data as input. The "200 series" (201–214) used weather inputs produced by
independent numerical weather prediction models different from ECMWF’s.
ETEX. MOE values based on concentration thresholds
of 0.01, 0.10, and 0.50 ng m⁻³ were examined in this
study consistent with the previous studies of Mosca et al.
(1998b) and Boybeyi et al. (2001).
4. Area-based concentration MOE computations
Given values at a discrete (perhaps irregular) set
of samplers, the process of interpolation provides
intermediate values on some regular grid of points. The
resulting regular grid of functional values could be used
to obtain contours of "hazard" areas (areas within a
critical threshold contour) or calculate MOE values
based on interpolated areas. For example, the MOE
components—AFN, AFP, and AOV—of an area-based
concentration MOE would have units of square kilometers if assessed for a threshold value or concentration × area if assessed based on summed concentrations (e.g., marginal differences for AFP and AFN).
The Delaunay triangulation procedure is useful for
the interpolation, analysis, and visual display of
irregularly, discretely gridded data (Guibas et al.,
1992). From a set of discrete points (sampler coordi-
nates), a planar triangulation is formed, satisfying the
property that the circumscribed circle of any triangle in
the triangulation contains no other vertices in its
interior. For any point that is within some triangle
(formed via Delaunay triangulation), a linear interpola-
tion routine using values at the vertices of the triangle is
used to calculate the value at that point. For the
interpolations reported in this paper, the data (observa-
tions and predictions) were first transformed logarith-
mically because actual plume concentrations or dosages
varied over orders of magnitude. The above routine was
applied with a resolution of 2 km × 2 km corresponding to 1001 × 1001 grid points. The example displays
reported in Fig. 1 are based on the logarithmic
transformation of the observed data followed by
Delaunay triangulation and linear interpolation as
described above.
The adopted procedure, while simple and yielding
some perhaps less visually pleasing sharp edges,
appeared to be robust and necessarily maintains the
actual observed values at the sampler locations—this
would not be true for many fitting procedures. A few
other interpolation schemes (for example, skipping the
logarithmic transformation of the data step from the
above procedure) and area-based weighting schemes
were also examined as a part of this study. Some of the
examined techniques correspond to interpolated plumes
that are likely conservative by construction, i.e., over-
estimating areas and people who would be encountered
by either the observed or predicted plume. Relative
model performance rankings based on scoring functions
applied to area-based MOE values were compared for
these different techniques. It was concluded that model
performance rankings were relatively robust for several
different techniques. Therefore, in this paper we focus
on only one—arguably the simplest—interpolation
procedure. Additional details associated with other
interpolation and weighting schemes that were examined
are described in Warner et al. (2003) and Warner et al.
(2004d).
At this point then, MOE values can be computed in a
manner analogous to the procedure for the nominal
MOE. However, in this case, the MOE components—
AOV, AFN, and AFP—are computed directly from the
consideration of the 1001 × 1001 discrete grid points
instead of the 168 sampling locations.
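Under the assumption that SciPy's Delaunay-based `LinearNDInterpolator` can stand in for the triangulation routine described above, the log-transform, interpolate, and threshold steps might be sketched as follows; the sampler coordinates and values are invented for illustration and the concentration floor is an assumed detail.

```python
# Sketch of the area-based MOE pipeline: log-transform sampler values,
# linearly interpolate over the Delaunay triangulation of sampler locations,
# then count grid cells above a threshold. SciPy's LinearNDInterpolator is
# Delaunay-based; the sampler data below are invented, not ETEX values.
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def interpolate_log_field(points, values, grid_x, grid_y, floor=1e-4):
    """Linearly interpolate log10(values) onto a regular grid, then back-transform."""
    log_vals = np.log10(np.maximum(values, floor))        # avoid log10(0)
    interp = LinearNDInterpolator(points, log_vals, fill_value=np.log10(floor))
    gx, gy = np.meshgrid(grid_x, grid_y)
    return 10.0 ** interp(gx, gy)

def moe_area_based(obs_field, pred_field, threshold, cell_area_km2):
    """MOE components in km^2 from thresholded interpolated fields."""
    obs_hot, pred_hot = obs_field >= threshold, pred_field >= threshold
    a_ov = np.sum(obs_hot & pred_hot) * cell_area_km2     # overlap area
    a_fn = np.sum(obs_hot & ~pred_hot) * cell_area_km2    # false negative area
    a_fp = np.sum(~obs_hot & pred_hot) * cell_area_km2    # false positive area
    return a_ov / (a_ov + a_fn), a_ov / (a_ov + a_fp)

# Five invented samplers on a 10 km x 10 km patch, 0.2 km grid spacing
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])
obs_vals = np.array([0.0, 0.0, 0.2, 0.2, 0.5])
pred_vals = np.array([0.0, 0.2, 0.0, 0.2, 0.5])
grid = np.linspace(0.0, 10.0, 51)
obs_f = interpolate_log_field(pts, obs_vals, grid, grid)
pred_f = interpolate_log_field(pts, pred_vals, grid, grid)
print(moe_area_based(obs_f, pred_f, 0.1, cell_area_km2=0.04))
```

Because the interpolation is linear on the triangulation vertices, the sampler values themselves are reproduced exactly, matching the property the text emphasizes.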
It must be noted that interpolating between sampler
locations cannot capture "peaks" or "holes" in the
concentration distribution that may lie between sam-
plers. For densely sampled regions this would not be a
problem. However, for situations where, for example,
complex terrain or a highly urbanized environment lies
between perhaps sparse sampler locations, one might
expect considerable variations in the concentrations as a
function of time and location. Over the long distances
associated with ETEX, it is reasonable to expect that the
locations of any holes or peaks may shift in time and
ultimately be mitigated by dispersive effects. Further-
more, in the next section, the computation of dosage-
based MOE values is described. These dosage-based
values consider the summation of thirty 3-h time
periods. The summation process should reduce the
likelihood of large unexpected variations (peaks or
holes) between sampler locations by smoothing out
temporal differences that may be evident at the 3-h time
resolution.
5. Area-based dosage MOE computations
To create "observed" dosages at given locations, one
sums the concentrations at each sampler location. For
example, if a 3-h average concentration of 0.01 ng m⁻³ were observed for 12 h (720 min) at a given location, a dosage of 7.2 ng min m⁻³ would be computed for that
site. However, periods of time in which sampler data
could not be (or were not) collected exist for many of the
sampler locations. If there were only a few of these
missing points, one could simply remove them (along
with the corresponding prediction) from the analysis
and compute dosages for locations that had continuous
coverage. For the ETEX release, however, there are
many locations that have at least some missing time
periods. Therefore, one must fill in these values in some
manner in order to create a dosage. The spatial
interpolation described previously for the area-based
MOE values provides a natural way to fill in the
temporal holes in the observed concentration data. Since
predictions exist (in general) at all time periods,
predicted dosages can be created by direct summation
of the predicted concentrations. For the few model
cases, where predictions were missing for some samplers
and at some time periods, the corresponding observation
was removed from the calculation. This procedure leads
to area-based, dosage MOE values.
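The dosage construction reduces to summing concentration × time over the thirty 3-h periods; a minimal sketch (function and variable names are illustrative, and gap filling by spatial interpolation is assumed to have been done upstream):

```python
# Sketch of building a dosage at one location from 3-h average concentrations.
# Each 3-h period contributes concentration x 180 min, so a constant
# 0.01 ng/m^3 held for four periods (12 h) gives 7.2 ng min/m^3, consistent
# with the example in the text. Names here are illustrative.
PERIOD_MIN = 180  # minutes in one 3-h averaging period

def dosage(concentrations_ng_m3):
    """Sum of (3-h average concentration) x (180 min) over all periods."""
    return sum(c * PERIOD_MIN for c in concentrations_ng_m3)

# 0.01 ng/m^3 for four consecutive 3-h periods (12 h), zero for the rest
series = [0.01] * 4 + [0.0] * 26        # thirty 3-h periods (90 h of sampling)
print(dosage(series))                   # close to 7.2 (floating point)
```

Predicted dosages follow the same summation directly, since predictions exist at essentially all time periods.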
For this analysis, three threshold dosages were
examined: 7.2, 72, and 360 ng min m⁻³. These three
values can be related to the 3-h average concentration
thresholds of 0.01, 0.1, and 0.5 ng m⁻³ (previously discussed), respectively, by considering a 12-h (720 min) period in which the cloud might pass over any individual sampler location, i.e., 720 min × 0.01 ng m⁻³ = 7.2 ng min m⁻³; 720 × 0.1 = 72 ng min m⁻³; and 720 × 0.5 = 360 ng min m⁻³. The MOE components—AFN, AFP, and AOV—of an area-based dosage MOE would have units of square kilometers if assessed for a threshold value or dosage × area if assessed based on summed dosages.

Fig. 1. Observed PMCH concentrations across Europe. Plots display contours from 6 h after the release for the upper left plot to 90 h after the release for the lower right plot in increments of 6 h. Contours are 0.01, 0.1, and 0.5 ng m⁻³. Bold numbers on individual plots correspond to the last hour of the given 6-h period. Interpolation accomplished with a resolution of 1001 × 1001 grid points (Warner et al., 2003).

Fig. 2. Illustration of European population distribution: numbers in legend correspond to population within 2 km × 2 km cell.
6. Population-based MOE computations
Dosage-based MOE values can be converted into
population-based values by including the underlying
non-uniform European population distribution. Fig. 2
illustrates the population distribution that was used
(LANDSCAN 2000). The population distribution shown in Fig. 2 is represented by population values at about 2.1 million grid cells: 1501 in the x direction ("east–west") and 1401 in the y direction ("north–south"). This results in a grid cell size of 2 km × 2 km.
The overall European population represented here is
about 500 million.
We define D(i, j) as the dosage associated with the 2 km × 2 km grid cell (i, j), p_{i,j} as the associated population, and T_D as the dosage threshold of interest. Then, we identify OV_D(i, j), FN_D(i, j), and FP_D(i, j) as follows:

OV_D(i, j) = 1 if observed D(i, j) ≥ T_D and predicted D(i, j) ≥ T_D; 0 otherwise,
FN_D(i, j) = 1 if observed D(i, j) ≥ T_D and predicted D(i, j) < T_D; 0 otherwise,
FP_D(i, j) = 1 if observed D(i, j) < T_D and predicted D(i, j) ≥ T_D; 0 otherwise.   (2)
Summing over all grid cells and including the population weights (p_{i,j}) leads to values for A_OV, A_FN, and A_FP that are based on the European population:

A_OV = Σ_{i=1}^{N_i} Σ_{j=1}^{N_j} p_{i,j} · OV_D(i, j),
A_FN = Σ_{i=1}^{N_i} Σ_{j=1}^{N_j} p_{i,j} · FN_D(i, j),
A_FP = Σ_{i=1}^{N_i} Σ_{j=1}^{N_j} p_{i,j} · FP_D(i, j).   (3)
The MOE components computed in this way have units
of people when assessed for a threshold value. At this
point then, for a given threshold, the MOE values can be
expressed, with the x-axis labeled "one minus the fraction of the population inadvertently exposed" and the y-axis labeled "one minus the fraction of the population unnecessarily warned" – i.e., population-
based MOE values.
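Eqs. (2)–(3) can be sketched directly with boolean masks and a population weight per cell; the small grids below are illustrative toy data, not LANDSCAN values, and the function name is an assumption.

```python
# Sketch of the population-weighted MOE of Eqs. (2)-(3): per-cell indicator
# functions against a dosage threshold T_D, weighted by cell population.
import numpy as np

def moe_population(obs_dosage, pred_dosage, population, t_d):
    """MOE components in units of people for dosage threshold t_d."""
    obs_hot = obs_dosage >= t_d
    pred_hot = pred_dosage >= t_d
    a_ov = population[obs_hot & pred_hot].sum()   # correctly warned people
    a_fn = population[obs_hot & ~pred_hot].sum()  # inadvertently exposed people
    a_fp = population[~obs_hot & pred_hot].sum()  # unnecessarily warned people
    x = a_ov / (a_ov + a_fn)   # 1 - fraction of population inadvertently exposed
    y = a_ov / (a_ov + a_fp)   # 1 - fraction of population unnecessarily warned
    return float(x), float(y)

obs  = np.array([[100.0, 10.0], [80.0, 5.0]])    # observed dosages, ng min/m^3
pred = np.array([[90.0, 80.0], [10.0, 5.0]])     # predicted dosages
pop  = np.array([[1000, 500], [2000, 250]])      # people per 2 km x 2 km cell
print(moe_population(obs, pred, pop, 72.0))      # x = 1/3, y = 2/3 for this toy grid
```

In the toy grid, the 2000-person cell is exposed but not predicted (false negative) and the 500-person cell is predicted but not exposed (false positive), which is exactly the asymmetry the population weighting is meant to surface.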
7. Scoring functions
Several notional scoring functions for the MOE space
have been previously described including their algebraic
derivations (Warner et al., 2003, 2004a, b). These
scoring functions can be thought of as corresponding
to the requirements of different possible model users.
Such scoring functions can thus aid in assessing if a
model’s MOE value, for a given set of field observations,
is "good enough." For a given application and user risk
tolerance, certain regions of the two-dimensional MOE
space may be considered acceptable. For example, some
users may tolerate a certain false positive fraction
(ultimately, unnecessarily warned individuals) but re-
quire a very low false negative fraction (inadvertently
exposed individuals). Such a risk tolerance profile
implies a certain region in the two-dimensional MOE
space, which can be turned into a mathematical function
for "scoring" the MOE predictions.
Table 2 summarizes some useful scoring functions
associated with assessments of MOE values. The
objective scoring function (OSF) allows for the assess-
ment of which model prediction achieves an MOE value
"closest" to the perfect value of (1,1)—complete overlap occurs at OSF = 0. The absolute fractional bias—
ABS(FB)—scoring function provides for an evaluation
of which model best predicted the overall amount of
material, or in the case of a threshold based MOE,
which model best predicted the overall size of the area
that was affected by material above a defined threshold.
ABS(FB) = 0 implies no bias (e.g., neither an over- nor underprediction). The normalized absolute difference (NAD) scoring function, which measures scatter, leads to assessments of which model predictions demonstrated the smallest differences between observations and predictions. NAD = 0 implies no differences—a perfect prediction.

Table 2
Algebraic relationships between scoring functions and the two-dimensional MOE

Scoring function                        Relationship to MOE (x, y)
Objective Scoring Function—OSF          OSF = sqrt((1 − x)² + (1 − y)²)
Absolute Fractional Bias—ABS(FB)        ABS(FB) = 2|x − y|/(x + y)
Normalized Absolute Difference—NAD      NAD = (x + y − 2xy)/(x + y)
Figure of Merit in Space—FMS            FMS = xy/(x + y − xy)
Risk-Weighted FMS—RWFMS(5, 0.5)         RWFMS(5, 0.5) = xy/(xy + 5y(1 − x) + 0.5x(1 − y))
Recent studies of model predictions of ETEX have
included the use of a figure of merit in space (FMS),
defined as the overlap area between the prediction and
observation divided by the total predicted and observed
areas, all above some threshold concentration (Klug
et al., 1992; Mosca et al., 1998b; Boybeyi et al., 2001).
Table 2 describes the algebraic relationship between
FMS and MOE. There is a strictly monotonic relation-
ship between NAD and FMS (Warner et al., 2003)
implying that scoring model performance based
on NAD or FMS will necessarily lead to identical
rank orderings. A value of FMS = 1 implies complete
overlap.
Some users of hazardous material transport and
dispersion models might consider false positives and
false negatives quite differently. For many applications,
false positives would be much more acceptable to the
user than false negatives (which could result in decisions
that directly lead to death or injury). Eq. (4) is an
example of a user scoring function that takes the above
risk tolerance into consideration. We refer to this
notional user scoring function as the Risk-Weighted
FMS (RWFMS):
RWFMS = xy / (xy + C_FN · y(1 − x) + C_FP · x(1 − y))   (4)

where C_FN, C_FP > 0. This equation describes a modified FMS that includes coefficients, C_FN and C_FP, to weight the false negative and false positive regions, respectively—denoted RWFMS(C_FN, C_FP). Therefore, RWFMS(1,1) = FMS. For this study, RWFMS(5, 0.5)
was used as a scoring function, implying that false
negative contributions to the MOE are considered 5
times more important than overlap and 10 times as
important as false positive contributions—suggesting a
conservative user and application.
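The Table 2 scoring functions translate directly into code; a sketch (function names are illustrative, and OSF is written as distance from the perfect point (1, 1), consistent with "complete overlap occurs at OSF = 0"):

```python
# The scoring functions of Table 2 applied to a two-dimensional MOE value (x, y).
import math

def osf(x, y):
    return math.hypot(1 - x, 1 - y)          # 0 at the perfect MOE (1, 1)

def abs_fb(x, y):
    return abs(2 * (x - y) / (x + y))        # 0 means no over/underprediction bias

def nad(x, y):
    return (x + y - 2 * x * y) / (x + y)     # 0 means a perfect prediction

def fms(x, y):
    return x * y / (x + y - x * y)           # 1 means complete overlap

def rwfms(x, y, c_fn=5.0, c_fp=0.5):
    # Eq. (4): false negatives weighted by c_fn, false positives by c_fp
    return x * y / (x * y + c_fn * y * (1 - x) + c_fp * x * (1 - y))

# A perfect prediction scores perfectly under every function
assert osf(1, 1) == 0 and nad(1, 1) == 0 and fms(1, 1) == 1 and rwfms(1, 1, 1, 1) == 1
```

Note that NAD = (1 − FMS)/(1 + FMS), which is the strictly monotonic relationship between NAD and FMS that guarantees identical rank orderings under either score.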
8. Results of MOE comparisons
Table 3 compares nominal (NOM), area-based con-
centration (ABConc), area-based dosage (ABDos), and
population-based (POP) MOE rankings for the FMS
and for a threshold of 0.1 ng m⁻³. The model numbers
shown in Table 3 correspond to the ATMES II
predictions defined in Table 1. Warner et al. (2004d)
provides tables for each of the three scoring functions,
all of the examined thresholds, and for summed
concentration and dosage-based MOE values. The
FMS-based rankings, as shown in Table 3, appear
relatively robust to the different MOE techniques used
to compute the MOE values. Similarly, of the top 5
OSF-ranked models based on the nominal MOE, 4, 4,
and 4 appear in the top 5 based on the ABConc, ABDos,
and POP MOE values, respectively. Similarly, of the
bottom 5 OSF-ranked models based on the nominal
MOE, 4, 5, and 5 appear in the bottom 5 based on the
ABConc, ABDos, and POP MOE values, respectively.
Fig. 3 presents histograms of the changes in OSF,
NAD, and RWFMS(5, 0.5) rankings that result from
subtracting ABConc summed concentration MOE
rankings from the nominal summed concentration
MOE rankings. A positive difference of 5, for example,
implies that the associated model improved 5 rankings
when the ABConc MOE technique was applied in place
of the nominal procedure. The biggest changes are
associated with very large improved rankings for models
118, 121, and 127 when the ABConc MOE procedure
was followed. These three models were ranked as 20, 41,
and 33, respectively, based on the nominal MOE and
OSF. For the ABConc MOE, models 118, 121, and 127
are ranked as 6, 28, and 1, respectively, improving by 14,
13, and 32 rankings. Similar improvements in rankings
are seen for models 121 and 127 for the NAD and
RWFMS(5, 0.5) scoring functions. These improved
rankings mirror changes seen for these same models in
a previous study (Warner et al., 2004b) when the single "near-release" sampler at Rennes was removed. As
reported in Warner et al. (2003), the OSF-based
Table 3
Comparisons of model rankings based on the FMS for NOM, ABConc, ABDos, and POP based MOE values
Rank Model NOM Model ABConc Model ABDos Model POP
1 208 0.577 208 0.524 105 0.729 105 0.823
2 128 0.551 105 0.503 208 0.721 127 0.806
3 202 0.545 202 0.498 127 0.713 208 0.786
4 101 0.544 128 0.483 202 0.685 202 0.777
5 127 0.526 127 0.476 134 0.664 106 0.761
6 107 0.521 114 0.444 128 0.658 128 0.756
7 105 0.517 101 0.441 119 0.656 101 0.748
8 131 0.508 210 0.425 113 0.648 110 0.747
9 118 0.497 209 0.421 210 0.640 104 0.746
10 115 0.493 131 0.413 111 0.623 119 0.741
11 205 0.487 106 0.411 121 0.623 205 0.736
12 134 0.484 113 0.410 106 0.623 121 0.735
13 106 0.482 115 0.399 209 0.621 134 0.734
14 210 0.481 107 0.394 131 0.608 111 0.726
15 114 0.478 213 0.392 205 0.608 131 0.722
16 111 0.473 111 0.389 101 0.605 113 0.717
17 209 0.470 118 0.384 114 0.603 112 0.714
18 204 0.467 205 0.374 110 0.587 204 0.713
19 213 0.460 119 0.374 213 0.581 107 0.706
20 110 0.456 204 0.355 115 0.578 132 0.704
21 133 0.455 207 0.354 104 0.578 115 0.703
22 119 0.439 110 0.350 107 0.565 210 0.699
23 113 0.428 133 0.348 204 0.551 123 0.686
24 207 0.426 134 0.345 123 0.547 103 0.682
25 102 0.423 102 0.331 118 0.545 209 0.679
26 123 0.421 123 0.330 207 0.536 206 0.669
27 201 0.403 203 0.313 102 0.529 213 0.668
28 203 0.399 201 0.310 112 0.521 114 0.664
29 211 0.396 108 0.310 108 0.519 207 0.661
30 121 0.379 121 0.303 135 0.513 203 0.637
31 108 0.378 122 0.292 132 0.510 109 0.627
32 122 0.368 103 0.290 103 0.506 118 0.620
33 104 0.365 135 0.285 206 0.505 211 0.617
34 120 0.354 116 0.285 133 0.504 102 0.616
35 103 0.351 109 0.280 116 0.498 116 0.605
36 135 0.345 211 0.277 203 0.493 108 0.602
37 116 0.341 104 0.265 109 0.478 201 0.601
38 109 0.336 120 0.236 211 0.476 133 0.592
39 112 0.306 112 0.231 201 0.447 135 0.592
40 132 0.270 206 0.216 122 0.427 122 0.558
41 206 0.269 214 0.186 120 0.417 120 0.543
42 214 0.263 132 0.178 214 0.406 214 0.513
43 129 0.190 129 0.093 130 0.345 130 0.381
44 130 0.129 130 0.073 129 0.251 129 0.267
45 117 0.125 117 0.030 117 0.100 117 0.170
46 212 0.060 212 0.014 212 0.037 212 0.070
These comparisons were based on considering a concentration threshold of 0.1 ng m⁻³ for the NOM and ABConc MOE values and a comparable dosage threshold of 72 ng min m⁻³ for the ABDos and POP MOE values.
rankings for summed concentration MOE values of
models 118 (FOA), 121 (SCIPUFF), and 127 (ARAC)
improved 16, 25, and 7, respectively, after the removal of
the single Rennes sampler location. Therefore, the
previously described sensitivity of model rankings to a
single sampler location appears to be mitigated by the
ABConc MOE procedure (which begins with a logarithmic transformation followed by linear interpolation).

Fig. 3. Histograms of differences in rankings between NOM and ABConc procedures: (a) OSF rankings, (b) NAD rankings, and (c) RWFMS(5,0.5) rankings.
Fig. 3 also reports the median absolute differences—
between NOM and ABConc—in rankings for the 46 sets
of predictions of the summed concentration. Median
values of 3, 4.5, and 4 are reported for OSF, NAD, and
RWFMS(5, 0.5) rankings, respectively. For perspective,
the random ordering of 46 entities was simulated and we
found the median absolute ranking difference was 13.5,
suggesting relatively robust behavior of the model
rankings shown in Table 3. Table 4 presents median
absolute differences between different MOE procedures—NOM, ABConc, ABDos, and POP—and for three different threshold levels.
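The random-ordering baseline quoted above can be reproduced with a short simulation. This is a sketch under the assumption that "random ordering" means comparing two independent random rankings of the 46 models; the paper does not spell out the exact procedure:

```python
import random
import statistics

def median_abs_rank_diff(n=46, trials=5000, seed=0):
    """Median (over trials) of the per-trial median absolute difference
    between two independent random rankings of n models."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))
    per_trial_medians = []
    for _ in range(trials):
        shuffled = ranks[:]
        rng.shuffle(shuffled)
        # Absolute rank difference for each of the n models
        diffs = [abs(a - b) for a, b in zip(ranks, shuffled)]
        per_trial_medians.append(statistics.median(diffs))
    return statistics.median(per_trial_medians)

print(median_abs_rank_diff())  # close to the value of 13.5 quoted in the text
```

For n models, the theoretical median of |i − π(i)| is approximately n(1 − 1/√2) ≈ 0.29n, which for n = 46 gives about 13.5, consistent with the simulated value.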
Several results can be obtained from Table 4. First,
differences in model rankings are greatest when compar-
ing concentration-based and dosage-based MOE values.
The middle four columns (3 through 6) compare ranking
differences for concentration-based and dosage-based
MOE values and result in median ranking differences
between 3 and 7 with a median of the medians of 4. For
comparison, median ranking differences for NOM
versus ABConc (column 2), which examine differences
due to basing the concentration MOE on areas, were
between 1.5 and 2. Similarly, median ranking differences
for ABDos versus POP (column 7), which examine
differences due to basing the dosage MOE on actual
European population distributions, were between 1 and
4. No strong trends in the magnitude of the absolute
ranking differences between different MOE computational techniques could be attributed to increases in the threshold.
A few models improve their relative rankings greatly
when assessed based on dosage MOE values (ABDos)
instead of concentration-based values (ABConc). For
example, for OSF rankings and the three comparative
concentration/dosage thresholds (Low, Medium, and
High), model 121 (SCIPUFF), moves up 26 (from 33 to
7), 20 (from 31 to 11), and 10 (from 25 to 15) positions,
respectively. Examination of 3-h concentration and cumulative dosage plots for the observations and predictions suggests that some models do not match the 3-h timing (e.g., time of arrival and dwell) as well as
others. Therefore, while total dosages may be well
predicted, 3-h average concentrations that require both
the location and time to be matched may be predicted
worse (relative to the other models). This certainly
appears to be the case for the SCIPUFF predictions.
Fig. 4 shows contours associated with the SCIPUFF
3-h average concentration predictions and the corre-
sponding observations for the period of time starting
one day after the release. The figure indicates that the
SCIPUFF predictions seemed to ‘‘run ahead’’ of the
observations for the period of time after about 42 h.
Such a mismatch in timing would be expected to greatly
degrade 3-h average concentration MOE values (and
hence rankings) for SCIPUFF. However, summing these
concentrations over all time periods to create dosage
MOE values results in improved relative performance.
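The effect described above can be illustrated with a toy calculation (hypothetical numbers, not ETEX data, and not the paper's MOE formula): a predicted plume that arrives one 3-h period early at a sampler scores poorly on time-paired concentration overlap but perfectly on total dosage.

```python
# Hypothetical 3-h average concentrations (ng/m3) at one sampler,
# for seven consecutive 3-h periods.
obs  = [0.0, 0.0, 2.0, 5.0, 3.0, 1.0, 0.0]
pred = [0.0, 2.0, 5.0, 3.0, 1.0, 0.0, 0.0]  # same plume, one period early

# Time-paired overlap: the prediction gets credit only where it matches
# the observation within the same 3-h period.
paired_overlap = sum(min(o, p) for o, p in zip(obs, pred))
fractional_overlap = paired_overlap / sum(obs)

# Dosage (time-integrated concentration, ng min/m3): timing drops out.
minutes_per_period = 180
obs_dosage = sum(obs) * minutes_per_period
pred_dosage = sum(pred) * minutes_per_period

print(round(fractional_overlap, 2))  # well below 1: timing mismatch penalized
print(obs_dosage == pred_dosage)     # True: total dosage is identical
```

Summing over all time periods before comparing, as the dosage MOE does, removes the timing penalty entirely, which is why a "run ahead" model such as SCIPUFF improves under the dosage-based measures.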
Table 5 lists the models that were in the top 5 or top
10 of 46 under more than one computational procedure.
This table describes results for three scoring functions
and three threshold levels—9 rows. The second column
identifies those model predictions that were always in the
top 5/top 10 for all techniques—NOM, ABConc,
ABDos, and POP. The third column identifies those
model predictions that were always in the top 5/top 10
for the concentration-based procedures. The final
column identifies those model predictions that were
always in the top 5/top 10 for all techniques that were
Table 4
Median absolute ranking difference between various MOE value (NOM, ABConc, ABDos, and POP) rankings for 3 thresholds and for 3 scoring functions

Threshold, scoring function | NOM-ABConc | NOM-ABDos | NOM-POP | ABConc-ABDos | ABConc-POP | ABDos-POP
Low, OSF                    | 2          | 4         | 5       | 4            | 5          | 3
Low, FMS or NAD             | 1.5        | 3         | 4       | 5            | 4          | 4
Low, RWFMS(5,0.5)           | 2          | 5.5       | 4       | 4            | 4          | 1
Medium, OSF                 | 2          | 5         | 6       | 3            | 4          | 4
Medium, FMS or NAD          | 2          | 5         | 7       | 3.5          | 6          | 4
Medium, RWFMS(5,0.5)        | 2          | 3.5       | 4       | 3            | 3          | 1
High, OSF                   | 2          | 5         | 5       | 4            | 5.5        | 3
High, FMS or NAD            | 2          | 4         | 5       | 3            | 6          | 3.5
High, RWFMS(5,0.5)          | 2          | 5         | 5       | 4            | 6          | 2.5

For thresholds, "Low" implies 0.01 ng m⁻³ and 7.2 ng min m⁻³ for concentration and dosage measures, respectively; "Medium" implies 0.1 ng m⁻³ and 72 ng min m⁻³, respectively; and "High" implies 0.5 ng m⁻³ and 360 ng min m⁻³, respectively.
dosage-based. Therefore, Table 5 illustrates the models
that had robust performance (always top 5 or top 10)
with respect to the MOE computational technique. For
example, for the lowest thresholds considered, models
105 (CMC), 127 (LLNL), and 208 (SMHI) have the
most robust ‘‘top 5/top 10’’ performance for the OSF
and FMS (or NAD) scoring functions. Considering all
three thresholds, and the OSF and FMS/NAD scoring
functions, only one prediction, model 127, is always in
the top 10. For the more conservative RWFMS(5, 0.5)
scoring function, model 110 (DMI) exhibited the most robust performance, appearing in the top 10 at all three threshold levels. For the dosage-based rankings (ABDos and POP), only one model, 127, appears in the top 10 in
all cases. Top 10 performance that spans both nominal
and conservative scoring functions, at any threshold,
appears rare, with models 105 (CMC with ECMWF),
127, and 202 (CMC) being the only ones to achieve such
a result.
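Identifying such robust performers reduces to a set intersection over the per-technique rankings. A sketch with hypothetical rank data (the model numbers and ranks below are illustrative, not the paper's values):

```python
# Hypothetical OSF rankings (1 = best) under the four MOE techniques.
rankings = {
    "NOM":    {101: 3,  105: 6, 110: 12, 127: 1, 202: 9, 208: 4},
    "ABConc": {101: 5,  105: 8, 110: 14, 127: 2, 202: 7, 208: 6},
    "ABDos":  {101: 9,  105: 4, 110: 11, 127: 3, 202: 8, 208: 12},
    "POP":    {101: 10, 105: 5, 110: 13, 127: 2, 202: 9, 208: 11},
}

def always_in_top(rankings, n):
    """Models ranked in the top n under every technique."""
    tops = [
        {model for model, rank in ranks.items() if rank <= n}
        for ranks in rankings.values()
    ]
    return set.intersection(*tops)

print(sorted(always_in_top(rankings, 10)))  # -> [101, 105, 127, 202]
print(sorted(always_in_top(rankings, 5)))   # -> [127]
```

Applying the same intersection for each threshold and scoring function yields entries of the kind collected in Table 5.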
Finally, model rankings obtained in this work were
briefly compared with the rankings described in Mosca
et al. (1998a, Table 168). The rankings discussed in this
previous study include the Total Rank, obtained by summing (with equal weights) a variety of individual rankings based on statistical measures applied to observations and predictions paired
in space and time. Several caveats apply to the
discussions that follow. First, there are likely differences in the protocols used to process the ATMES II data (both predictions and observations) between groups. Next, the statistical quantities that underlie the rankings in this work are not identical to the statistical measures described in Mosca et al. (1998a).

Fig. 4. Contours (based on the Delaunay triangulation technique described in the text) for 3-h average concentration observations and SCIPUFF (model 121) predictions for the time periods between 24 and 75 h after the release. The solid lines correspond to contours for the SCIPUFF predictions (black = 0.5 ng m⁻³, dark blue = 0.1 ng m⁻³, and lighter blue = 0.01 ng m⁻³), and the shaded regions correspond to "observed" areas above the 3-h concentration thresholds (red = 0.5 ng m⁻³, orange = 0.1 ng m⁻³, and yellow = 0.01 ng m⁻³). Numbers on individual plots correspond to the last hour of the given 3-h time period.
Table 6 lists those models that were ranked within the
top 10 in the previous study as well as in this study by
each of the four MOE-based techniques—NOM,
ABConc, ABDos, and POP. In each case, six models
appear in the top 10 in both the previous study and in
the current study. In addition, four models—101 (IMP),
107 (DWD), 111 (IPSN), and 209 (MetOff)—appear in
the top 10 in all cases reported here as well as in the
previous study’s Total Rank. Interestingly, some of the
most robust predictions described earlier, for example,
model 127, do not appear in Table 6. Undoubtedly, one
cause of this was the poor performance of some models,
including model 127, associated with the closest sampler
location at Rennes, France. This close-in overprediction
has been previously discussed (Warner et al., 2004b) and
its impact on overall MOE assessments is somewhat
(and perhaps appropriately) mitigated by employing
interpolation procedures, i.e., ABConc, ABDos, and
POP.
9. Conclusions and discussion
The techniques and procedures described in this paper
provide a mechanism to assess model predictive
performance in a way that allows relative over- and
underpredictions to be evaluated simultaneously. After
[Fig. 4: 18 panels ("Obs VS Pred") showing observed versus predicted 3-h average concentration contours at 3-h intervals from 24 to 75 h after the release, plotted on East-West versus North-South axes in km.]
Table 5
Robust top 5/top 10 ranked models for 3 thresholds and for 3 scoring functions

Threshold, scoring function | All | NOM and ABConc (concentration-based) | ABDos and POP (dosage-based)
Low, OSF | 105, 208/127 | 105, 127, 202, 208/101, 114, 128, 210 | 105, 208/111, 121, 127, 131, 134
Low, FMS or NAD | 105, 208/127 | 105, 127, 202, 208/101, 114, 128, 210 | 105, 208/111, 121, 127, 131, 134
Low, RWFMS(5,0.5) | 113/101, 110, 123, 205 | 101, 113, 114/110, 115, 123, 205 | 104, 110, 113, 205/101, 112, 123, 127, 208
Medium, OSF | 127, 202, 208/105, 128 | 127, 128, 202, 208/101, 105 | 105, 127, 202, 208/111, 119, 128, 134
Medium, FMS or NAD | 127, 202, 208/105, 128 | 127, 128, 202, 208/101, 105 | 105, 127, 202, 208/119, 128
Medium, RWFMS(5,0.5) | 101, 110, 205/105, 123, 202 | 101, 110, 123, 205/105, 106, 202 | 101, 104, 110, 127, 205/105, 123, 202
High, OSF | 127/107, 128 | 107, 127, 128, 134/111, 205, 208 | 105, 127/107, 113, 128, 202
High, FMS or NAD | 127/107, 128 | 107, 111, 127, 134/118, 128, 133, 208, 209 | 105, 127/107, 128, 131, 202
High, RWFMS(5,0.5) | 105, 110/106, 127, 202, 205 | 105, 110, 128, 208/106, 107, 127, 134, 202, 205 | 105, 106, 110, 202/113, 127, 205

In each cell, models before the slash were always in the top 5 and models after the slash were additionally always in the top 10.
Table 6
Models ranked in the top 10 for the Mosca et al. (1998a) "Total Rank" and for summed concentration MOE OSF rankings for four techniques: NOM, ABConc, ABDos, and POP

NOM          | ABConc       | ABDos        | POP
101 (IMP)    | 101 (IMP)    | 101 (IMP)    | 101 (IMP)
107 (DWD)    | 107 (DWD)    | 107 (DWD)    | 107 (DWD)
111 (IPSN)   | 111 (IPSN)   | 111 (IPSN)   | 111 (IPSN)
114 (CNR)    | 208 (SMHI)   | 131 (DNMI)   | 115 (JAERI)
115 (JAERI)  | 209 (MetOff) | 209 (MetOff) | 131 (DNMI)
209 (MetOff) | 210 (MetOff) | 210 (MetOff) | 209 (MetOff)

Models are ordered by numerical designator, not by ranking.
applying a straightforward interpolation method and
considering the underlying European population dis-
tribution, assessments of model performance in terms of
the fraction of the population falsely warned and
fraction of the population inadvertently exposed were
possible. For several sets of model predictions, performance was found to be robust to the MOE computational technique used—for example, some of the same models appeared in the top 10 repeatedly.
Finally, two important caveats must be noted. First,
the rankings described in this paper result from consideration of a single release, and general inference about which model is "best" or ranked highest is not appropriate. Rather, these rankings describe performance for this specific release only. In addition,
for this single release field experiment, no direct
measures of uncertainty associated with the computed
MOE values or model rankings were constructed.
Previous studies that have examined multiple releases
have described techniques for assessing uncertainties
and comparing metrics to identify statistically significant
differences (Warner et al., 2004c).
Up to this point, the use of area-based and popula-
tion-based MOE values to compare sets of model
predictions, rank the models, and provide insight into
relative model performance has been emphasized. With
respect to the population-based two-dimensional (i.e., x
and y axes) MOE values, the x-axis corresponds to one
minus the fraction of the (exposed) population that is
inadvertently exposed (i.e., ‘‘not warned’’) to a threshold
level of interest and the y-axis corresponds to one minus the fraction of the (warned) population that is unnecessarily warned (at a threshold level of interest). One
might imagine using an effects (or lethality) model to
compute, via minimal extension of the MOE, the actual
number of people ‘‘falsely warned’’ or ‘‘inadvertently
exposed’’ as in Warner et al. (2004a). However, one
must be careful because of the relatively small number of
samplers associated with the observed ETEX data. In
attempting to describe the actual number of affected
people, one would need to rely on the absolute (actual)
areas computed, not simply the fraction of areas. In such
a case, the estimated area sizes are sensitive to the details
associated with the specific area-based technique used to
interpolate between sparse observations (Warner et al.,
2004d, p. 3–29). This sensitivity stems solely from the limited availability of experimental observations. For
example, during ETEX, data were collected at sparsely distributed sampler locations. In general, however, predictions produce plumes on regular, finely spaced grids and do not necessarily require any interpolation.
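The sensitivity of absolute-area estimates to the interpolation scheme can be illustrated with a sketch (hypothetical sampler values; the paper's actual area-based technique used a Delaunay triangulation, which is not reproduced here). Two common schemes, nearest-neighbor and inverse-distance weighting, yield noticeably different above-threshold areas from the same four samplers:

```python
# Hypothetical sparse samplers: (x_km, y_km, concentration in ng/m3)
samplers = [(0, 0, 5.0), (80, 0, 0.05), (0, 80, 0.3), (80, 80, 0.0)]
THRESHOLD = 0.1  # ng/m3

def nearest(x, y):
    """Nearest-neighbor interpolation."""
    return min(samplers, key=lambda s: (s[0] - x) ** 2 + (s[1] - y) ** 2)[2]

def idw(x, y, power=2):
    """Inverse-distance-weighted interpolation."""
    num = den = 0.0
    for sx, sy, c in samplers:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0:
            return c  # exactly at a sampler
        w = 1.0 / d2 ** (power / 2)
        num += w * c
        den += w
    return num / den

def area_above_threshold(interp, step_km=2.0):
    """Estimate the above-threshold area (km2) on a 100 km x 100 km grid."""
    cells = 0
    n = int(100 / step_km)
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) * step_km, (j + 0.5) * step_km
            if interp(x, y) > THRESHOLD:
                cells += 1
    return cells * step_km ** 2

print(area_above_threshold(nearest), area_above_threshold(idw))
# The two schemes give substantially different absolute areas
# from the identical sparse observations.
```

The fractional-overlap MOE values are less affected by this choice than the absolute areas are, which is the point made in the text.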
Thus, one might envision the following operational
procedure to assess actual ‘‘areas’’ or ‘‘numbers of
people affected.’’ First, models are compared to
observations using data at the samplers, both observed
and predicted, as was done in this study. Next, one
chooses a model prediction that demonstrates robust
and acceptable performance—that is, ‘‘top ten’’ perfor-
mance—using the two-dimensional MOE to assess
relative false positive and false negative regions. Then,
the corresponding ‘‘robust model’’ predicted plume
could be used to calculate actual areas and/or numbers
of people affected. We did not have access to the
predicted plumes for this study, only the predicted
concentrations at the sampler locations and thus could
not further explore this procedure.
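The population-based axes described above can be sketched directly from warned and exposed population counts (hypothetical numbers; the full MOE construction in Warner et al. (2004a) is more involved):

```python
# Hypothetical populations (in thousands) for one model prediction:
exposed        = 900.0    # population in the observed above-threshold region
warned         = 1_200.0  # population in the predicted above-threshold region
exposed_warned = 750.0    # population in the overlap of the two regions

# x-axis: one minus the fraction of the exposed population not warned
x = exposed_warned / exposed
# y-axis: one minus the fraction of the warned population unnecessarily warned
y = exposed_warned / warned

false_negative_fraction = 1 - x  # inadvertently exposed
false_positive_fraction = 1 - y  # falsely warned

print(round(x, 3), round(y, 3))  # (1, 1) would be a perfect prediction
```

Under the operational procedure sketched above, these fractions would be computed at the samplers to select a robust model, after which that model's gridded plume, combined with the population distribution, would supply the absolute numbers of people affected.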
Acknowledgements
The authors thank Stefano Galmarini (Joint Research
Centre—Environment Institute, Environment Monitor-
ing Unit, Ispra, Italy) for providing access to the
ATMES II model predictions and for useful discussions.
This effort was supported by the Defense Threat
Reduction Agency, with Mr. Richard Fry as project
monitor, and the Central Research Program of the
Institute for Defense Analyses. The views expressed in
this paper are solely those of the authors.
References

Boybeyi, Z., Ahmad, N., Bacon, D.P., Dunn, T.J., Hall, M.S., Lee, P.C.S., Sarma, R.A., Wait, T.R., 2001. Evaluation of the operational multiscale environment model with grid adaptivity against the European tracer experiment. Journal of Applied Meteorology 40, 1541–1558.

Girardi, F., Graziani, G., van Veltzen, D., Galmarini, S., Mosca, S., Bianconi, R., Bellasio, R., Klug, W. (Eds.), 1998. The ETEX project. EUR Report 181-43 EN. Office of official publications of the European Communities, Luxembourg, 108 pp.

Graziani, G., Klug, W., Mosca, S., 1998. Real-time long-range dispersion model evaluation of the ETEX first release. Joint Research Center of the European Commission, Office of official publications of the European Communities, L-2985 (CL-NA-17754-EN-C), Luxembourg, 216 pp.

Guibas, L.J., Knuth, D.E., Sharir, M., 1992. Randomized incremental construction of Delaunay and Voronoi diagrams. Algorithmica 7, 381–413.

Klug, W., Graziani, G., Grippa, G., Pierce, D., Tassone, C. (Eds.), 1992. Evaluation of long-range atmospheric transport models using environmental radioactivity data from the Chernobyl accident. EUR Report 14147 EN. Office of official publications of the European Communities, Luxembourg, 366 pp.

Mosca, S., Bianconi, R., Bellasio, R., Graziani, G., Klug, W., 1998a. ATMES II—Evaluation of long-range dispersion models using data of the 1st ETEX release. Joint Research Center of the European Commission, Office of official publications of the European Communities, L-2985 (CL-NA-17756-EN-C), Luxembourg, 608 pp.

Mosca, S., Graziani, G., Klug, W., Bellasio, R., Bianconi, R., 1998b. A statistical methodology for the evaluation of long-range dispersion models: an application to the ETEX exercise. Atmospheric Environment 32 (24), 4307–4324.

Platt, N., Warner, S., Heagy, J.F., 2004. Application of user-oriented MOE to transport and dispersion model predictions of ETEX. Proceedings of the Ninth International Conference on Harmonisation Within Atmospheric Dispersion Modelling for Regulatory Purposes, Garmisch-Partenkirchen, Germany, pp. 120–125.

van Dop, H., Nodop, K. (Eds.), 1998. A European tracer experiment. Atmospheric Environment 32 (24), 4089–4378.

Warner, S., Platt, N., Heagy, J.F., 2001a. Application of user-oriented measure of effectiveness to HPAC probabilistic predictions of Prairie Grass field trials. IDA Paper P-3586, 275 pp. (Available electronically [DTIC STINET ADA391653] or via a request to Steve Warner.)

Warner, S., Heagy, J.F., Platt, N., Larson, D., Sugiyama, G., Nasstrom, J.S., Foster, K.T., Bradley, S., Bieberbach, G., 2001b. Evaluation of transport and dispersion models: a controlled comparison of Hazard Prediction and Assessment Capability (HPAC) and National Atmospheric Release Advisory Center (NARAC) predictions. IDA Paper P-3555, 251 pp. (Available electronically [DTIC STINET ADA391555] or via a request to Steve Warner.)

Warner, S., Platt, N., Heagy, J.F., 2003. Application of user-oriented MOE to transport and dispersion model predictions of the European tracer experiment. IDA Paper P-3829, 86 pp. (Available electronically [DTIC STINET ADA419433] or via a request to Steve Warner.)

Warner, S., Platt, N., Heagy, J.F., 2004a. User-oriented two-dimensional measure of effectiveness for the evaluation of transport and dispersion models. Journal of Applied Meteorology 43, 53–73.

Warner, S., Platt, N., Heagy, J.F., 2004b. Application of user-oriented measure of effectiveness to transport and dispersion model predictions of the European tracer experiment. Atmospheric Environment 38 (39), 6789–6801.

Warner, S., Platt, N., Heagy, J.F., 2004c. Comparisons of transport and dispersion model predictions of the URBAN 2000 field experiment. Journal of Applied Meteorology 43, 829–846.

Warner, S., Platt, N., Heagy, J.F., 2004d. Comparisons of transport and dispersion model predictions of the European tracer experiment: area-based and population-based measures of effectiveness. IDA Paper P-3915, 139 pp. (Available electronically [DTIC STINET ADA427807] or via a request to Steve Warner.)