ARTICLE IN PRESS
1352-2310/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.atmosenv.2005.02.058
*Corresponding author. Tel.: 703 845 2096; fax: 703 845 6977.
E-mail address: [email protected] (S. Warner).
Atmospheric Environment 39 (2005) 4425–4437
www.elsevier.com/locate/atmosenv
Comparisons of transport and dispersion model predictions of the European tracer experiment: area- and population-based
user-oriented measures of effectiveness
Steve Warner*, Nathan Platt, James F. Heagy
Institute for Defense Analyses, 4850 Mark Center Drive, Alexandria, Virginia 22311-1882, USA
Received 22 November 2004; accepted 18 February 2005
Abstract
In October 1994, a tracer gas was released from a location in northwestern France and tracked at 168 sampling
locations in 17 countries across Europe. This release, known as the European Tracer Experiment (ETEX), resulted in
the collection of a wealth of data. This paper applies a previously described user-oriented measure of effectiveness
(MOE) methodology to evaluate the predictions of 46 models against the long-range ETEX observations. The paper
extends previous work by computing MOE values that are based on "true" areas (e.g., in square kilometers) and on
actual European population distributions. In this way, assessments of model predictions of ETEX are placed in a
possible operational context. The predictive performance of the models was assessed with nominal, area-, dosage-, and
population-based MOE values and ranked using a few notional user-based scoring functions. This study finds that the
rankings of some models—in particular, several of the model predictions that were ranked in the top 10—are relatively
insensitive to the particular MOE technique used. This robust behavior, with respect to analysis assumptions, is
regarded as an important feature of model performance.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: Model intercomparison; Transport and dispersion; Measure of effectiveness; European tracer experiment (ETEX)
1. Introduction
In October 1994, the inert, environmentally safe tracer
gas perfluoro-methyl-cyclohexane (PMCH) was released
over a 12-h period from a location in northwestern
France and tracked at 168 sampling locations in 17
countries across Europe extending over a thousand
kilometers (Graziani et al., 1998). The authors have
obtained from the Joint Research Centre of the
European Commission (Ispra, Italy) 46 sets of transport
and dispersion predictions associated with models from
17 countries as well as the observed PMCH sampling
data associated with the October 1994 European Tracer
Experiment (ETEX) release (Mosca et al., 1998a).
A previously developed user-oriented two-dimen-
sional measure of effectiveness (MOE) was recently
(Warner et al., 2004a) used to evaluate the predictions of
these 46 models against the long-range ETEX observa-
tions (Warner et al., 2004b). The two-dimensional MOE
allows for the evaluation of transport and dispersion
model predictions in terms of false negative (under-
prediction) and false positive (overprediction) regions.
We define the false negative region (AFN) where hazard
is observed but not predicted and the false positive
region (AFP) where hazard is predicted but not observed.
The x-axis of this two-dimensional metric corresponds
to the ratio of overlap region (AOV) to observed region
(AOB) and the y-axis corresponds to the ratio of overlap
region to predicted region (APR). These mathematical
definitions can be algebraically rearranged and we then
recognize that the x-axis corresponds to 1 minus the
false negative fraction and the y-axis corresponds to
1 minus the false positive fraction.
MOE = (x, y) = (A_OV/A_OB, A_OV/A_PR)
             = (A_OV/(A_OV + A_FN), A_OV/(A_OV + A_FP))
             = (1 − A_FN/A_OB, 1 − A_FP/A_PR).   (1)
Importantly, this MOE considers the direction of the
plume or location of the hazard when evaluating a
model’s performance.
The computation of the MOE does not necessarily
require estimated areas and hence, area interpolation. In
a previous study (Warner et al., 2004b; Platt, 2004),
no area interpolations were used to compute the MOE
values; rather, computations of the components of the
MOE were based solely on direct comparisons of
predictions and field trial observations paired in space
and time. We refer to MOE values computed in this way
as nominal. This previous effort assessed model perfor-
mance by ranking the computed MOE values for the 46
sets of model predictions by a few notional scoring
functions that were previously developed (Warner et al.,
2003). One question that arose during this previous
study was whether the rankings of the 46 models would
be particularly sensitive to using area-based MOE
values, computed after applying area interpolation,
as opposed to direct sampler observation–prediction
comparisons.
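As an illustrative sketch (not the authors' code), the nominal MOE computation on paired observations and predictions can be expressed as follows; the function name, the toy data, and the simple ≥-threshold hazard definition at each paired sampler/period are assumptions for illustration.

```python
# Sketch of the nominal MOE: predictions and observations are paired
# sampler-by-sampler and period-by-period, with no area interpolation.
# Names (moe_nominal, obs, pred) are illustrative, not from the paper.

def moe_nominal(observed, predicted, threshold):
    """Return (x, y) = (AOV/(AOV+AFN), AOV/(AOV+AFP)) counted over paired samples."""
    a_ov = a_fn = a_fp = 0
    for obs_val, pred_val in zip(observed, predicted):
        obs_hot = obs_val >= threshold
        pred_hot = pred_val >= threshold
        if obs_hot and pred_hot:
            a_ov += 1          # hazard both observed and predicted (overlap)
        elif obs_hot:
            a_fn += 1          # false negative: observed but not predicted
        elif pred_hot:
            a_fp += 1          # false positive: predicted but not observed
    x = a_ov / (a_ov + a_fn) if a_ov + a_fn else 1.0
    y = a_ov / (a_ov + a_fp) if a_ov + a_fp else 1.0
    return x, y

# Example: six paired sampler readings (ng/m^3) against the 0.1 ng/m^3 threshold
obs  = [0.00, 0.15, 0.30, 0.05, 0.20, 0.00]
pred = [0.12, 0.20, 0.02, 0.00, 0.25, 0.00]
print(moe_nominal(obs, pred, 0.1))  # (2/3, 2/3) for these toy data
```

A perfect prediction pairs every observed exceedance with a predicted one and adds no spurious exceedances, giving MOE = (1, 1).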
The research reported here extends the previous work
by computing MOE values that are based on "true"
areas (e.g., in square kilometers), on estimated dosages,
and on actual European population distributions. First,
MOE values based on actual areas (e.g., square kilo-
meters) are created for both assessments of concentra-
tions and dosages. Then, actual European population
distributions are included to place the MOE in its
ultimate operational context—fraction of the population
falsely warned and fraction of the population inadver-
tently exposed. At this point, the 46 sets of model
predictions can be compared and assessed within this
more operational context.
Previously, area-based MOE values have been com-
puted in two cases. First, as part of an examination
of model predictive performance of the short-range (<800 m) Prairie Grass field experiment, two interpolation techniques were explored and discussed (Warner et al., 2001a). In that case, a relatively densely sampled
space was available—545 samplers in a 1 km2 region. It
was found that for this relatively densely sampled
experiment, results based on the nominal and area-
based MOE were similar. Next, area-based MOE values
have been used to compare sets of model predictions of
simulated releases. In that case, area-based MOE values
were estimated by considering very dense grids of model
predictions (Warner et al., 2001b). ETEX did not have a
dense sampling network. However, our interests in
comparing and assessing model predictions of ETEX
are associated with rather large-scale, lower-resolution
features. That is, we focused on determining which
models best capture the general shape of the plume and
its time evolution as opposed to, for example, details of
the concentration time history in a specific European
valley or urban area—which would not be reasonably
possible, in general, with such relatively sparse sampling
(168 samplers across Europe).
2. Brief description of ETEX
The tracer gas PMCH was released over a 12-h period
on 24 October 1994 from a location 35 km west of
Rennes in Brittany, France. Samplers were located at
168 locations across 17 European countries and air
samples were collected every 3 h for a period of 90 h after
the initial release. Background measurements suggested
that a level of 0.01 ng m⁻³ should be used as the
minimum for all statistical comparisons (Graziani
et al., 1998). Details associated with the conduct of this
experiment, the sampling network, and the time evolu-
tion of the tracer cloud can be found in Girardi et al.
(1998) and van Dop and Nodop (1998). Two years after
the ETEX releases, a modeling exercise known as the
Atmospheric Transport Model Evaluation Study
(ATMES) II (Mosca et al., 1998a) was con-
ducted. ETEX-ATMES II predictions associated with
46 model configurations were provided to the authors by
the Joint Research Centre (Ispra, Italy) of the European
Commission. Table 1, extracted from Mosca et al.
(1998a), provides some details associated with these
models.
3. Nominal MOE computations
The components of the MOE—AFN, AFP, and AOV—
were computed directly from the predictions and field
trial observations paired in space and time as in Warner
et al. (2004b) and Warner et al. (2003). Procedures
associated with the computation of nominal MOE
values and recent examples can be found in Warner
et al. (2004a, b, and c). For this analysis, observed and
predicted average concentrations were compared for the
30, 3-h time periods of PMCH monitoring during
Table 1
ATMES II participants for which the authors obtained predictions
Model Acronym Participant Nationality
101 IMP Institute of Meteorology and Physics, University of Wien Austria
102 BMRC Bureau of Meteorology Research Centre Australia
103 NIMH-BG National Institute of Meteorology and Hydrology Bulgaria
104 NIMH-BG National Institute of Meteorology and Hydrology Bulgaria
105 CMC Canadian Meteorology Centre Canada
106 DWD German Weather Service Germany
107 DWD German Weather Service Germany
108 NERI Nat. Environment Research Inst./Risoe Nat. Lab./Univ. of Cologne Germany/Denmark
109 NERI Nat. Environment Research Inst./Risoe Nat. Lab./Univ. of Cologne Germany/Denmark
110 DMI Danish Meteorological Institute Denmark
111 IPSN French Institute for Nuclear Protection and Safety France
112 EDF French Electricity France
113 ANPA National Agency for Environment Italy
114 CNR National Research Council Italy
115 JAERI Japan Atomic Research Institute Japan
116 MRI Meteorological Research Institute Japan
117 NIMH-R National Institute of Meteorology and Hydrology Romania
118 FOA Defense Research Establishment Sweden
119 MetOff Meteorological Office United Kingdom
120 NOAA National Oceanic and Atmospheric Administration United States
121 SCIPUFF ARAP Group of Titan Research and Technology United States
122 KMI Royal Institute of Meteorology of Belgium Belgium
123 Meteo Meteo France France
127 LLNL Lawrence Livermore National Laboratories United States
128 SMHI Swedish Meteorological and Hydrological Institute Sweden
129 SAIC Science Applications International Corporation United States
130 IMS Swiss Meteorological Institute Switzerland
131 DNMI Norwegian Meteorological Institute Norway
132 SRS Westinghouse Savannah River Laboratory United States
133 JMA Japan Meteorological Agency Japan
134 JMA Japan Meteorological Agency Japan
135 MSC-E Meteorological Synthesizing Centre—East Russia
201 BMRC Bureau of Meteorology Research Centre Australia
202 CMC Canadian Meteorological Centre Canada
203 DWD German Weather Service Germany
204 NERI Nat. Environment Research Inst./Risoe Nat. Lab./Univ. of Cologne Germany/Denmark
205 DMI Danish Meteorological Institute Denmark
206 Meteo Meteo France France
207 MRI Meteorological Research Institute Japan
208 SMHI Swedish Meteorological and Hydrological Institute Sweden
209 MetOff Meteorological Office United Kingdom
210 MetOff Meteorological Office United Kingdom
211 NOAA National Oceanic and Atmospheric Administration United States
212 NIMH-R National Institute of Meteorology and Hydrology Romania
213 DNMI Norwegian Meteorological Institute Norway
214 MSC-E Meteorological Synthesizing Centre—East Russia
The model predictions denoted with a number between 101 and 135, the "100 series," used European Centre for Medium Range
Weather Forecasts (ECMWF) analyzed meteorological data as input. The "200 series" (201–214) used weather inputs produced by
independent numerical weather prediction models different from ECMWF’s.
ETEX. MOE values based on concentration thresholds
of 0.01, 0.10, and 0.50 ng m⁻³ were examined in this
study consistent with the previous studies of Mosca et al.
(1998b) and Boybeyi et al. (2001).
4. Area-based concentration MOE computations
Given values at a discrete (perhaps irregular) set
of samplers, the process of interpolation provides
intermediate values on some regular grid of points. The
resulting regular grid of functional values could be used
to obtain contours of "hazard" areas (areas within a
critical threshold contour) or calculate MOE values
based on interpolated areas. For example, the MOE
components—AFN, AFP, and AOV—of an area-based
concentration MOE would have units of square kilometers if assessed for a threshold value or concentration × area if assessed based on summed concentrations (e.g., marginal differences for AFP and AFN).
The Delaunay triangulation procedure is useful for
the interpolation, analysis, and visual display of
irregularly, discretely gridded data (Guibas et al.,
1992). From a set of discrete points (sampler coordi-
nates), a planar triangulation is formed, satisfying the
property that the circumscribed circle of any triangle in
the triangulation contains no other vertices in its
interior. For any point that is within some triangle
(formed via Delaunay triangulation), a linear interpola-
tion routine using values at the vertices of the triangle is
used to calculate the value at that point. For the
interpolations reported in this paper, the data (observa-
tions and predictions) were first transformed logarith-
mically because actual plume concentrations or dosages
varied over orders of magnitude. The above routine was
applied with a resolution of 2 km × 2 km corresponding to 1001 × 1001 grid points. The example displays
reported in Fig. 1 are based on the logarithmic
transformation of the observed data followed by
Delaunay triangulation and linear interpolation as
described above.
The adopted procedure, while simple and yielding
some perhaps less visually pleasing sharp edges,
appeared to be robust and necessarily maintains the
actual observed values at the sampler locations—this
would not be true for many fitting procedures. A few
other interpolation schemes (for example, skipping the
logarithmic transformation of the data step from the
above procedure) and area-based weighting schemes
were also examined as a part of this study. Some of the
examined techniques correspond to interpolated plumes
that are likely conservative by construction, i.e., over-
estimating areas and people who would be encountered
by either the observed or predicted plume. Relative
model performance rankings based on scoring functions
applied to area-based MOE values were compared for
these different techniques. It was concluded that model
performance rankings were relatively robust for several
different techniques. Therefore, in this paper we focus
on only one—arguably the simplest—interpolation
procedure. Additional details associated with other
interpolation and weighting schemes that were examined
are described in Warner et al. (2003) and Warner et al.
(2004d).
At this point then, MOE values can be computed in a
manner analogous to the procedure for the nominal
MOE. However, in this case, the MOE components—
AOV, AFN, and AFP—are computed directly from the
consideration of the 1001 × 1001 discrete grid points
instead of the 168 sampling locations.
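Under the assumption that SciPy's Delaunay-based `LinearNDInterpolator` can stand in for the triangulation routine described above, the log-transform, interpolate, and threshold steps might be sketched as follows; the sampler coordinates and values are invented for illustration and the concentration floor is an assumed detail.

```python
# Sketch of the area-based MOE pipeline: log-transform sampler values,
# linearly interpolate over the Delaunay triangulation of sampler locations,
# then count grid cells above a threshold. SciPy's LinearNDInterpolator is
# Delaunay-based; the sampler data below are invented, not ETEX values.
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def interpolate_log_field(points, values, grid_x, grid_y, floor=1e-4):
    """Linearly interpolate log10(values) onto a regular grid, then back-transform."""
    log_vals = np.log10(np.maximum(values, floor))        # avoid log10(0)
    interp = LinearNDInterpolator(points, log_vals, fill_value=np.log10(floor))
    gx, gy = np.meshgrid(grid_x, grid_y)
    return 10.0 ** interp(gx, gy)

def moe_area_based(obs_field, pred_field, threshold, cell_area_km2):
    """MOE components in km^2 from thresholded interpolated fields."""
    obs_hot, pred_hot = obs_field >= threshold, pred_field >= threshold
    a_ov = np.sum(obs_hot & pred_hot) * cell_area_km2     # overlap area
    a_fn = np.sum(obs_hot & ~pred_hot) * cell_area_km2    # false negative area
    a_fp = np.sum(~obs_hot & pred_hot) * cell_area_km2    # false positive area
    return a_ov / (a_ov + a_fn), a_ov / (a_ov + a_fp)

# Five invented samplers on a 10 km x 10 km patch, 0.2 km grid spacing
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])
obs_vals = np.array([0.0, 0.0, 0.2, 0.2, 0.5])
pred_vals = np.array([0.0, 0.2, 0.0, 0.2, 0.5])
grid = np.linspace(0.0, 10.0, 51)
obs_f = interpolate_log_field(pts, obs_vals, grid, grid)
pred_f = interpolate_log_field(pts, pred_vals, grid, grid)
print(moe_area_based(obs_f, pred_f, 0.1, cell_area_km2=0.04))
```

Because the interpolation is linear on the triangulation vertices, the sampler values themselves are reproduced exactly, matching the property the text emphasizes.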
It must be noted that interpolating between sampler
locations cannot capture "peaks" or "holes" in the
concentration distribution that may lie between sam-
plers. For densely sampled regions this would not be a
problem. However, for situations where, for example,
complex terrain or a highly urbanized environment lies
between perhaps sparse sampler locations, one might
expect considerable variations in the concentrations as a
function of time and location. Over the long distances
associated with ETEX, it is reasonable to expect that the
locations of any holes or peaks may shift in time and
ultimately be mitigated by dispersive effects. Further-
more, in the next section, the computation of dosage-
based MOE values is described. These dosage-based
values consider the summation of thirty 3-h time
periods. The summation process should reduce the
likelihood of large unexpected variations (peaks or
holes) between sampler locations by smoothing out
temporal differences that may be evident at the 3-h time
resolution.
5. Area-based dosage MOE computations
To create "observed" dosages at given locations, one
sums the concentrations at each sampler location. For
example, if a 3-h average concentration of 0.01 ng m⁻³ were observed for 12 h (720 min) at a given location, a dosage of 7.2 ng min m⁻³ would be computed for that
site. However, periods of time in which sampler data
could not be (or were not) collected exist for many of the
sampler locations. If there were only a few of these
missing points, one could simply remove them (along
with the corresponding prediction) from the analysis
and compute dosages for locations that had continuous
coverage. For the ETEX release, however, there are
many locations that have at least some missing time
periods. Therefore, one must fill in these values in some
manner in order to create a dosage. The spatial
interpolation described previously for the area-based
MOE values provides a natural way to fill in the
temporal holes in the observed concentration data. Since
predictions exist (in general) at all time periods,
predicted dosages can be created by direct summation
of the predicted concentrations. For the few model
cases, where predictions were missing for some samplers
and at some time periods, the corresponding observation
was removed from the calculation. This procedure leads
to area-based, dosage MOE values.
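The dosage construction reduces to summing concentration × time over the thirty 3-h periods; a minimal sketch (function and variable names are illustrative, and gap filling by spatial interpolation is assumed to have been done upstream):

```python
# Sketch of building a dosage at one location from 3-h average concentrations.
# Each 3-h period contributes concentration x 180 min, so a constant
# 0.01 ng/m^3 held for four periods (12 h) gives 7.2 ng min/m^3, consistent
# with the example in the text. Names here are illustrative.
PERIOD_MIN = 180  # minutes in one 3-h averaging period

def dosage(concentrations_ng_m3):
    """Sum of (3-h average concentration) x (180 min) over all periods."""
    return sum(c * PERIOD_MIN for c in concentrations_ng_m3)

# 0.01 ng/m^3 for four consecutive 3-h periods (12 h), zero for the rest
series = [0.01] * 4 + [0.0] * 26        # thirty 3-h periods (90 h of sampling)
print(dosage(series))                   # close to 7.2 (floating point)
```

Predicted dosages follow the same summation directly, since predictions exist at essentially all time periods.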
For this analysis, three threshold dosages were
examined: 7.2, 72, and 360 ng min m⁻³. These three
values can be related to the 3-h average concentration
thresholds of 0.01, 0.1, and 0.5 ng m⁻³ (previously discussed), respectively, by considering a 12-h (720 min) period in which the cloud might pass over any individual sampler location, i.e., 720 min × 0.01 ng m⁻³ = 7.2 ng min m⁻³; 720 × 0.1 = 72 ng min m⁻³; and 720 × 0.5 = 360 ng min m⁻³. The MOE components—AFN, AFP, and AOV—of an area-based dosage MOE would have units of square kilometers if assessed for a threshold value or dosage × area if assessed based on summed dosages.

Fig. 1. Observed PMCH concentrations across Europe. Plots display contours from 6 h after the release for the upper left plot to 90 h after the release for the lower right plot in increments of 6 h. Contours are 0.01, 0.1, and 0.5 ng m⁻³. Bold numbers on individual plots correspond to the last hour of the given 6-h period. Interpolation accomplished with a resolution of 1001 × 1001 grid points (Warner et al., 2003).

Fig. 2. Illustration of European population distribution: numbers in legend correspond to population within 2 km × 2 km cell.
6. Population-based MOE computations
Dosage-based MOE values can be converted into
population-based values by including the underlying
non-uniform European population distribution. Fig. 2
illustrates the population distribution that was used
(LANDSCAN 2000). The population distribution shown in Fig. 2 is represented by population values at about 2.1 million grid cells: 1501 in the x direction ("east–west") and 1401 in the y direction ("north–south"). This results in a grid cell size of 2 km × 2 km.
The overall European population represented here is
about 500 million.
We define D(i, j) as the dosage associated with the 2 km × 2 km grid cell (i, j), p_{i,j} as the associated population, and T_D as the dosage threshold of interest. Then, we identify OV_D(i, j), FN_D(i, j), and FP_D(i, j) as follows:

OV_D(i, j) = 1 if observed D(i, j) ≥ T_D and predicted D(i, j) ≥ T_D; 0 otherwise,
FN_D(i, j) = 1 if observed D(i, j) ≥ T_D and predicted D(i, j) < T_D; 0 otherwise,
FP_D(i, j) = 1 if observed D(i, j) < T_D and predicted D(i, j) ≥ T_D; 0 otherwise.   (2)
Summing over all grid cells and including the population weights (p_{i,j}) leads to values for A_OV, A_FN, and A_FP that are based on the European population:

A_OV = Σ_{i=1}^{N_i} Σ_{j=1}^{N_j} p_{i,j} · OV_D(i, j),
A_FN = Σ_{i=1}^{N_i} Σ_{j=1}^{N_j} p_{i,j} · FN_D(i, j),
A_FP = Σ_{i=1}^{N_i} Σ_{j=1}^{N_j} p_{i,j} · FP_D(i, j).   (3)
The MOE components computed in this way have units
of people when assessed for a threshold value. At this
point then, for a given threshold, the MOE values can be
expressed, with the x-axis labeled "one minus the fraction of the population inadvertently exposed" and the y-axis labeled "one minus the fraction of the population unnecessarily warned" – i.e., population-
based MOE values.
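Eqs. (2)–(3) can be sketched directly with boolean masks and a population weight per cell; the small grids below are illustrative toy data, not LANDSCAN values, and the function name is an assumption.

```python
# Sketch of the population-weighted MOE of Eqs. (2)-(3): per-cell indicator
# functions against a dosage threshold T_D, weighted by cell population.
import numpy as np

def moe_population(obs_dosage, pred_dosage, population, t_d):
    """MOE components in units of people for dosage threshold t_d."""
    obs_hot = obs_dosage >= t_d
    pred_hot = pred_dosage >= t_d
    a_ov = population[obs_hot & pred_hot].sum()   # correctly warned people
    a_fn = population[obs_hot & ~pred_hot].sum()  # inadvertently exposed people
    a_fp = population[~obs_hot & pred_hot].sum()  # unnecessarily warned people
    x = a_ov / (a_ov + a_fn)   # 1 - fraction of population inadvertently exposed
    y = a_ov / (a_ov + a_fp)   # 1 - fraction of population unnecessarily warned
    return float(x), float(y)

obs  = np.array([[100.0, 10.0], [80.0, 5.0]])    # observed dosages, ng min/m^3
pred = np.array([[90.0, 80.0], [10.0, 5.0]])     # predicted dosages
pop  = np.array([[1000, 500], [2000, 250]])      # people per 2 km x 2 km cell
print(moe_population(obs, pred, pop, 72.0))      # x = 1/3, y = 2/3 for this toy grid
```

In the toy grid, the 2000-person cell is exposed but not predicted (false negative) and the 500-person cell is predicted but not exposed (false positive), which is exactly the asymmetry the population weighting is meant to surface.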
7. Scoring functions
Several notional scoring functions for the MOE space
have been previously described including their algebraic
derivations (Warner et al., 2003, 2004a, b). These
scoring functions can be thought of as corresponding
to the requirements of different possible model users.
Such scoring functions can thus aid in assessing if a
model’s MOE value, for a given set of field observations,
is "good enough." For a given application and user risk
tolerance, certain regions of the two-dimensional MOE
space may be considered acceptable. For example, some
users may tolerate a certain false positive fraction
(ultimately, unnecessarily warned individuals) but re-
quire a very low false negative fraction (inadvertently
exposed individuals). Such a risk tolerance profile
implies a certain region in the two-dimensional MOE
space, which can be turned into a mathematical function
for "scoring" the MOE predictions.
Table 2 summarizes some useful scoring functions
associated with assessments of MOE values. The
objective scoring function (OSF) allows for the assess-
ment of which model prediction achieves an MOE value
"closest" to the perfect value of (1,1)—complete overlap occurs at OSF = 0. The absolute fractional bias—
ABS(FB)—scoring function provides for an evaluation
of which model best predicted the overall amount of
material, or in the case of a threshold based MOE,
which model best predicted the overall size of the area
that was affected by material above a defined threshold.
ABS(FB) = 0 implies no bias (e.g., neither an over- nor underprediction). The normalized absolute difference (NAD) scoring function, which measures scatter, leads to assessments of which model predictions demonstrated the smallest differences between observations and predictions. NAD = 0 implies no differences—a perfect prediction.

Table 2
Algebraic relationships between scoring functions and the two-dimensional MOE

Scoring function                        Relationship to MOE (x, y)
Objective Scoring Function—OSF          OSF = sqrt((1 − x)² + (1 − y)²)
Absolute Fractional Bias—ABS(FB)        ABS(FB) = 2|x − y|/(x + y)
Normalized Absolute Difference—NAD      NAD = (x + y − 2xy)/(x + y)
Figure of Merit in Space—FMS            FMS = xy/(x + y − xy)
Risk-Weighted FMS—RWFMS(5, 0.5)         RWFMS(5, 0.5) = xy/(xy + 5y(1 − x) + 0.5x(1 − y))
Recent studies of model predictions of ETEX have
included the use of a figure of merit in space (FMS),
defined as the overlap area between the prediction and
observation divided by the total predicted and observed
areas, all above some threshold concentration (Klug
et al., 1992; Mosca et al., 1998b; Boybeyi et al., 2001).
Table 2 describes the algebraic relationship between
FMS and MOE. There is a strictly monotonic relation-
ship between NAD and FMS (Warner et al., 2003)
implying that scoring model performance based
on NAD or FMS will necessarily lead to identical
rank orderings. A value of FMS = 1 implies complete
overlap.
Some users of hazardous material transport and
dispersion models might consider false positives and
false negatives quite differently. For many applications,
false positives would be much more acceptable to the
user than false negatives (which could result in decisions
that directly lead to death or injury). Eq. (4) is an
example of a user scoring function that takes the above
risk tolerance into consideration. We refer to this
notional user scoring function as the Risk-Weighted
FMS (RWFMS):
RWFMS = xy / (xy + C_FN · y(1 − x) + C_FP · x(1 − y))   (4)

where C_FN, C_FP > 0. This equation describes a modified FMS that includes coefficients, C_FN and C_FP, to weight the false negative and false positive regions, respectively—denoted RWFMS(C_FN, C_FP). Therefore, RWFMS(1,1) = FMS. For this study, RWFMS(5, 0.5)
was used as a scoring function, implying that false
negative contributions to the MOE are considered 5
times more important than overlap and 10 times as
important as false positive contributions—suggesting a
conservative user and application.
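The Table 2 scoring functions translate directly into code; a sketch (function names are illustrative, and OSF is written as distance from the perfect point (1, 1), consistent with "complete overlap occurs at OSF = 0"):

```python
# The scoring functions of Table 2 applied to a two-dimensional MOE value (x, y).
import math

def osf(x, y):
    return math.hypot(1 - x, 1 - y)          # 0 at the perfect MOE (1, 1)

def abs_fb(x, y):
    return abs(2 * (x - y) / (x + y))        # 0 means no over/underprediction bias

def nad(x, y):
    return (x + y - 2 * x * y) / (x + y)     # 0 means a perfect prediction

def fms(x, y):
    return x * y / (x + y - x * y)           # 1 means complete overlap

def rwfms(x, y, c_fn=5.0, c_fp=0.5):
    # Eq. (4): false negatives weighted by c_fn, false positives by c_fp
    return x * y / (x * y + c_fn * y * (1 - x) + c_fp * x * (1 - y))

# A perfect prediction scores perfectly under every function
assert osf(1, 1) == 0 and nad(1, 1) == 0 and fms(1, 1) == 1 and rwfms(1, 1, 1, 1) == 1
```

Note that NAD = (1 − FMS)/(1 + FMS), which is the strictly monotonic relationship between NAD and FMS that guarantees identical rank orderings under either score.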
8. Results of MOE comparisons
Table 3 compares nominal (NOM), area-based con-
centration (ABConc), area-based dosage (ABDos), and
population-based (POP) MOE rankings for the FMS
and for a threshold of 0.1 ng m⁻³. The model numbers
shown in Table 3 correspond to the ATMES II
predictions defined in Table 1. Warner et al. (2004d)
provides tables for each of the three scoring functions,
all of the examined thresholds, and for summed
concentration and dosage-based MOE values. The
FMS-based rankings, as shown in Table 3, appear
relatively robust to the different MOE techniques used
to compute the MOE values. Similarly, of the top 5
OSF-ranked models based on the nominal MOE, 4, 4,
and 4 appear in the top 5 based on the ABConc, ABDos,
and POP MOE values, respectively. Similarly, of the
bottom 5 OSF-ranked models based on the nominal
MOE, 4, 5, and 5 appear in the bottom 5 based on the
ABConc, ABDos, and POP MOE values, respectively.
Fig. 3 presents histograms of the changes in OSF,
NAD, and RWFMS(5, 0.5) rankings that result from
subtracting ABConc summed concentration MOE
rankings from the nominal summed concentration
MOE rankings. A positive difference of 5, for example,
implies that the associated model improved 5 rankings
when the ABConc MOE technique was applied in place
of the nominal procedure. The biggest changes are
associated with very large improved rankings for models
118, 121, and 127 when the ABConc MOE procedure
was followed. These three models were ranked as 20, 41,
and 33, respectively, based on the nominal MOE and
OSF. For the ABConc MOE, models 118, 121, and 127
are ranked as 6, 28, and 1, respectively, improving by 14,
13, and 32 rankings. Similar improvements in rankings
are seen for models 121 and 127 for the NAD and
RWFMS(5, 0.5) scoring functions. These improved
rankings mirror changes seen for these same models in
a previous study (Warner et al., 2004b) when the single "near-release" sampler at Rennes was removed. As
reported in Warner et al. (2003), the OSF-based
Table 3
Comparisons of model rankings based on the FMS for NOM, ABConc, ABDos, and POP based MOE values
Rank Model NOM Model ABConc Model ABDos Model POP
1 208 0.577 208 0.524 105 0.729 105 0.823
2 128 0.551 105 0.503 208 0.721 127 0.806
3 202 0.545 202 0.498 127 0.713 208 0.786
4 101 0.544 128 0.483 202 0.685 202 0.777
5 127 0.526 127 0.476 134 0.664 106 0.761
6 107 0.521 114 0.444 128 0.658 128 0.756
7 105 0.517 101 0.441 119 0.656 101 0.748
8 131 0.508 210 0.425 113 0.648 110 0.747
9 118 0.497 209 0.421 210 0.640 104 0.746
10 115 0.493 131 0.413 111 0.623 119 0.741
11 205 0.487 106 0.411 121 0.623 205 0.736
12 134 0.484 113 0.410 106 0.623 121 0.735
13 106 0.482 115 0.399 209 0.621 134 0.734
14 210 0.481 107 0.394 131 0.608 111 0.726
15 114 0.478 213 0.392 205 0.608 131 0.722
16 111 0.473 111 0.389 101 0.605 113 0.717
17 209 0.470 118 0.384 114 0.603 112 0.714
18 204 0.467 205 0.374 110 0.587 204 0.713
19 213 0.460 119 0.374 213 0.581 107 0.706
20 110 0.456 204 0.355 115 0.578 132 0.704
21 133 0.455 207 0.354 104 0.578 115 0.703
22 119 0.439 110 0.350 107 0.565 210 0.699
23 113 0.428 133 0.348 204 0.551 123 0.686
24 207 0.426 134 0.345 123 0.547 103 0.682
25 102 0.423 102 0.331 118 0.545 209 0.679
26 123 0.421 123 0.330 207 0.536 206 0.669
27 201 0.403 203 0.313 102 0.529 213 0.668
28 203 0.399 201 0.310 112 0.521 114 0.664
29 211 0.396 108 0.310 108 0.519 207 0.661
30 121 0.379 121 0.303 135 0.513 203 0.637
31 108 0.378 122 0.292 132 0.510 109 0.627
32 122 0.368 103 0.290 103 0.506 118 0.620
33 104 0.365 135 0.285 206 0.505 211 0.617
34 120 0.354 116 0.285 133 0.504 102 0.616
35 103 0.351 109 0.280 116 0.498 116 0.605
36 135 0.345 211 0.277 203 0.493 108 0.602
37 116 0.341 104 0.265 109 0.478 201 0.601
38 109 0.336 120 0.236 211 0.476 133 0.592
39 112 0.306 112 0.231 201 0.447 135 0.592
40 132 0.270 206 0.216 122 0.427 122 0.558
41 206 0.269 214 0.186 120 0.417 120 0.543
42 214 0.263 132 0.178 214 0.406 214 0.513
43 129 0.190 129 0.093 130 0.345 130 0.381
44 130 0.129 130 0.073 129 0.251 129 0.267
45 117 0.125 117 0.030 117 0.100 117 0.170
46 212 0.060 212 0.014 212 0.037 212 0.070
These comparisons were based on considering a concentration threshold of 0.1 ng m⁻³ for the NOM and ABConc MOE values and a comparable dosage threshold of 72 ng min m⁻³ for the ABDos and POP MOE values.
rankings for summed concentration MOE values of
models 118 (FOA), 121 (SCIPUFF), and 127 (ARAC)
improved 16, 25, and 7, respectively, after the removal of
the single Rennes sampler location. Therefore, the
previously described sensitivity of model rankings to a
single sampler location appears to be mitigated by the
ABConc MOE procedure (which begins with a logarithmic transformation followed by linear interpolation).

Fig. 3. Histograms of differences in rankings between NOM and ABConc procedures: (a) OSF rankings, (b) NAD rankings, and (c) RWFMS(5,0.5) rankings.
Fig. 3 also reports the median absolute differences—
between NOM and ABConc—in rankings for the 46 sets
of predictions of the summed concentration. Median
values of 3, 4.5, and 4 are reported for OSF, NAD, and
RWFMS(5, 0.5) rankings, respectively. For perspective,
the random ordering of 46 entities was simulated and we
found the median absolute ranking difference was 13.5,
suggesting relatively robust behavior of the model
rankings shown in Table 3. Table 4 presents median
absolute differences between different MOE procedures—NOM, ABConc, ABDos, and POP—and for three different threshold levels.
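The random-ordering baseline quoted above can be reproduced with a short simulation. This is a sketch under the assumption that "random ordering" means comparing two independent random rankings of the 46 models; the paper does not spell out the exact procedure:

```python
import random
import statistics

def median_abs_rank_diff(n=46, trials=5000, seed=0):
    """Median (over trials) of the per-trial median absolute difference
    between two independent random rankings of n models."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))
    per_trial_medians = []
    for _ in range(trials):
        shuffled = ranks[:]
        rng.shuffle(shuffled)
        # Absolute rank difference for each of the n models
        diffs = [abs(a - b) for a, b in zip(ranks, shuffled)]
        per_trial_medians.append(statistics.median(diffs))
    return statistics.median(per_trial_medians)

print(median_abs_rank_diff())  # close to the value of 13.5 quoted in the text
```

For n models, the theoretical median of |i − π(i)| is approximately n(1 − 1/√2) ≈ 0.29n, which for n = 46 gives about 13.5, consistent with the simulated value.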
Several results can be obtained from Table 4. First,
differences in model rankings are greatest when compar-
ing concentration-based and dosage-based MOE values.
The middle four columns (3 through 6) compare ranking
differences for concentration-based and dosage-based
MOE values and result in median ranking differences
between 3 and 7 with a median of the medians of 4. For
comparison, median ranking differences for NOM
versus ABConc (column 2), which examine differences
due to basing the concentration MOE on areas, were
between 1.5 and 2. Similarly, median ranking differences
for ABDos versus POP (column 7), which examine
differences due to basing the dosage MOE on actual
European population distributions, were between 1 and
4. No strong trends in the magnitude of the absolute
ranking differences between different MOE computational techniques could be attributed to increases in the threshold.
A few models improve their relative rankings greatly
when assessed based on dosage MOE values (ABDos)
instead of concentration-based values (ABConc). For
example, for OSF rankings and the three comparative
concentration/dosage thresholds (Low, Medium, and
High), model 121 (SCIPUFF), moves up 26 (from 33 to
7), 20 (from 31 to 11), and 10 (from 25 to 15) positions,
respectively. Examination of 3-h concentration and cumulative dosage plots for the observations and predictions suggests that some models do not match the 3-h timing (e.g., time of arrival and dwell) as well as
others. Therefore, while total dosages may be well
predicted, 3-h average concentrations that require both
the location and time to be matched may be predicted
worse (relative to the other models). This certainly
appears to be the case for the SCIPUFF predictions.
Fig. 4 shows contours associated with the SCIPUFF
3-h average concentration predictions and the corre-
sponding observations for the period of time starting
one day after the release. The figure indicates that the
SCIPUFF predictions seemed to ‘‘run ahead’’ of the
observations for the period of time after about 42 h.
Such a mismatch in timing would be expected to greatly
degrade 3-h average concentration MOE values (and
hence rankings) for SCIPUFF. However, summing these
concentrations over all time periods to create dosage
MOE values results in improved relative performance.
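The effect described above can be illustrated with a toy calculation (hypothetical numbers, not ETEX data, and not the paper's MOE formula): a predicted plume that arrives one 3-h period early at a sampler scores poorly on time-paired concentration overlap but perfectly on total dosage.

```python
# Hypothetical 3-h average concentrations (ng/m3) at one sampler,
# for seven consecutive 3-h periods.
obs  = [0.0, 0.0, 2.0, 5.0, 3.0, 1.0, 0.0]
pred = [0.0, 2.0, 5.0, 3.0, 1.0, 0.0, 0.0]  # same plume, one period early

# Time-paired overlap: the prediction gets credit only where it matches
# the observation within the same 3-h period.
paired_overlap = sum(min(o, p) for o, p in zip(obs, pred))
fractional_overlap = paired_overlap / sum(obs)

# Dosage (time-integrated concentration, ng min/m3): timing drops out.
minutes_per_period = 180
obs_dosage = sum(obs) * minutes_per_period
pred_dosage = sum(pred) * minutes_per_period

print(round(fractional_overlap, 2))  # well below 1: timing mismatch penalized
print(obs_dosage == pred_dosage)     # True: total dosage is identical
```

Summing over all time periods before comparing, as the dosage MOE does, removes the timing penalty entirely, which is why a "run ahead" model such as SCIPUFF improves under the dosage-based measures.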
Table 5 lists the models that were in the top 5 or top
10 of 46 under more than one computational procedure.
This table describes results for three scoring functions
and three threshold levels—9 rows. The second column
identifies those model predictions that were always in the
top 5/top 10 for all techniques—NOM, ABConc,
ABDos, and POP. The third column identifies those
model predictions that were always in the top 5/top 10
for the concentration-based procedures. The final
column identifies those model predictions that were
always in the top 5/top 10 for all techniques that were
Table 4
Median absolute ranking difference between various MOE value (NOM, ABConc, ABDos, and POP) rankings for 3 thresholds and for 3 scoring functions

Threshold, scoring function | NOM-ABConc | NOM-ABDos | NOM-POP | ABConc-ABDos | ABConc-POP | ABDos-POP
Low, OSF                    | 2          | 4         | 5       | 4            | 5          | 3
Low, FMS or NAD             | 1.5        | 3         | 4       | 5            | 4          | 4
Low, RWFMS(5,0.5)           | 2          | 5.5       | 4       | 4            | 4          | 1
Medium, OSF                 | 2          | 5         | 6       | 3            | 4          | 4
Medium, FMS or NAD          | 2          | 5         | 7       | 3.5          | 6          | 4
Medium, RWFMS(5,0.5)        | 2          | 3.5       | 4       | 3            | 3          | 1
High, OSF                   | 2          | 5         | 5       | 4            | 5.5        | 3
High, FMS or NAD            | 2          | 4         | 5       | 3            | 6          | 3.5
High, RWFMS(5,0.5)          | 2          | 5         | 5       | 4            | 6          | 2.5

For thresholds, "Low" implies 0.01 ng m⁻³ and 7.2 ng min m⁻³ for concentration and dosage measures, respectively; "Medium" implies 0.1 ng m⁻³ and 72 ng min m⁻³, respectively; and "High" implies 0.5 ng m⁻³ and 360 ng min m⁻³, respectively.
dosage-based. Therefore, Table 5 illustrates the models
that had robust performance (always top 5 or top 10)
with respect to the MOE computational technique. For
example, for the lowest thresholds considered, models
105 (CMC), 127 (LLNL), and 208 (SMHI) have the
most robust ‘‘top 5/top 10’’ performance for the OSF
and FMS (or NAD) scoring functions. Considering all
three thresholds, and the OSF and FMS/NAD scoring
functions, only one prediction, model 127, is always in
the top 10. For the more conservative RWFMS(5, 0.5)
scoring function, model 110 (DMI) exhibited the most robust performance, appearing in the top 10 at all three threshold levels. For the dosage-based rankings (ABDos and POP), only one model, 127, appears in the top 10 in
all cases. Top 10 performance that spans both nominal
and conservative scoring functions, at any threshold,
appears rare, with models 105 (CMC with ECMWF),
127, and 202 (CMC) being the only ones to achieve such
a result.
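Identifying such robust performers reduces to a set intersection over the per-technique rankings. A sketch with hypothetical rank data (the model numbers and ranks below are illustrative, not the paper's values):

```python
# Hypothetical OSF rankings (1 = best) under the four MOE techniques.
rankings = {
    "NOM":    {101: 3,  105: 6, 110: 12, 127: 1, 202: 9, 208: 4},
    "ABConc": {101: 5,  105: 8, 110: 14, 127: 2, 202: 7, 208: 6},
    "ABDos":  {101: 9,  105: 4, 110: 11, 127: 3, 202: 8, 208: 12},
    "POP":    {101: 10, 105: 5, 110: 13, 127: 2, 202: 9, 208: 11},
}

def always_in_top(rankings, n):
    """Models ranked in the top n under every technique."""
    tops = [
        {model for model, rank in ranks.items() if rank <= n}
        for ranks in rankings.values()
    ]
    return set.intersection(*tops)

print(sorted(always_in_top(rankings, 10)))  # -> [101, 105, 127, 202]
print(sorted(always_in_top(rankings, 5)))   # -> [127]
```

Applying the same intersection for each threshold and scoring function yields entries of the kind collected in Table 5.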
Finally, model rankings obtained in this work were
briefly compared with the rankings described in Mosca
et al. (1998a, Table 168). The rankings discussed in this
previous study include the Total Rank, obtained by summing (with equal weights) a variety of individual rankings based on statistical measures applied to observations and predictions paired
in space and time. Several caveats apply to the
discussions that follow. First, there are likely differences in the protocols used to process the ATMES II data (both predictions and observations) between groups. Next, the statistical quantities that underlie the rankings in this work are not identical to the statistical measures described in Mosca et al. (1998a).

Fig. 4. Contours (based on the Delaunay triangulation technique described in the text) for 3-h average concentration observations and SCIPUFF (model 121) predictions for the time periods between 24 and 75 h after the release. The solid lines correspond to contours for the SCIPUFF predictions (black = 0.5 ng m⁻³, dark blue = 0.1 ng m⁻³, and lighter blue = 0.01 ng m⁻³), and the shaded regions correspond to "observed" areas above the 3-h concentration thresholds (red = 0.5 ng m⁻³, orange = 0.1 ng m⁻³, and yellow = 0.01 ng m⁻³). Numbers on individual plots correspond to the last hour of the given 3-h time period.
Table 6 lists those models that were ranked within the
top 10 in the previous study as well as in this study by
each of the four MOE-based techniques—NOM,
ABConc, ABDos, and POP. In each case, six models
appear in the top 10 in both the previous study and in
the current study. In addition, four models—101 (IMP),
107 (DWD), 111 (IPSN), and 209 (MetOff)—appear in
the top 10 in all cases reported here as well as in the
previous study’s Total Rank. Interestingly, some of the
most robust predictions described earlier, for example,
model 127, do not appear in Table 6. Undoubtedly, one
cause of this was the poor performance of some models,
including model 127, associated with the closest sampler
location at Rennes, France. This close-in overprediction
has been previously discussed (Warner et al., 2004b) and
its impact on overall MOE assessments is somewhat
(and perhaps appropriately) mitigated by employing
interpolation procedures, i.e., ABConc, ABDos, and
POP.
9. Conclusions and discussion
The techniques and procedures described in this paper
provide a mechanism to assess model predictive
performance in a way that allows relative over- and
underpredictions to be evaluated simultaneously. After
[Fig. 4: 18 panels ("Obs VS Pred") showing observed versus predicted 3-h average concentration contours at 3-h intervals from 24 to 75 h after the release, plotted on East-West versus North-South axes in km.]
Table 5
Robust top 5/top 10 ranked models for 3 thresholds and for 3 scoring functions

Threshold, scoring function | All | NOM and ABConc (concentration-based) | ABDos and POP (dosage-based)
Low, OSF | 105, 208/127 | 105, 127, 202, 208/101, 114, 128, 210 | 105, 208/111, 121, 127, 131, 134
Low, FMS or NAD | 105, 208/127 | 105, 127, 202, 208/101, 114, 128, 210 | 105, 208/111, 121, 127, 131, 134
Low, RWFMS(5,0.5) | 113/101, 110, 123, 205 | 101, 113, 114/110, 115, 123, 205 | 104, 110, 113, 205/101, 112, 123, 127, 208
Medium, OSF | 127, 202, 208/105, 128 | 127, 128, 202, 208/101, 105 | 105, 127, 202, 208/111, 119, 128, 134
Medium, FMS or NAD | 127, 202, 208/105, 128 | 127, 128, 202, 208/101, 105 | 105, 127, 202, 208/119, 128
Medium, RWFMS(5,0.5) | 101, 110, 205/105, 123, 202 | 101, 110, 123, 205/105, 106, 202 | 101, 104, 110, 127, 205/105, 123, 202
High, OSF | 127/107, 128 | 107, 127, 128, 134/111, 205, 208 | 105, 127/107, 113, 128, 202
High, FMS or NAD | 127/107, 128 | 107, 111, 127, 134/118, 128, 133, 208, 209 | 105, 127/107, 128, 131, 202
High, RWFMS(5,0.5) | 105, 110/106, 127, 202, 205 | 105, 110, 128, 208/106, 107, 127, 134, 202, 205 | 105, 106, 110, 202/113, 127, 205

In each cell, models before the slash were always in the top 5 and models after the slash were additionally always in the top 10.
Table 6
Models ranked in the top 10 for the Mosca et al. (1998a) "Total Rank" and for summed concentration MOE OSF rankings for four techniques: NOM, ABConc, ABDos, and POP

NOM          | ABConc       | ABDos        | POP
101 (IMP)    | 101 (IMP)    | 101 (IMP)    | 101 (IMP)
107 (DWD)    | 107 (DWD)    | 107 (DWD)    | 107 (DWD)
111 (IPSN)   | 111 (IPSN)   | 111 (IPSN)   | 111 (IPSN)
114 (CNR)    | 208 (SMHI)   | 131 (DNMI)   | 115 (JAERI)
115 (JAERI)  | 209 (MetOff) | 209 (MetOff) | 131 (DNMI)
209 (MetOff) | 210 (MetOff) | 210 (MetOff) | 209 (MetOff)

Models are ordered by numerical designator, not by ranking.
applying a straightforward interpolation method and
considering the underlying European population dis-
tribution, assessments of model performance in terms of
the fraction of the population falsely warned and
fraction of the population inadvertently exposed were
possible. For several sets of model predictions, performance was found to be robust to the MOE computational technique used—for example, some of the same models appeared in the top 10 repeatedly.
Finally, two important caveats must be noted. First,
the rankings described in this paper result from consideration of a single release, and general inference about which model is "best" or ranked highest is not appropriate. Rather, these rankings describe performance for this specific release only. In addition,
for this single release field experiment, no direct
measures of uncertainty associated with the computed
MOE values or model rankings were constructed.
Previous studies that have examined multiple releases
have described techniques for assessing uncertainties
and comparing metrics to identify statistically significant
differences (Warner et al., 2004c).
Up to this point, the use of area-based and popula-
tion-based MOE values to compare sets of model
predictions, rank the models, and provide insight into
relative model performance has been emphasized. With
respect to the population-based two-dimensional (i.e., x
and y axes) MOE values, the x-axis corresponds to one
minus the fraction of the (exposed) population that is
inadvertently exposed (i.e., ‘‘not warned’’) to a threshold
level of interest and the y-axis corresponds to one minus the fraction of the (warned) population that is unnecessarily warned (at a threshold level of interest). One
might imagine using an effects (or lethality) model to
compute, via minimal extension of the MOE, the actual
number of people ‘‘falsely warned’’ or ‘‘inadvertently
exposed’’ as in Warner et al. (2004a). However, one
must be careful because of the relatively small number of
samplers associated with the observed ETEX data. In
attempting to describe the actual number of affected
people, one would need to rely on the absolute (actual)
areas computed, not simply the fraction of areas. In such
a case, the estimated area sizes are sensitive to the details
associated with the specific area-based technique used to
interpolate between sparse observations (Warner et al.,
2004d, p. 3–29). This sensitivity stems solely from the limited availability of experimental observations. For
example, during ETEX, data were collected at sparsely distributed sampler locations. In general, however, predictions produce plumes on regular, finely spaced grids and do not necessarily require any interpolation.
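The sensitivity of absolute-area estimates to the interpolation scheme can be illustrated with a sketch (hypothetical sampler values; the paper's actual area-based technique used a Delaunay triangulation, which is not reproduced here). Two common schemes, nearest-neighbor and inverse-distance weighting, yield noticeably different above-threshold areas from the same four samplers:

```python
# Hypothetical sparse samplers: (x_km, y_km, concentration in ng/m3)
samplers = [(0, 0, 5.0), (80, 0, 0.05), (0, 80, 0.3), (80, 80, 0.0)]
THRESHOLD = 0.1  # ng/m3

def nearest(x, y):
    """Nearest-neighbor interpolation."""
    return min(samplers, key=lambda s: (s[0] - x) ** 2 + (s[1] - y) ** 2)[2]

def idw(x, y, power=2):
    """Inverse-distance-weighted interpolation."""
    num = den = 0.0
    for sx, sy, c in samplers:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0:
            return c  # exactly at a sampler
        w = 1.0 / d2 ** (power / 2)
        num += w * c
        den += w
    return num / den

def area_above_threshold(interp, step_km=2.0):
    """Estimate the above-threshold area (km2) on a 100 km x 100 km grid."""
    cells = 0
    n = int(100 / step_km)
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) * step_km, (j + 0.5) * step_km
            if interp(x, y) > THRESHOLD:
                cells += 1
    return cells * step_km ** 2

print(area_above_threshold(nearest), area_above_threshold(idw))
# The two schemes give substantially different absolute areas
# from the identical sparse observations.
```

The fractional-overlap MOE values are less affected by this choice than the absolute areas are, which is the point made in the text.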
Thus, one might envision the following operational
procedure to assess actual ‘‘areas’’ or ‘‘numbers of
people affected.’’ First, models are compared to
observations using data at the samplers, both observed
and predicted, as was done in this study. Next, one
chooses a model prediction that demonstrates robust
and acceptable performance—that is, ‘‘top ten’’ perfor-
mance—using the two-dimensional MOE to assess
relative false positive and false negative regions. Then,
the corresponding ‘‘robust model’’ predicted plume
could be used to calculate actual areas and/or numbers
of people affected. We did not have access to the
predicted plumes for this study, only the predicted
concentrations at the sampler locations and thus could
not further explore this procedure.
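The population-based axes described above can be sketched directly from warned and exposed population counts (hypothetical numbers; the full MOE construction in Warner et al. (2004a) is more involved):

```python
# Hypothetical populations (in thousands) for one model prediction:
exposed        = 900.0    # population in the observed above-threshold region
warned         = 1_200.0  # population in the predicted above-threshold region
exposed_warned = 750.0    # population in the overlap of the two regions

# x-axis: one minus the fraction of the exposed population not warned
x = exposed_warned / exposed
# y-axis: one minus the fraction of the warned population unnecessarily warned
y = exposed_warned / warned

false_negative_fraction = 1 - x  # inadvertently exposed
false_positive_fraction = 1 - y  # falsely warned

print(round(x, 3), round(y, 3))  # (1, 1) would be a perfect prediction
```

Under the operational procedure sketched above, these fractions would be computed at the samplers to select a robust model, after which that model's gridded plume, combined with the population distribution, would supply the absolute numbers of people affected.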
Acknowledgements
The authors thank Stefano Galmarini (Joint Research
Centre—Environment Institute, Environment Monitor-
ing Unit, Ispra, Italy) for providing access to the
ATMES II model predictions and for useful discussions.
This effort was supported by the Defense Threat
Reduction Agency, with Mr. Richard Fry as project
monitor, and the Central Research Program of the
Institute for Defense Analyses. The views expressed in
this paper are solely those of the authors.
References

Boybeyi, Z., Ahmad, N., Bacon, D.P., Dunn, T.J., Hall, M.S., Lee, P.C.S., Sarma, R.A., Wait, T.R., 2001. Evaluation of the operational multiscale environment model with grid adaptivity against the European tracer experiment. Journal of Applied Meteorology 40, 1541–1558.

Girardi, F., Graziani, G., van Veltzen, D., Galmarini, S., Mosca, S., Bianconi, R., Bellasio, R., Klug, W. (Eds.), 1998. The ETEX project. EUR Report 181-43 EN. Office of official publications of the European Communities, Luxembourg, 108 pp.

Graziani, G., Klug, W., Mosca, S., 1998. Real-time long-range dispersion model evaluation of the ETEX first release. Joint Research Center of the European Commission, Office of official publications of the European Communities, L-2985 (CL-NA-17754-EN-C), Luxembourg, 216 pp.

Guibas, L.J., Knuth, D.E., Sharir, M., 1992. Randomized incremental construction of Delaunay and Voronoi diagrams. Algorithmica 7, 381–413.

Klug, W., Graziani, G., Grippa, G., Pierce, D., Tassone, C. (Eds.), 1992. Evaluation of long-range atmospheric transport models using environmental radioactivity data from the Chernobyl accident. EUR Report 14147 EN. Office of official publications of the European Communities, Luxembourg, 366 pp.

Mosca, S., Bianconi, R., Bellasio, R., Graziani, G., Klug, W., 1998a. ATMES II—Evaluation of long-range dispersion models using data of the 1st ETEX release. Joint Research Center of the European Commission, Office of official publications of the European Communities, L-2985 (CL-NA-17756-EN-C), Luxembourg, 608 pp.

Mosca, S., Graziani, G., Klug, W., Bellasio, R., Bianconi, R., 1998b. A statistical methodology for the evaluation of long-range dispersion models: an application to the ETEX exercise. Atmospheric Environment 32 (24), 4307–4324.

Platt, N., Warner, S., Heagy, J.F., 2004. Application of user-oriented MOE to transport and dispersion model predictions of ETEX. Proceedings of the Ninth International Conference on Harmonisation Within Atmospheric Dispersion Modelling for Regulatory Purposes, Garmisch-Partenkirchen, Germany, pp. 120–125.

van Dop, H., Nodop, K. (Eds.), 1998. A European tracer experiment. Atmospheric Environment 32 (24), 4089–4378.

Warner, S., Platt, N., Heagy, J.F., 2001a. Application of user-oriented measure of effectiveness to HPAC probabilistic predictions of Prairie Grass field trials. IDA Paper P-3586, 275 pp. (Available electronically [DTIC STINET ADA391653] or via a request to Steve Warner.)

Warner, S., Heagy, J.F., Platt, N., Larson, D., Sugiyama, G., Nasstrom, J.S., Foster, K.T., Bradley, S., Bieberbach, G., 2001b. Evaluation of transport and dispersion models: a controlled comparison of Hazard Prediction and Assessment Capability (HPAC) and National Atmospheric Release Advisory Center (NARAC) predictions. IDA Paper P-3555, 251 pp. (Available electronically [DTIC STINET ADA391555] or via a request to Steve Warner.)

Warner, S., Platt, N., Heagy, J.F., 2003. Application of user-oriented MOE to transport and dispersion model predictions of the European tracer experiment. IDA Paper P-3829, 86 pp. (Available electronically [DTIC STINET ADA419433] or via a request to Steve Warner.)

Warner, S., Platt, N., Heagy, J.F., 2004a. User-oriented two-dimensional measure of effectiveness for the evaluation of transport and dispersion models. Journal of Applied Meteorology 43, 53–73.

Warner, S., Platt, N., Heagy, J.F., 2004b. Application of user-oriented measure of effectiveness to transport and dispersion model predictions of the European tracer experiment. Atmospheric Environment 38 (39), 6789–6801.

Warner, S., Platt, N., Heagy, J.F., 2004c. Comparisons of transport and dispersion model predictions of the URBAN 2000 field experiment. Journal of Applied Meteorology 43, 829–846.

Warner, S., Platt, N., Heagy, J.F., 2004d. Comparisons of transport and dispersion model predictions of the European tracer experiment: area-based and population-based measures of effectiveness. IDA Paper P-3915, 139 pp. (Available electronically [DTIC STINET ADA427807] or via a request to Steve Warner.)