To be presented at 2001 Annual Conference of the International Association for Mathematical Geology, Cancun, Mexico, September, IAMG2001.

Using linear and non-linear kriging interpolators to produce probability maps

Konstantin Krivoruchko

Environmental Systems Research Institute, 380 New York Street, Redlands, CA 92373-8100

Abstract

Probability maps are used to define areas with high and low certainty of exceeding a threshold value. The most popular methods for creating these maps are variants of indicator kriging. However, such methods are questionable when the data exhibit a trend or contain measurement errors, the latter of which is common in most data sets. This paper presents an alternative approach that maps the risk of exceeding a threshold value after detrending and/or transforming the data. These methods are otherwise known as kriging on residuals and lognormal, Gaussian, and trans-Gaussian kriging. To illustrate this point, an environmental case study is used to compare the predicted results for six different kriging methods. All predictions and maps were made using the Geostatistical Analyst extension to ESRI ArcGIS software.

Introduction

This paper discusses the use of traditional kriging techniques when mapping variables from data that exhibit a trend and contain measurement errors. In most applications kriging is associated with exact interpolation; that is, the kriging predictions change smoothly in space until they reach a location where data have been collected, at which point there is a “jump” in the prediction to the exact value that was measured. This also results in a discontinuity in the prediction standard errors, which “jump” to zero at the measured locations. In practice, if multiple measurements of a variable are made at the same location, more often than not the values will be different. This can be attributed to error in the measurement device or changes in the measurement conditions (a major source of uncertainty).

For example, soil contamination data acquired after the Chernobyl accident were reported as a single value for each town in Belarus (Krivoruchko 1997). In reality, however, contamination is heterogeneous, varying spatially on the ground and with soil depth. For any given settlement this heterogeneity leads to differences in the measured caesium content of 15 to 30 percent. Furthermore, the situation is exacerbated by errors in locating the sample point. Gabrosek and Cressie (2001) have shown that such errors change the matrix of covariances between sampled sites and the vector of covariances between sampled sites and the prediction location. These errors can be roughly approximated by adding an additional measurement error component to the nugget in the semivariogram/covariance model. Locational errors include:

• measurements are distributed throughout the settlement, but the coordinates are represented by the centroid of the settlement;


• measurements are collected for a territory (e.g., fish abundance), the area of which is known only approximately and therefore the coordinates are chosen randomly within the area;

• the coordinates of sample points may be truncated (e.g., meteorological stations);

• the coordinates of sample points vary from one map projection system to another.

Probability maps are used to define areas with high and low certainty of exceeding a threshold value. The most popular methods for creating such maps are variants of indicator kriging. However, these and other nonlinear kriging methods, such as probability and disjunctive kriging, can be criticized because they are exact interpolators and, consequently, do not account for the uncertainty in the prediction at the measurement locations that is represented by the so-called nugget effect. Additionally, variants of indicator kriging lose information about data variability, especially if there is a trend in the data.

There are, however, variations of linear kriging (simple, ordinary, and universal) that can produce noise-free predictions in the form of a “new” value at each measurement location. For these methods, it is possible to specify the proportion of the nugget effect that is microscale variation and the proportion that is measurement error (Johnston et al. 2001); hence they do not cause discontinuities in either the predicted values or the standard errors of the predictions. An additional advantage when working with environmental data sets (e.g., radiocaesium soil contamination) is that linear kriging can be used with data transformation and detrending options. A comparison of the three prediction methods, namely exact, filtered, and new value, can be found in Krivoruchko et al. (2000).

Geostatistical model with measurement errors

Bias refers to the systematic deviation of the predictor from population parameters, such as the mean value. A predictor is said to be unbiased if its expected value is equal to the population parameter it predicts. Unbiasedness is an average property; that is, the mean of any single sample is usually not equal to the population mean, but the average of the means of samples from a population should equal the population mean. Consider the model for processes with measurement error:

$$Z(s) = S(s) + \varepsilon(s) \qquad (1)$$

where Z(s) is the observational process at location s, S(s) is the process of interest (or signal), and ε(s) is white noise, which includes the measurement error. Filtered kriging minimizes the mean-squared prediction error (MSPE) of the signal:

$$\mathrm{MSPE}(\hat{S}) = \mathrm{var}(\hat{S} - S) + [\mathrm{bias}(\hat{S} - S)]^2$$

where $\hat{S}$ is a predictor of $S$ and:

$$\mathrm{bias}(\hat{S} - S) \equiv E(\hat{S} - S)$$
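The decomposition of the MSPE into a variance term and a squared bias term can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `mspe_decomposition` and the simulated signal and predictor are hypothetical, chosen only to show that the identity holds for an arbitrary biased predictor.

```python
import numpy as np

def mspe_decomposition(pred, truth):
    """Return (mspe, variance, bias) of the prediction errors pred - truth."""
    err = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    mspe = np.mean(err ** 2)   # mean-squared prediction error
    bias = np.mean(err)        # E(S_hat - S), estimated by the sample mean
    variance = np.var(err)     # population variance (ddof=0), so the identity is exact
    return mspe, variance, bias

# Hypothetical example: a predictor that is systematically 0.5 units too high.
rng = np.random.default_rng(0)
signal = rng.normal(10.0, 2.0, size=1000)
biased_pred = signal + 0.5 + rng.normal(0.0, 1.0, size=1000)
mspe, variance, bias = mspe_decomposition(biased_pred, signal)
# mspe equals variance + bias**2 up to floating-point rounding
```

Averaging more predictions shrinks the variance term but leaves the squared bias untouched, which is the point made about biased predictors in the text that follows.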


When mapping, the bias term prevents the MSPE of a biased predictor from being reduced by averaging, whereas the MSPE of an unbiased predictor can be reduced substantially (Aldworth, 1998). Additionally, Aldworth (1998) proved that indicator kriging, indicator cokriging, and disjunctive kriging may be far from optimal and are biased predictors of S(s0), which can result in an inaccurate map.

This article compares the performance of linear and non-linear kriging when mapping the risk of exceeding a specified critical value. It begins with a brief description of the model for ordinary kriging. More detailed information can be found in Krivoruchko et al. (2000) and in the appendix of the Geostatistical Analyst user manual (Johnston et al., 2001).

The model for ordinary kriging

Assume the data are a realization of a spatially autocorrelated process plus independent random errors:

$$Z_t(s) = \mu(s) + Y(s) + \eta(s) + \varepsilon_t(s)$$

where $Z_t(s)$ denotes the $t$th realization at location $s$. Let $n_i$ be the number of measurements at location $s_i$. Often $n_i = 1$; if $n_i > 1$, the data form a measurement error model. We assume that $\mu(s) = \mu$ is the unknown, deterministic mean value and that $Y(s)$ is a smooth second-order stationary process whose range of autocorrelation is detectable with an empirical semivariogram or covariance, i.e.:

$$E(Y(s)) = 0$$

$$\mathrm{Cov}(Y(s), Y(s+h)) = C_y(h)$$

$\eta(s)$ is a smooth second-order stationary process whose variogram range is so close to 0 that it is shorter than all practical distances between data and prediction locations:

$$E(\eta(s)) = 0$$

$$\mathrm{Cov}(\eta(s), \eta(s+h)) = C_\eta(h), \quad \text{with } C_\eta(\infty) = 0 \text{ (model without nugget effect)}$$

$\varepsilon_t(s)$ is a white noise process composed of measurement errors:

$$E(\varepsilon_t(s)) = 0 \quad \text{for all } s \text{ and } t$$

$$\mathrm{Cov}(\varepsilon_t(s), \varepsilon_u(s+h)) = \begin{cases} \sigma^2, & \text{if } h = 0 \text{ and } t = u, \\ 0, & \text{otherwise.} \end{cases}$$

Additionally, $Y(\cdot)$, $\eta(\cdot)$, and $\varepsilon(\cdot)$ are assumed to be independent of each other.


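The nugget effect $\delta$ is composed of two parts, microscale variation plus measurement error: $\delta = C_\eta(0) + \sigma^2$. From this model:

$$\mathrm{Cov}(Z_t(s), Z_u(s+h)) = \begin{cases} C_y(h) + C_\eta(h), & \text{if } h \neq 0 \text{ or } t \neq u, \\ C_y(0) + C_\eta(0) + \sigma^2, & \text{if } h = 0 \text{ and } t = u. \end{cases}$$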
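The covariance structure of the observations can be written as a small function. This is an illustrative sketch only: the exponential form of C_y and all parameter values are assumptions made here, not the model fitted in the paper, and the microscale covariance is taken to be zero at any positive distance, in line with the statement that its range is shorter than all practical distances.

```python
import numpy as np

def exp_cov(h, sill, range_par):
    """Exponential covariance, used here only as an illustrative smooth C_y."""
    return sill * np.exp(-np.asarray(h, dtype=float) / range_par)

def cov_z(h, same_replicate, c_y, c_eta0, sigma2, tol=1e-12):
    """Cov(Z_t(s), Z_u(s+h)) for the model with measurement error.

    h              : separation distance (non-negative scalar)
    same_replicate : True when t == u
    c_y            : callable giving C_y(h)
    c_eta0         : microscale variance C_eta(0); C_eta(h) is treated as 0 for h > 0
    sigma2         : measurement error variance
    """
    if h > tol:
        # Different locations: only the smooth process contributes.
        return float(c_y(h))
    if same_replicate:
        # Same location, same measurement: full variance including the error.
        return float(c_y(0.0)) + c_eta0 + sigma2
    # Same location, different replicate: measurement errors are independent.
    return float(c_y(0.0)) + c_eta0

# Hypothetical parameters: sill 1.0, range 10, microscale 0.3, error variance 0.2.
c_y = lambda h: exp_cov(h, 1.0, 10.0)
variance_of_one_measurement = cov_z(0.0, True, c_y, 0.3, 0.2)
covariance_of_two_replicates = cov_z(0.0, False, c_y, 0.3, 0.2)
```

The gap between the two values at h = 0 is the measurement error variance, which is why repeat measurements at a location carry information about it.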

From these assumptions, Krivoruchko et al. (2000) derived different predictions and standard errors using exact, filtered, and “new value” techniques. In particular, the prediction standard errors for ordinary kriging are:

• For exact prediction

$$\hat{\sigma}_Z(s_0) = \sqrt{C_y(0) + C_\eta(0) - \lambda' c - m_z};$$

• For filtered prediction of the noiseless quantity

$$\hat{\sigma}_S(s_0) = \sqrt{C_y(0) + (1 - \pi)\delta - \lambda' c - m_s}$$

in the case of $S(s_0) \equiv \mu + Y(s_0) + \eta(s_0)$ at location $s_0$;

• For new value prediction

$$\hat{\sigma}_Z(s_0) = \sqrt{C_y(0) + \delta - \lambda' c - m_z},$$

where $\lambda$ is the vector of kriging weights, $m$ is the Lagrange multiplier, and $\pi$, with $0 \le \pi \le 1$, is the proportion of the nugget effect that is measurement error, the remainder being microscale variation, so that $\sigma^2 = \pi\delta$ and $C_\eta(0) = (1 - \pi)\delta$. Repeat measurements per location allow the measurement error to be estimated:
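To make the role of the nugget split concrete, the sketch below solves a small ordinary kriging system and evaluates the filtered and new-value standard errors. It is a sketch under stated assumptions rather than the Geostatistical Analyst implementation: the function name `ok_standard_errors`, the exponential C_y, the sign convention for the Lagrange multiplier (Σλ + m·1 = c), and the treatment of a collocated datum are choices made here for illustration. Exact prediction corresponds to setting π = 0 in this sketch.

```python
import numpy as np

def ok_standard_errors(coords, s0, c_y, delta, pi):
    """Filtered and new-value standard errors at s0 for ordinary kriging.

    coords : (n, 2) array of distinct data locations, one measurement each
    s0     : prediction location
    c_y    : callable returning C_y(h) for an array of distances h
    delta  : total nugget, delta = C_eta(0) + sigma^2
    pi     : share of the nugget that is measurement error (0 <= pi <= 1)
    """
    coords = np.asarray(coords, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    n = coords.shape[0]

    # Covariance among the noisy data: C_y(h) plus the full nugget on the diagonal.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sigma = c_y(dist) + delta * np.eye(n)

    # Covariance between the data and the target at s0. A datum collocated
    # with s0 shares the microscale part (1 - pi) * delta but not the error.
    d0 = np.linalg.norm(coords - s0, axis=1)
    c = c_y(d0) + (1.0 - pi) * delta * (d0 == 0.0)

    # Ordinary kriging system: sigma @ lam + m * 1 = c, with sum(lam) = 1.
    lhs = np.zeros((n + 1, n + 1))
    lhs[:n, :n] = sigma
    lhs[:n, n] = 1.0
    lhs[n, :n] = 1.0
    sol = np.linalg.solve(lhs, np.append(c, 1.0))
    lam, m = sol[:n], sol[n]

    common = float(c_y(np.array(0.0))) - lam @ c - m
    var_filtered = common + (1.0 - pi) * delta   # target is the signal S(s0)
    var_new = common + delta                     # target is a new noisy measurement
    # Clip tiny negative values caused by rounding before taking the root.
    return np.sqrt(max(var_filtered, 0.0)), np.sqrt(max(var_new, 0.0))

# Hypothetical configuration: four points, exponential C_y, nugget 0.4, half of it error.
c_y = lambda h: 1.0 * np.exp(-np.asarray(h, dtype=float) / 10.0)
pts = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [7.0, 7.0]]
se_filtered, se_new = ok_standard_errors(pts, [2.0, 2.0], c_y, delta=0.4, pi=0.5)
```

In this sketch the two variances differ by exactly πδ, the measurement error variance, and with π = 0 both standard errors fall to zero at a data location, which reproduces the discontinuity of exact interpolation described in the introduction.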

$$\hat{\sigma}^2_{ME} = \frac{\sum_{s_i \in D} \sum_{j=1}^{n_i} \left( Z_j(s_i) - \bar{Z}(s_i) \right)^2}{N - n_D},$$

where $D$ is the set of all data locations that have more than one measurement, $Z_j(s_i)$ is the $j$th measurement at location $s_i$, $\bar{Z}(s_i)$ is the mean value at location $s_i$, $n_i$ is the number of observations at location $s_i \in D$, $N = \sum_i n_i$ for all $s_i$ in $D$, and $n_D$ is the number of spatial locations in $D$.

Case study

Probability maps can be invaluable when estimating the epidemiological consequences of environmental factors. The following case study discusses how various interpolation techniques may influence the production of probability maps of soil contamination caused by the Chernobyl accident.
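The repeat-measurement estimator of the measurement error variance given at the end of the preceding section is relevant to data of this kind, and can be sketched in a few lines. The function name and the dictionary input format are assumptions made for illustration; locations with a single measurement are skipped because they do not belong to D.

```python
import numpy as np

def measurement_error_variance(replicates):
    """Pooled within-location variance from repeated measurements.

    replicates : mapping from a location key to a list of measurements there.
    Only locations with more than one measurement (the set D) contribute.
    """
    sum_sq = 0.0   # sum over D of squared deviations from the location mean
    n_total = 0    # N: total number of measurements at locations in D
    n_loc = 0      # n_D: number of locations in D
    for values in replicates.values():
        z = np.asarray(values, dtype=float)
        if z.size < 2:
            continue
        sum_sq += float(np.sum((z - z.mean()) ** 2))
        n_total += z.size
        n_loc += 1
    if n_loc == 0:
        raise ValueError("no location has repeated measurements")
    return sum_sq / (n_total - n_loc)

# Hypothetical repeat samples at three settlements (values are illustrative only).
example = {"A": [1.0, 3.0], "B": [2.0, 2.0, 5.0], "C": [7.0]}
sigma2_me = measurement_error_variance(example)
```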


The Chernobyl accident (April 26, 1986) released about $1.85 \cdot 10^{18}$ Becquerel of radioactive material. Each Becquerel represents one radioactive decay event per second. The wind intensity and direction on April 26-30 were such that most of the radioactive dust from the Chernobyl Nuclear Power Plant (NPP) fell on Belarus. Since direct measurements of the dose accumulated by people living in contaminated areas are unavailable, epidemiologists primarily use point measurements of radiocaesium soil contamination to establish the correlation between the radioactive dose (assuming a linear relationship between contamination and dose) and the post-Chernobyl growth of pathologies. However, point measurements are not available for every town, and those that are available contain errors. Thus, statistical spatial data analysis that accounts for uncertainties in the data should be used to predict contamination at the unsampled locations. Table 1 lists the threshold values of soil contamination and the appropriate actions to be taken.

Zone’s name                          137Cs, Ci/sq.km   90Sr, Ci/sq.km   238-240Pu, Ci/sq.km
Zone with periodic control           1-5               0.15-0.5         0.01-0.02
Zone with rights for relocation      5-15              0.5-2            0.02-0.05
Relocation required in the future    15-40             2-3              0.05-0.1
People must be relocated             >40               >3               >0.1

Table 1. Threshold values for soil contamination in Belarus and the corresponding governmental actions.

Since the data contain errors, the use of point measurements for a particular town can lead to mistakes in the decision making process, especially when point measurements are close to a threshold value. For example, when two adjacent measurements that are separated by a short distance have radiocaesium values of 14 and 16 Ci/sq.km, only one would be designated as safe (refer to Table 1). From experience, it is known that the error in 137Cs soil measurements ranges from 15 to 30%. These two measurements are therefore approximately equal, and it would be difficult to conclude that either location is safe or unsafe.

To reconstruct the environmental process, S (equation 1), filtered kriging is used to map the probability of exceeding some critical level of a contaminant. For comparison, indicator kriging, indicator cokriging, and disjunctive kriging are used to map the probability of the noisy version, Z, of the environmental process.

The locations of the radiocaesium soil contamination measurements that were acquired in southern Belarus are shown in figure 1. The data were randomly split into training (387 samples) and validation (166 samples) data sets. Training data were used to develop a model for prediction. Predicted values were then compared with the measurements in the validation data set.
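The earlier remark that measurements of 14 and 16 Ci/sq.km are practically indistinguishable can be illustrated with simple arithmetic. The sketch below rests on assumptions that are not in the paper: the 15 to 30 percent error is read as a relative standard deviation, the error is taken to be Gaussian, and the true value is treated as normally distributed around the measurement. The function name is hypothetical.

```python
from statistics import NormalDist

def prob_true_value_exceeds(measured, threshold, rel_error):
    """P(true value > threshold), assuming true ~ Normal(measured, (rel_error * measured)^2)."""
    return 1.0 - NormalDist(mu=measured, sigma=rel_error * measured).cdf(threshold)

# With the smaller quoted error of 15 percent, neither measurement is decisive
# about the 15 Ci/sq.km threshold.
p_low = prob_true_value_exceeds(14.0, 15.0, 0.15)    # roughly one chance in three
p_high = prob_true_value_exceeds(16.0, 15.0, 0.15)   # roughly two chances in three
```

Under these assumptions both locations have a substantial probability of lying on either side of the threshold, and the ambiguity only grows at the 30 percent end of the error range.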


Figure 1. Map of indicators for threshold value of 15 Ci/sq.km: green symbols represent measurements with 137Cs contamination greater than 15 Ci/sq.km and pink symbols show locations with values less than 15 Ci/sq.km. Note that the validation samples are displayed as bars and the training samples are displayed as circles.

As with most Chernobyl-related data sets, the 137Cs soil contamination data have a skewed distribution with a small number of samples with large values (Figure 2). Repetitive sampling techniques have shown these data to be reliable estimates of contamination.

Figure 2. Distribution of 137Cs soil contamination data acquired in Belarus. Note, the data are skewed toward lower values.


Indicator kriging was applied to the training data using a threshold value of 15 Ci/sq.km to convert the measured values into a binary data set. That is, samples below the threshold were assigned a value of 0, whilst samples above the threshold were assigned a value of 1. Since the indicator variables take values of 0 or 1, the output interpolations will also have values between 0 and 1. Hence, the predictions can be interpreted as the probability of the variable belonging to the class that is indicated by a value of 1. Therefore, the resulting map shows the probability of exceeding the threshold value (Figure 3).

It should be noted that if the data set contains measurement errors, then it is not possible to define the relative proportion of the signal and error for the indicator transformation. Consequently, it is not possible to create a probability map for the signal process because, when a measurement value is close to the threshold, the error associated with that value leads to difficulty in the transformation to a binary number (i.e., the true value could lie on either side of the threshold).

Figure 3 was produced using indicator kriging and indicator cokriging with additional cutoffs at 2, 7, and 25 Ci/sq.km as covariates. Sample locations with values greater than 10 Ci/sq.km in the validation data set are displayed. Additionally, locations with measurements greater than 25 Ci/sq.km, together with a probability of exceeding 15 Ci/sq.km that is less than 0.75, are displayed as large bars. In total, there are 32 locations that exceed the 25 Ci/sq.km value. Of these, seven are poorly predicted using indicator kriging, whilst five are poorly predicted using indicator cokriging. The pink contour in figure 3b shows the area from which most people were relocated after the disaster. It is evident, however, that the area of risk exceeds the restricted zone.
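The indicator coding and the validation criterion used throughout the case study are simple enough to state in code. This is a minimal sketch: the function names are hypothetical, and the handling of a value exactly equal to the threshold (coded as 0 here) is an assumption, since the text only describes values below and above it.

```python
import numpy as np

def indicator(values, threshold):
    """Binary indicator: 1 where the value is above the threshold, 0 otherwise."""
    return (np.asarray(values, dtype=float) > threshold).astype(int)

def count_poor_predictions(measured, prob_exceed, high_value=25.0, min_prob=0.75):
    """Count validation samples that are clearly contaminated but poorly predicted.

    A sample is 'poorly predicted' when its measurement exceeds high_value while
    the mapped probability of exceeding the threshold is below min_prob.
    Returns (number of high samples, number of those poorly predicted).
    """
    measured = np.asarray(measured, dtype=float)
    prob_exceed = np.asarray(prob_exceed, dtype=float)
    high = measured > high_value
    poor = high & (prob_exceed < min_prob)
    return int(high.sum()), int(poor.sum())

# Hypothetical validation values and mapped probabilities, for illustration only.
codes = indicator([3.0, 15.0, 16.0, 40.0], threshold=15.0)
n_high, n_poor = count_poor_predictions([30.0, 26.0, 10.0, 50.0], [0.9, 0.5, 0.1, 0.74])
```

The same counting rule applies to every method compared later, with `high_value` and the probability threshold changed to match the contamination level of interest.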

Figure 3. Probability that the threshold of 15 Ci/sq.km was exceeded using: (a) indicator kriging; and (b) indicator cokriging with additional cutoffs at 2, 7, and 25 Ci/sq.km.

Improving the predictions using linear kriging techniques

Linear kriging techniques, such as simple, ordinary, and universal kriging, have the potential to improve the above maps because they can be combined with data transformation and detrending options. In contrast to indicator kriging, linear kriging techniques estimate the mean value rather than the probability of exceeding a threshold value. However, when the assumption of multivariate normality is satisfied, linear kriging algorithms can be used to create probability and quantile maps. The decision on whether or not to use linear kriging methods for probability mapping can be aided by diagnostic tools that check for univariate and bivariate data normality.

To normalize the 137Cs data, the logarithm transformation was used with ordinary kriging and universal kriging, whereas a normal score transformation was used with simple and disjunctive kriging. The resultant distribution after the application of the logarithm transform is shown in figure 4. In addition to the application of transforms, the data were also detrended when appropriate. Whilst detrending and transformation options cannot be used with indicator kriging and indicator cokriging, it should also be noted that with linear kriging their application is not always appropriate (Table 1). For example, simple kriging assumes that the mean value is known and hence the trend cannot be removed from the data. Additionally, the normal score transformation requires knowledge of the mean value of the process; hence it should not be used with ordinary kriging and universal kriging, because in those methods the mean value is estimated rather than known a priori.

Kriging type    Log transformation    Normal score transformation
Ordinary        Before detrending     Not used
Universal       Before detrending     Not used
Disjunctive     Before detrending     After detrending

Table 1. Order of detrending and data transformation for ordinary, universal, and disjunctive kriging.

Figure 4. Histogram after logarithm transformation.
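The two transformations mentioned above can be sketched as follows. The logarithm transform is a one-liner; the normal score transform shown here is a simple rank-based version using plotting positions (rank - 0.5)/n, which is an assumption made for illustration and not necessarily the variant implemented in Geostatistical Analyst. Tied values receive different scores in this sketch.

```python
import numpy as np
from statistics import NormalDist

def log_transform(values):
    """Natural logarithm of strictly positive data."""
    return np.log(np.asarray(values, dtype=float))

def normal_score_transform(values):
    """Map data to standard normal scores through their ranks."""
    x = np.asarray(values, dtype=float)
    n = x.size
    order = np.argsort(x, kind="stable")
    ranks = np.empty(n, dtype=float)
    ranks[order] = np.arange(1, n + 1)     # rank 1 for the smallest value
    probs = (ranks - 0.5) / n              # plotting positions strictly inside (0, 1)
    std_normal = NormalDist()
    return np.array([std_normal.inv_cdf(p) for p in probs])

# Hypothetical skewed sample: a few large values among many small ones.
sample = [0.5, 1.0, 2.0, 40.0, 8.0]
scores = normal_score_transform(sample)
logs = log_transform(sample)
```

Both transforms preserve the ordering of the data; the normal score version forces a symmetric bell-shaped histogram, whereas the logarithm only does so if the data are close to lognormal.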


Figures 5 and 6 compare the theoretical and estimated covariances for different indicators (i.e., the 0.17, 0.33, 0.5, 0.67, and 0.83 quantiles) for the normal score and logarithm transformations. If the data have a bivariate normal distribution, then the yellow and green curves should be similar (see Deutsch and Journel, 1998; Johnston et al., 2001). In our case, the logarithm transform leads to a distribution that is closer to the bivariate normal distribution than the one obtained with the normal score transform.
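For readers who want to reproduce the theoretical curve in such a check, the indicator covariance implied by a standard bivariate normal pair with correlation ρ has a well-known integral form: for the p-quantile y_p, it equals (1/2π) times the integral from 0 to arcsin ρ of exp(-y_p² / (1 + sin θ)) dθ. The sketch below evaluates that integral with the trapezoid rule. The function name and the numerical scheme are assumptions made here, and nothing is claimed about how the software computes its curves; ρ is restricted to the interval [0, 1].

```python
import numpy as np
from statistics import NormalDist

def gaussian_indicator_covariance(rho, p, n_steps=2000):
    """Indicator covariance at the p-quantile for a bivariate normal pair.

    rho : correlation of the (transformed) variable at the lag of interest, 0 <= rho <= 1
    p   : quantile level of the indicator threshold, 0 < p < 1
    """
    y_p = NormalDist().inv_cdf(p)
    theta = np.linspace(0.0, np.arcsin(rho), n_steps + 1)
    integrand = np.exp(-y_p ** 2 / (1.0 + np.sin(theta)))
    # Trapezoid rule written out explicitly.
    step = theta[1] - theta[0]
    integral = step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (2.0 * np.pi)

# Theoretical curve for the median indicator along a hypothetical correlogram.
lags = np.array([0.0, 5.0, 10.0, 20.0])
rho_at_lag = np.exp(-lags / 10.0)          # illustrative exponential correlation
median_curve = [gaussian_indicator_covariance(r, 0.5) for r in rho_at_lag]
```

At ρ = 1 the expression reduces to p(1 - p), the variance of the indicator, and at ρ = 0 it is zero, so the curve can be compared directly with an empirical indicator covariance computed from the transformed data.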

Figure 5. Examination of bivariate data distribution after normal score transformation. Indicator covariances for 0.17 (a), 0.33 (b), 0.5 (c), 0.67 (d), and 0.83 (e) quantiles.


Figure 6. Examination of bivariate data distribution after logarithm transformation. Indicator covariances for 0.17 (a), 0.33 (b), 0.5 (c), 0.67 (d), and 0.83 (e) quantiles.

If the prediction at an unsampled location is normally distributed, then the mean and median predictions lie at the center of the probability density distribution for that location. The probability that the value is greater than a threshold value is the area under the distribution curve to the right of the threshold line. The prediction distribution changes from location to location because the predicted mean and standard error change. Thus, when the threshold value is held constant, a probability map is produced for the whole surface. A quantile map is produced when the probability is held constant.

Based on the visual information presented in figures 5 and 6, we can assume that after transformation the data are approximately multivariate normal and, therefore, we can use linear kriging interpolators to produce probability maps.

Figures 7(a) and 7(b) show the probability of exceeding 15 Ci/sq.km, mapped using simple kriging (signal prediction) with normal score transformation and disjunctive kriging (prediction of noisy data) with normal score transformation and detrending with local polynomial interpolation, respectively. Of the 32 samples with large contamination levels, only 4 are poorly predicted using disjunctive kriging, whilst only 3 are poorly predicted using simple kriging. Poorly predicted locations are those where the actual measurement exceeds 25 Ci/sq.km and the estimated probability of exceeding 15 Ci/sq.km is less than 0.75.
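The threshold and quantile calculations described in this section follow directly from the normal distribution once a kriging prediction and its standard error are available at a location. The sketch below assumes the prediction and standard error are on the transformed (here logarithmic) scale; the function names and the numerical values are illustrative and are not taken from the case study.

```python
from math import exp, log
from statistics import NormalDist

def prob_exceed(mean, se, threshold):
    """Area under Normal(mean, se^2) to the right of the threshold (se must be > 0)."""
    return 1.0 - NormalDist(mean, se).cdf(threshold)

def quantile(mean, se, p):
    """Value below which a fraction p of Normal(mean, se^2) lies."""
    return NormalDist(mean, se).inv_cdf(p)

# Hypothetical lognormal kriging output at one location: prediction and standard
# error of log-contamination. The threshold is transformed the same way as the data.
log_mean, log_se = log(12.0), 0.4
p_over_15 = prob_exceed(log_mean, log_se, log(15.0))

# Quantiles are preserved by monotone transforms, so they can be back-transformed
# with exp() to give optimistic, medium, and pessimistic values.
q25, q50, q75 = (exp(quantile(log_mean, log_se, p)) for p in (0.25, 0.5, 0.75))
```

Repeating the first calculation at every grid node with a fixed threshold gives a probability map; repeating the second with a fixed probability gives a quantile map.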

Figure 7. Probability that the threshold of 15 Ci/sq.km was exceeded using disjunctive (a) and simple kriging (b).

Figure 8(a) shows the map of large-scale variation, after logarithm transformation, which was used to remove the trend in disjunctive, universal, and ordinary kriging, whilst figure 8(b) shows the semivariogram of residuals used in the universal kriging.

Figure 8. Trend estimation after logarithm transformation (a) and semivariogram on residuals (b), which were used as parameters of the universal kriging model.

Figures 9(a) and 9(b) show the probability of exceeding 15 Ci/sq.km, mapped using universal kriging and ordinary kriging after using the logarithm transformation and detrending options. The semivariogram used in lognormal universal kriging is shown in figure 8(b). Of the 32 samples with large contamination levels, only 3 are poorly predicted by both universal kriging and ordinary kriging.


Figure 9. Probability that the threshold of 15 Ci/sq.km was exceeded using universal kriging (a) and ordinary kriging (b).

A summary of prediction quality based on the validation criterion is presented in table 2. Note that predictors that assume a multivariate normal distribution perform better, which can be attributed to the use of transformation and detrending in the production of a reliable geostatistical model.

Kriging method                                          Validation data (15 samples), Ci/sq.km   Threshold, Ci/sq.km   Estimated probability   Number of poor predictions
Indicator                                               Greater than 25                          15                    0.75                    7
Indicator cokriging with three additional cutoffs       Greater than 25                          15                    0.75                    5
Disjunctive, normal score transformation, detrending    Greater than 25                          15                    0.75                    4
Simple, normal score transformation                     Greater than 25                          15                    0.75                    3
Universal, logarithm transformation                     Greater than 25                          15                    0.75                    3
Ordinary, logarithm transformation, detrending          Greater than 25                          15                    0.75                    3

Table 2. Validation of the results of prediction of 15 measurements greater than 25 Ci/sq.km.


Table 3 summarizes the results when the threshold value is set at 5 Ci/sq.km. As expected, poor results were obtained using both indicator kriging and indicator cokriging, whereas methods based on the assumption of, and correction for, data normality and stationarity performed better.

Kriging method                                          Validation data (64 samples), Ci/sq.km   Threshold, Ci/sq.km   Estimated probability   Number of poor predictions
Indicator                                               Greater than 7                           5                     0.75                    12
Indicator cokriging with three additional cutoffs       Greater than 7                           5                     0.75                    12
Disjunctive, normal score transformation, detrending    Greater than 7                           5                     0.75                    5
Simple, normal score transformation                     Greater than 7                           5                     0.75                    7
Universal, logarithm transformation                     Greater than 7                           5                     0.75                    6
Ordinary, logarithm transformation, detrending          Greater than 7                           5                     0.75                    6

Table 3. Validation of the results of prediction of 64 measurements greater than 7 Ci/sq.km.

Predicting the value for a specified quantile at each location offers an alternative approach to probability mapping. Such maps can be interpreted as a series of over- and under-estimated values and can, therefore, be used in the decision making process. For example, it is better to evacuate additional people from a larger area than to underestimate the area contaminated by dangerous levels of radioactive contamination. Figure 10 shows an example of optimistic (first quartile), medium (median), and pessimistic (third quartile) mapping using lognormal ordinary kriging on residuals.

Figure 10. Quantile maps of 137Cs soil contamination in the southern part of Belarus: (a) first quartile, (b) median, and (c) third quartile mapping using lognormal ordinary kriging on residuals.

New value kriging

New value and filtered kriging alter the measured values at the sample locations and therefore produce a smoother map (figure 11). This map was produced using the training data set and lognormal universal kriging. Locations where measurements are below and predictions are above 5 and 15 Ci/sq.km are represented by red circles. For reference, the difference between the measured and predicted values is also shown adjacent to each circle.

In figure 11(a), thirteen sample locations where measurements are below and predictions are above 5 Ci/sq.km are located close to the most contaminated zone. Whilst universal kriging predicts values greater than 5 Ci/sq.km, the measurements suggest that these are safe areas. Figure 11(b) shows a similar map created using a threshold value of 15 Ci/sq.km. In this map seven samples have predictions greater than the threshold, whilst the measurements are below the critical value.

Figure 11. Prediction of the new value in the measurement (training) data locations by lognormal universal kriging.

For each point displayed in figure 11, the probability of exceeding the threshold levels of 5 and 15 Ci/sq.km is summarized in tables 4 and 5.


Measurement value   Predicted value   Difference   Probability that threshold 5 Ci/sq.km was exceeded
3.20                5.79              2.59         0.82
3.26                5.40              2.14         0.71
3.78                5.77              1.99         0.75
3.94                5.08              1.14         0.47
4.12                6.01              1.89         0.80
4.12                6.56              2.44         0.90
4.33                7.39              3.06         0.98
4.50                5.00              0.50         0.38
4.58                5.09              0.51         0.41
4.67                5.47              0.80         0.55
4.67                6.63              1.96         0.85
4.76                5.50              0.74         0.58
4.96                5.74              0.78         0.62

Table 4. New value prediction and probability that the level of 5 Ci/sq.km was exceeded by lognormal universal kriging for points displayed in figure 11a as red circles.

Measurement value   Predicted value   Difference   Probability that threshold 15 Ci/sq.km was exceeded
11.01               15.88             4.87         0.59
12.52               18.67             6.15         0.82
12.80               16.33             3.53         0.57
13.82               16.88             3.06         0.63
13.82               18.52             4.70         0.76
14.17               15.97             1.80         0.49
14.86               15.94             1.08         0.47

Table 5. New value prediction and probability that the level of 15 Ci/sq.km was exceeded by lognormal universal kriging for points displayed in figure 11b as red circles.

An important question arises in relation to the uncertainty of the predictions when creating probability maps. Figures 12(a) and 12(b) are standard error of probability maps created using indicator and disjunctive kriging, the latter of which is based on the assumption of bivariate normality and was created using the detrending and normal score transformation options.


Figure 12. Standard error of indicators maps created using (a) indicator and (b) disjunctive kriging with the threshold value set at 15 Ci/sq.km. Green symbols show locations with 137Cs contamination values greater than 15 Ci/sq.km and pink symbols show locations with values less than 15 Ci/sq.km.

For indicator kriging, the map clearly represents data density. In contrast, whilst the map created using disjunctive kriging is also data dependent, its largest uncertainty corresponds to areas close to the critical threshold value.

Conclusion

This paper presents several methods for identifying areas in Belarus with dangerous contamination levels resulting from the Chernobyl accident, based on non-Gaussian data with measurement errors. For the case study outlined, the key aim was to map areas with high levels of radiocaesium soil contamination. Therefore, the diagnostic methods used concentrated on the quality of predictions close to the critical threshold values of 5 and 15 Ci/sq.km.

Results show that indicator kriging and indicator cokriging are sub-optimal predictors of extreme values because these methods lose information during data transformation and the indicator covariance does not describe the underlying process. In contrast, parametric kriging methods performed better, primarily because they can be used with data transformation and detrending options.

Building a model is more than just drawing a surface through a set of points, and attention should be paid to data errors and model uncertainties, e.g., the uncertainties that occur when a non-stationary environmental process is being transformed to a stationary one. Parametric kriging techniques provide more flexibility for modeling environmental data. If options such as data detrending, data declustering, data transformation, measurement error specification, and cross-validation and validation tools are used correctly, then one can expect better results.
Among possible alternatives to traditional kriging are the geostatistical conditional simulation techniques. In general, kriging gives better pointwise predictions with data errors, whilst conditional simulations perform better when looking for predictions of transfer functions or the joint probability of data locations being above or below a specified threshold. However, selecting an appropriate geostatistical conditional simulation algorithm can be even more difficult than selecting an optimal kriging method. Thus, the use of sequential Gaussian simulation is only justified if there is reason to believe that the measurements are precise, that the estimated covariance model is a true model of the underlying process, which is Gaussian, and that the cumulative/probability distribution of the data is the same globally and locally. However, in our case study none of these assumptions is valid. For example, Figure 13 highlights the variability of local distributions for six sub-regions, each containing approximately 100 samples.
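The distinction drawn above between pointwise predictions and the joint probability of several locations exceeding a threshold can be illustrated with a small Monte Carlo sketch. It draws from a bivariate Gaussian distribution for two locations; this is neither sequential Gaussian simulation nor a conditional simulation, only a two-location stand-in for them. The means, standard deviations, correlation, and function name are hypothetical, and the Gaussian assumption is exactly the one questioned for the case study.

```python
import math
import random

def joint_and_pointwise_exceedance(mean1, mean2, sd1, sd2, rho, threshold,
                                   n_draws=20000, seed=0):
    """Estimate P(Z1 > t and Z2 > t), P(Z1 > t) and P(Z2 > t) by simulation."""
    rng = random.Random(seed)
    both = first = second = 0
    for _ in range(n_draws):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # Correlate the two standard normal draws (2x2 Cholesky factor).
        z1 = mean1 + sd1 * e1
        z2 = mean2 + sd2 * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
        above1 = z1 > threshold
        above2 = z2 > threshold
        first += above1
        second += above2
        both += above1 and above2
    return both / n_draws, first / n_draws, second / n_draws

# Two hypothetical neighbouring locations, strongly correlated (rho = 0.9).
joint, p1, p2 = joint_and_pointwise_exceedance(
    15.0, 15.0, 3.0, 3.0, 0.9, threshold=15.0)
```

With positively correlated locations the joint probability is larger than the product of the two pointwise probabilities, so a joint probability cannot be read off from pointwise probability maps alone.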

Figure 13. Histograms for six subregions of the radiocesium data in the southern part of Belarus.

References

• Aldworth, J. (1998), Spatial prediction, spatial sampling, and measurement errors. Unpublished PhD thesis, Iowa State University.

• Cressie, N. (1993), Statistics for spatial data, revised ed. John Wiley and Sons, New York. 900 p.

• Deutsch, C.V. and Journel, A.G. (1998), GSLIB: Geostatistical software library and user’s guide. Oxford University Press, New York, 370 p.

• Gabrosek, J., and Cressie, N. (2001), The effect on attribute prediction of locational uncertainty in spatial data. Technical report 674. Ohio State University.

• Johnston, K., Ver Hoef, J., Krivoruchko, K., and Lucas, N. (2001), Using Geostatistical Analyst. ESRI. 300 p.

• Krivoruchko, K. (1997), Geostatistical Picturing of Chernobyl Fallout and Estimation of Cancer Risk Among Belarus Population. Third Joint European Conference and Exhibition on Geographical Information, Vienna, Austria, pp. 676-685, April 1997.

• Krivoruchko, K. (1998), GIS and Geostatistics: Spatial Analysis of Chernobyl’s Consequences in Belarus. Workshop on Status and Trend in Spatial Analysis. Santa Barbara, CA. December 10-12, 1998. <http://www.ncgia.ucsb.edu/conf/sa_workshop/papers/krivoruchko_old.html>

• Krivoruchko, K., Gribov, A., and Ver Hoef, J. (2000), Predicting Exact, Filtered, and New Values using Kriging. In: Yarus, J. and Chambers, R. (eds.), “Stochastic Modeling and Geostatistics”. 2. AAPG Computer Applications in Geology. In press. Also available at <http://www.esri.com/software/arcgis/arcgisextensions/geostatistical/>