Spatial interpolation comparison



A 2-hour seminar within the Geostatistics training course at WUR


Spatial Interpolation Comparison: Evaluation of spatial prediction methods

Tomislav Hengl

ISRIC — World Soil Information, Wageningen University

Geostatistics course, 25–29 October 2010, Wageningen

Based on

Hengl, T., MacMillan, R.A., 2011? Mapping efficiency and information content. Submitted to International Journal of Applied Earth Observation and Geoinformation, special issue Spatial Statistics Conference.



- Geostatistics = a toolbox for generating maps from point data by interpolation;

- There are many possibilities;

- An inexperienced user will often be overwhelmed by the number of techniques available for spatial interpolation;

- …so which method should we use?


Have you heard of SIC (the Spatial Interpolation Comparison exercise; Dubois, 2005)?


The spatial prediction game

Participants were invited to estimate values at 1000 locations (right, crosses) using 200 observations (left, circles).


Lessons learned (from SIC)


Li and Heap (2008)


How many techniques are there?

Li and Heap (2008) list over 40 unique techniques.

1. Are all these equally valid?

2. How can we objectively compare methods (which criteria should we use)?

3. Which method to pick for your own case study?


There are not as many

There are roughly five main clusters of techniques:

1. splines (deterministic);

2. kriging-based (plain geostatistics);

3. regression-based;

4. Bayesian methods;

5. expert systems / machine learning.


The 5 criteria

1. the overall mapping accuracy, e.g. the standardized RMSE at control points: the amount of variation explained by the predictor, expressed in %;

2. the bias, e.g. the mean error: the accuracy of estimating the central population parameters;

3. the model robustness, also known as model sensitivity: in how many situations does the algorithm fail completely, and how many artifacts does it produce?

4. the model reliability: how good is the model at estimating the prediction error (how accurate is the prediction variance compared with the true mapping accuracy)?

5. the computational burden: the time needed to complete the predictions.
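Criteria 1, 2 and 4 can be computed directly from predictions at independent control points. A minimal sketch in R, assuming vectors of observed values, predicted values and (optionally) the model's prediction variances at those points; the function name `validation_summary` is hypothetical, and the mean squared deviation ratio (MSDR) is one common reliability check used here for illustration, not one prescribed by these slides. Criterion 5 can be measured by wrapping any prediction call in `system.time()`.

```r
# Hypothetical helper: bias (ME), accuracy (RMSE) and reliability (MSDR)
# at independent control points.
validation_summary <- function(obs, pred, pred_var = NULL) {
  err <- pred - obs
  out <- c(ME = mean(err),            # criterion 2: bias
           RMSE = sqrt(mean(err^2)))  # criterion 1: overall accuracy
  if (!is.null(pred_var)) {
    # criterion 4: mean squared deviation ratio; values close to 1 mean the
    # reported prediction variance matches the size of the actual errors
    out["MSDR"] <- mean(err^2 / pred_var)
  }
  out
}

obs  <- c(1, 2, 3, 4)
pred <- c(1.5, 2, 2.5, 4)
validation_summary(obs, pred, pred_var = rep(0.125, 4))  # ME = 0, RMSE ~ 0.354, MSDR = 1
```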


Can we simplify this?

1. In theory, we could derive a single composite measure that would then allow us to select 'the optimal' predictor for any given data set (but this is not trivial!).

2. But how should we assign weights to the different criteria?

3. In many cases we simply end up using a naïve predictor, that is, a predictor for which we know a statistically more optimal alternative exists, but that alternative is not feasible.
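Point 2 is the crux: a composite measure is only as objective as its weights. As a minimal sketch (an assumption, not the procedure of the cited paper), each criterion can be rescaled to the range 0 to 1 across the candidate methods, with 0 for the best method, and then combined with user-chosen weights. The criteria must first be expressed so that lower is better (e.g. |ME| instead of ME). The function name and the example numbers are hypothetical.

```r
# Hypothetical composite score. 'crit' has one row per method and one column
# per criterion, all oriented so that lower values are better.
composite_score <- function(crit, weights) {
  stopifnot(ncol(crit) == length(weights))
  mins <- apply(crit, 2, min)
  rngs <- apply(crit, 2, max) - mins
  rngs[rngs == 0] <- 1                         # constant criteria then score 0
  rescaled <- sweep(sweep(crit, 2, mins, "-"), 2, rngs, "/")
  drop(rescaled %*% (weights / sum(weights)))  # weighted sum per method
}

crit <- rbind(OK = c(RMSE_r = 60, absME = 0.2, time_s = 5),
              RK = c(RMSE_r = 50, absME = 0.1, time_s = 20))
composite_score(crit, weights = c(3, 1, 1))    # OK 0.8, RK 0.2: RK ranks first
```

Changing the weights (e.g. giving computation time most of the weight) can reverse the ranking, which is exactly why point 2 has no neutral answer.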


Automated mapping

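The intamap package (Pebesma et al., 2010) decides which method to pick for you:

> library(sp); library(intamap)
> data(meuse); coordinates(meuse) <- ~x+y
> data(meuse.grid); gridded(meuse.grid) <- ~x+y
> proj4string(meuse) <- CRS("+init=epsg:28992")
> proj4string(meuse.grid) <- CRS("+init=epsg:28992")
> meuse$value <- log(meuse$zinc)   # intamap interpolates the column named 'value'
> output <- interpolate(meuse, meuse.grid)
R 2009-11-11 17:09:14 interpolating 155 observations,
3103 prediction locations
[Time models loaded...]
[1] "estimated time for copula 133.479866956255"
Checking object ... OK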




We need a single criterion to compare prediction methods.


Mapping accuracy and survey costs

The cost of a soil survey is also a function of mapping scale, roughly:

$$\log(X) = b_0 + b_1 \cdot \log(\mathrm{SN}) \qquad (1)$$

We can fit a linear model to the empirical data tabulated in, e.g., Legros (2006, p. 75), which gives:

$$X = \exp\left(19.0825 - 1.6232 \cdot \log(\mathrm{SN})\right) \qquad (2)$$

where X is the minimum cost in EUR per ha (based on 2002 estimates), SN is the scale number and log is the natural logarithm. To map 1 ha of soil at 1:100,000 scale, for example, one needs at least 1.5 EUR.
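Eq. (2) is easy to evaluate directly. A minimal R sketch, assuming log is the natural logarithm (which reproduces the 1.5 EUR example for 1:100,000); the function name is hypothetical.

```r
# Minimum survey cost in EUR per ha (2002 prices) as a function of the
# scale number SN, following Eq. (2); log() is the natural logarithm.
survey_cost_ha <- function(SN) exp(19.0825 - 1.6232 * log(SN))

survey_cost_ha(100000)           # about 1.48 EUR/ha at 1:100,000
survey_cost_ha(c(25000, 50000))  # finer scales cost more per ha
```

Because b1 is negative, halving the scale number (doubling the map detail) multiplies the cost per ha by 2^1.6232, roughly 3.1.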


Survey costs and mapping scale

[Figure: survey costs (EUR/ha) versus scale number, both on log scales.]


Survey costs and mapping scale

The total cost of a soil survey can be estimated from the size of the area and the number of samples. The effective scale number (SN) is:

$$\mathrm{SN} = \sqrt{\frac{4 \cdot A}{N}} \cdot 10^2 \qquad (3)$$

where A is the surface of the study area in m² and N is the total number of observations.
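Eq. (3) links the survey design to a scale number, so Eqs. (2) and (3) together give a rough survey cost. A sketch under the assumption that the total cost is the per-hectare cost of Eq. (2) multiplied by the area in hectares; the function names are hypothetical.

```r
# Effective scale number, Eq. (3): A = study area in m^2, N = number of observations.
effective_sn <- function(A, N) sqrt(4 * A / N) * 10^2

# Assumed total cost: EUR per ha from Eq. (2) times the area in ha (A / 10^4).
survey_cost_total <- function(A, N) {
  SN <- effective_sn(A, N)
  exp(19.0825 - 1.6232 * log(SN)) * A / 10^4
}

effective_sn(A = 4e6, N = 100)       # 40000, i.e. roughly 1:40,000
survey_cost_total(A = 4e6, N = 100)  # total EUR for a 400 ha survey
```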


Combining Eqs. (2) and (3) gives:

$$X = \exp\left(19.0825 - 1.6232 \cdot \log\left[0.0791 \cdot \ldots \, N \cdot 10^2\right]\right) \qquad (4)$$



Output map, from an information perspective

The resulting prediction map is the sum of two signals:

$$Z^*(s) = Z(s) + \varepsilon(s) \qquad (5)$$

where Z(s) is the true variation and ε(s) is the error component. The error component in fact consists of two parts: (1) the unexplained part of soil variation, and (2) noise (measurement error). The unexplained part of soil variation is the variation we failed to explain, either because we are not using all relevant covariates or because of the limited sampling intensity (or both).


Prediction accuracy

To see how much of the global variation budget has been explained by the model, we can use:

$$\mathrm{RMSE}_r(\%) = \frac{\mathrm{RMSE}}{s_z} \cdot 100 \qquad (6)$$

where s_z is the sampled variation (standard deviation) of the target variable. RMSE_r(%) is a global estimate of the map accuracy, valid only under the assumption that the validation points are spatially independent of the calibration points, representative, and numerous enough (more than about 100).
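A minimal sketch of Eq. (6), assuming s_z is estimated by the sample standard deviation of the observed values at the validation points. Lower is better: a value near 100% means the model does little better than predicting the sample mean everywhere.

```r
# Eq. (6): RMSE at validation points as a percentage of the sampled
# standard deviation s_z of the target variable.
rmse_r <- function(obs, pred) {
  rmse <- sqrt(mean((pred - obs)^2))
  100 * rmse / sd(obs)
}

obs  <- c(2, 4, 6, 8)
pred <- obs + c(1, -1, 1, -1)   # every prediction is off by 1
rmse_r(obs, pred)               # 100 / sd(obs), about 38.7%
```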


Kriging efficiency


Mapping efficiency

We propose two new measures of mapping success: (1) mapping efficiency, defined as the amount of money needed to map an area of standard size and explain each one percent of variation in the target variable:

$$\theta = \frac{X}{A \cdot \mathrm{RMSE}_r} \quad [\mathrm{EUR} \cdot \mathrm{km}^{-2} \cdot \%^{-1}] \qquad (7)$$

where X is the total cost of the survey, A is the size of the area in km², and RMSE_r is the amount of variation explained by the spatial prediction model.
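Eq. (7) in R, as a sketch with hypothetical input values (the function name is ours, not from the source):

```r
# Mapping efficiency, Eq. (7): X = total survey cost (EUR), A_km2 = area in
# km^2, RMSE_r = percentage from Eq. (6). Result in EUR per km^2 per %.
mapping_efficiency <- function(X, A_km2, RMSE_r) X / (A_km2 * RMSE_r)

mapping_efficiency(X = 5000, A_km2 = 5, RMSE_r = 50)  # 20 EUR per km^2 per %
```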


Information production efficiency

(2) An equivalent measure of mapping efficiency is the information production efficiency:

$$\Upsilon = \frac{X}{\mathrm{gzip}} \quad [\mathrm{EUR} \cdot \mathrm{B}^{-1}] \qquad (8)$$

where gzip is the size of the data (in bytes) left after compression and after reformatting the values to match the effective precision (based on Eq. 10). This can be estimated as:

$$\mathrm{gzip} = f_c \cdot (f_E \cdot M) \cdot c_Z \quad [\mathrm{B}] \qquad (9)$$

where f_c is the loss-less data compression factor, which depends on the compression algorithm, f_E is the extrapolation adjustment factor, c_Z is the variable coding size, and M is the total number of pixels.
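Eqs. (8) and (9) are simple products and ratios. A sketch in R: M = 3103 is the number of Meuse prediction locations from the intamap output above, while f_c = 0.25, f_E = 1, c_Z = 2 bytes and X = 5000 EUR are purely hypothetical values, as are the function names.

```r
# Eq. (9): estimated size (bytes) of the compressed, precision-adjusted map.
# f_c = compression factor, f_E = extrapolation adjustment factor,
# M = number of pixels, c_Z = coding size per value in bytes.
gzip_size <- function(f_c, f_E, M, c_Z) f_c * (f_E * M) * c_Z

# Eq. (8): information production efficiency in EUR per byte.
info_efficiency <- function(X, gzip_bytes) X / gzip_bytes

g <- gzip_size(f_c = 0.25, f_E = 1, M = 3103, c_Z = 2)  # 1551.5 bytes
info_efficiency(X = 5000, gzip_bytes = g)               # about 3.2 EUR per byte
```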


Effective precision

Following the Nyquist frequency concept from signal processing, which states that the original signal can be reconstructed if the sampling frequency is at least twice the maximum component frequency of the signal, we can derive the effective precision (also known as numerical resolution) of a prediction map as:

$$\Delta z = \frac{\mathrm{RMSE}}{2} \qquad (10)$$

which means that there is no justification for saving the predictions with a better precision than half the average accuracy (RMSE).
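A sketch of how Eq. (10) could be applied before saving a map: round each prediction to the nearest multiple of Δz = RMSE/2. The function names are hypothetical; note that R's `round()` rounds exact halves to the even digit.

```r
# Effective precision, Eq. (10).
effective_precision <- function(rmse) rmse / 2

# Round predictions to the nearest multiple of the effective precision;
# finer digits carry no information given the map's RMSE.
round_to_precision <- function(z, rmse) {
  dz <- effective_precision(rmse)
  round(z / dz) * dz
}

round_to_precision(c(2.3, 4.08, 6.4), rmse = 1)  # 2.5 4.0 6.5 (dz = 0.5)
```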


Nyquist frequency concept



Figure: The Nyquist rate is the lowest rate at which a signal can be sampled, and hence compressed, while still allowing perfect reconstruction of the signal from the samples (it equals twice the maximum component frequency of the signal).


Rounding numbers

Original data

2.25 4.08 6.25 4.23 2.56 1.21 0.98 0.98 0.85 0.4

4.24 4.69 7.17 4.37 2.08 1.4 1.44 0.96 0.89 0.31

3.62 5.39 5.27 3.11 2.04 1.57 1.67 1.43 0.61 0.28

2.72 8.75 7.77 4.63 2.88 2.34 2.93 1.49 0.57 0.25

2.83 10.55 14.45 5.79 3.13 2.95 2.85 0.89 0.34 0.22

2.87 5.45 10.34 5.01 2.42 1.88 1.5 0.61 0.3 0.23

1.19 2.69 3.76 3.63 1.86 0.97 1.24 0.64 0.37 0.26

0.86 1.22 1.39 2.71 2.17 1.61 2.37 1.56 0.66 0.47

0.67 1 1.23 1.53 2.04 3.12 5.74 3.71 1.53 0.92

1.18 1.48 1.35 2.13 2.11 3.64 7.56 6.92 2.97 1.96

Coded data

2 4 6 4 3 1 1 0

5 7 4 2 1 1 0

4 5 5 2 2 2 1 1

3 9 8 5 3 2 3 1 1

3 11 14 6 3 3 3

3 5 10 5 2 2 1

1 3 4 4 2 1 0

1 3 2 2 2

1 1 2 2 3 6 4 2 1

1 1 2 2 4 8 7 3 2
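The gain from rounding can be checked with base R's `memCompress()`, which applies gzip-type compression to a raw vector. A sketch using the first three rows of the original data above; writing the values as space-separated text and rounding to whole units are simplifying assumptions, so the exact byte counts are only illustrative.

```r
# First three rows of the 'Original data' grid.
z <- c(2.25, 4.08, 6.25, 4.23, 2.56, 1.21, 0.98, 0.98, 0.85, 0.4,
       4.24, 4.69, 7.17, 4.37, 2.08, 1.4, 1.44, 0.96, 0.89, 0.31,
       3.62, 5.39, 5.27, 3.11, 2.04, 1.57, 1.67, 1.43, 0.61, 0.28)

# Size in bytes of the values written as text and compressed in memory.
compressed_bytes <- function(x) {
  txt <- paste(x, collapse = " ")
  length(memCompress(charToRaw(txt), type = "gzip"))
}

c(original = compressed_bytes(z), rounded = compressed_bytes(round(z)))
```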



To follow this exercise, obtain the DSM_examples.R script, download it to your machine and run it step by step.


Meuse data

> library(sp)
> data(meuse)
> coordinates(meuse) <- ~x+y
> proj4string(meuse) <- CRS("+init=epsg:28992")
> sel <- !is.na(meuse$om)   # drop points where organic matter (om) is missing
> bubble(meuse[sel,], "om")

Ebergotzen (subset)

Ebergotzen (complete)

OK vs RK

[Figure: ordinary kriging vs regression-kriging, plotted against sampling intensity (log scale).]

Prediction accuracy and survey costs


Summary results

- For the two case studies, using regression-kriging instead of ordinary kriging gives a gain of 7% for mapping organic matter (Meuse) and 13% for mapping sand content (Ebergotzen).

- To map organic matter for the Meuse case study, one would need to spend 13.1 EUR km⁻² %⁻¹ (1.13 EUR B⁻¹); mapping sand content for the Ebergotzen case study would cost 11.1 EUR km⁻² %⁻¹ (5.88 EUR B⁻¹).

- Information production efficiency is possibly a more robust measure of mapping quality than mapping efficiency, because it is scale-independent and accounts for extrapolation effects.



- Mapping efficiency (cost / area / percent of variance explained) is a possible universal criterion for comparing prediction methods.

- Maps are not what they seem.

- Geostatistics does outperform non-statistical methods (but this is area- and data-dependent).

- It's not about making beautiful maps; it's about understanding what they mean.

- If you deal with several equally valid (independent) methods, maybe you should consider combining them?
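One simple way to combine two independent predictions of the same pixel (an illustration, not necessarily what the author had in mind) is an inverse-variance weighted mean, which gives more weight to the method with the smaller prediction variance. The function name is hypothetical, and the combined variance assumes the two prediction errors are independent.

```r
# Inverse-variance weighted combination of two predictions z1, z2 with
# prediction variances v1, v2; works element-wise on vectors of pixels.
combine_predictions <- function(z1, v1, z2, v2) {
  w1 <- 1 / v1
  w2 <- 1 / v2
  list(mean = (w1 * z1 + w2 * z2) / (w1 + w2),
       var  = 1 / (w1 + w2))   # assumes independent prediction errors
}

res <- combine_predictions(z1 = 10, v1 = 4, z2 = 12, v2 = 4)
res$mean   # 11
res$var    # 2, smaller than either input variance
```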


Comparing methods



Dubois, G. (Ed.), 2005. Automatic mapping algorithms for routine and emergency monitoring data. Report on the Spatial Interpolation Comparison (SIC2004) exercise. EUR 21595 EN. Office for Official Publications of the European Communities, Luxembourg, 150 p.

Hengl, T., 2009. A Practical Guide to Geostatistical Mapping, 2nd edition. University of Amsterdam, 291 p. ISBN 978-90-9024981-0.

Li, J., Heap, A., 2008. A review of spatial interpolation methods for environmental scientists. Record 2008/23. Geoscience Australia, Canberra, 137 p.

Pebesma, E., Cornford, D., Dubois, G., Heuvelink, G.B.M., Hristopulos, D., Pilz, J., Stöhlker, U., Morin, G., Skøien, J.O., 2010. INTAMAP: The design and implementation of an interoperable automated interpolation web service. Computers & Geosciences, in press, corrected proof.

