An Introduction to Model-based Geostatistics Peter J Diggle School of Health and Medicine, Lancaster University and Department of Biostatistics, Johns Hopkins University September 2009

An Introduction to Model-based Geostatistics

Peter J Diggle

School of Health and Medicine, Lancaster University

and

Department of Biostatistics, Johns Hopkins University

September 2009

Outline

• What is geostatistics?

• What is model-based geostatistics?

• Two examples

– constructing an elevation surface from sparse data

– tropical disease prevalence mapping

Example: surface elevation data

[Figure: three-dimensional plot of the surface elevation data, with axes X, Y and Z (elevation, roughly 65 to 100).]

Geostatistics

• traditionally, a self-contained methodology for spatial prediction, developed at Ecole des Mines, Fontainebleau, France

• nowadays, that part of spatial statistics which is concerned with data obtained by spatially discrete sampling of a spatially continuous process

Kriging: find the linear combination of the data that best predicts the value of the surface at an arbitrary location x

Model-based Geostatistics

• the application of general principles of statistical modelling and inference to geostatistical problems

– formulate a statistical model for the data

– fit the model using likelihood-based methods

– use the fitted model to make predictions

Kriging: minimum mean square error prediction under Gaussian modelling assumptions

Gaussian geostatistics (simplest case)

Model

• Stationary Gaussian process $\{S(x) : x \in \mathbb{R}^2\}$

$E[S(x)] = \mu$, $\mathrm{Cov}\{S(x), S(x')\} = \sigma^2 \rho(\|x - x'\|)$

• Mutually independent $Y_i \mid S(\cdot) \sim N(S(x_i), \tau^2)$

Point predictor: $\hat{S}(x) = E[S(x) \mid Y]$

• linear in $Y = (Y_1, \dots, Y_n)$;

• interpolates $Y$ if $\tau^2 = 0$

• called simple kriging in classical geostatistics
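
The point predictor above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the geoR implementation: the parameter values, the exponential correlation function and the simulated data are all hypothetical, and the model parameters are treated as known.

```r
# Simple kriging sketch in base R (illustrative values, not the elevation data).
set.seed(1)
n      <- 20
coords <- cbind(runif(n), runif(n))      # sampling locations x_1, ..., x_n
mu     <- 10; sigma2 <- 4; phi <- 0.3    # assumed known model parameters
tau2   <- 0                              # no measurement error
rho    <- function(u) exp(-u / phi)      # assumed exponential correlation

# Simulate Y from the model: Var(Y) = sigma2 * R + tau2 * I
D <- as.matrix(dist(coords))
V <- sigma2 * rho(D) + tau2 * diag(n)
y <- mu + drop(t(chol(V)) %*% rnorm(n))

# Predictor at x0: mu + c' V^{-1} (y - mu), where c_i = sigma2 * rho(||x0 - x_i||).
# The result is a linear function of y.
simple_krige <- function(x0) {
  u0 <- sqrt(colSums((t(coords) - x0)^2))  # distances from x0 to each x_i
  c0 <- sigma2 * rho(u0)
  mu + sum(c0 * solve(V, y - mu))
}

simple_krige(c(0.5, 0.5))   # prediction at a new location
simple_krige(coords[1, ])   # reproduces y[1], because tau2 = 0
```

Setting `tau2` to a positive value in this sketch makes the predictor smooth through the data rather than interpolate them.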

Predictive distribution

• choose the target for prediction, $F(S)$, where $S = \{S(x) : x \in A\}$

• draw samples $S_i : i = 1, \dots, N$ from $[S \mid Y]$

• then $F_i = F(S_i) : i = 1, \dots, N$ is a sample from the required predictive distribution $[F(S) \mid Y]$
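
The three steps above can be sketched in base R for the Gaussian model with known parameters. Everything specific here is an assumption made for illustration: the parameter values, the exponential correlation function, the simulated data, the grid standing in for the region A, and the choice of target F(S) as the maximum of the surface over the grid.

```r
# Sampling from a predictive distribution [F(S)|Y] (illustrative sketch).
set.seed(2)
n      <- 15
coords <- cbind(runif(n), runif(n))                       # data locations
grid   <- as.matrix(expand.grid(seq(0, 1, by = 0.1),
                                seq(0, 1, by = 0.1)))     # discretised region A
mu <- 0; sigma2 <- 1; phi <- 0.25; tau2 <- 0.1            # assumed known
rho <- function(u) exp(-u / phi)                          # assumed correlation

D   <- as.matrix(dist(rbind(coords, grid)))
i_y <- seq_len(n)
i_s <- n + seq_len(nrow(grid))
V_yy <- sigma2 * rho(D[i_y, i_y]) + tau2 * diag(n)        # Var(Y)
C_sy <- sigma2 * rho(D[i_s, i_y])                         # Cov(S, Y)
V_ss <- sigma2 * rho(D[i_s, i_s])                         # Var(S)

y <- mu + drop(t(chol(V_yy)) %*% rnorm(n))                # simulated data

# [S|Y] is multivariate Gaussian with this mean and variance
cond_mean <- mu + drop(C_sy %*% solve(V_yy, y - mu))
cond_var  <- V_ss - C_sy %*% solve(V_yy, t(C_sy))
L <- t(chol(cond_var + 1e-8 * diag(nrow(grid))))          # small jitter for stability

# Draw N samples S_i from [S|Y], then apply the target F to each sample
N <- 500
S_samples <- cond_mean + L %*% matrix(rnorm(nrow(grid) * N), ncol = N)
F_samples <- apply(S_samples, 2, max)                     # F(S) = max of S over A

quantile(F_samples, c(0.025, 0.5, 0.975))                 # predictive summary
```

The same samples can be reused for any other target, for example the proportion of the region where the surface exceeds a threshold, by changing the function passed to `apply`.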

Interpolating the elevation surface

Under Gaussian modelling assumptions, we need to:

• identify a parametric family of correlation functions

• fit the model

• use the model for prediction

• identify a parametric family of correlation functions

The empirical variogram

$(x_i, Y_i) : i = 1, \dots, n$, $u_{ij} = \|x_i - x_j\|$, $v_{ij} = \tfrac{1}{2}(Y_i - Y_j)^2$

The theoretical variogram

$V(u) = \tfrac{1}{2}\mathrm{Var}\{Y(x) - Y(x - u)\} = \tau^2 + \sigma^2\{1 - \rho(u)\}$

Exploratory analysis

$E[v_{ij}] = V(u_{ij})$ ⇒ a smoothed scatterplot of $(u_{ij}, v_{ij})$ identifies the rough shape of $\rho(u)$ and initial estimates of model parameters
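
The definitions above can be sketched directly in base R, without geoR. The data here are hypothetical placeholders (spatially independent noise), so the binned variogram should be roughly flat at the variance of the data; the bin width of 0.5 is also an arbitrary choice for illustration.

```r
# Empirical variogram from first principles (illustrative sketch).
set.seed(3)
n      <- 60
coords <- cbind(runif(n, 0, 6), runif(n, 0, 6))  # hypothetical locations x_i
y      <- rnorm(n, mean = 80, sd = 10)           # hypothetical data, no spatial structure

# All pairs i < j
idx <- which(upper.tri(matrix(0, n, n)), arr.ind = TRUE)
u <- sqrt(rowSums((coords[idx[, 1], ] - coords[idx[, 2], ])^2))  # u_ij
v <- 0.5 * (y[idx[, 1]] - y[idx[, 2]])^2                         # v_ij

# Smooth the (u_ij, v_ij) scatterplot by averaging within distance bins
breaks <- seq(0, 5, by = 0.5)
bin    <- cut(u, breaks)
vario  <- data.frame(u = as.vector(tapply(u, bin, mean)),
                     v = as.vector(tapply(v, bin, mean)),
                     n = as.vector(table(bin)))
vario
```

Pairs separated by more than the largest break fall outside every bin and are dropped from the averages.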

geoR code:

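library(geoR)
data(elevation)      # load the surface elevation data
summary(elevation)
# empirical variogram, with distance bins defined by uvec
vario <- variog(elevation, uvec = 0.2 * (0:25))
plot(vario)
?variog              # help page for variog
# empirical variogram after removing a first-order (linear) trend surface
vario2 <- variog(elevation, uvec = 0.2 * (0:25), trend = "1st")
plot(vario2)
# overlay the two variograms on common axes
plot(vario$u, vario$v, type = "l", xlim = c(0, 5), ylim = c(0, 7000),
     xlab = "u", ylab = "V(u)")
lines(vario2$u, vario2$v, col = "red")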

• identify a parametric family of correlation functions

• fit the model

1. Classical: compute maximum likelihood estimates $\hat{\theta}$

2. Bayesian: prior $[\theta]$ implies posterior $[\theta \mid Y]$

geoR code for option 1:

# maximum likelihood fit; ini.cov.pars gives initial values for (sigma^2, phi)
mlfit <- likfit(elevation, ini.cov.pars = c(5000, 2.0),
                cov.model = "matern", kappa = 1)
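
For reference, the log-likelihood behind option 1 can be written out for the simplest Gaussian model. This is a sketch in assumed notation: $\phi$ denotes the correlation range parameter, $R(\phi)$ the $n \times n$ matrix with entries $\rho(\|x_i - x_j\|; \phi)$, $\mathbf{1}$ a vector of ones, and $\theta = (\mu, \sigma^2, \phi, \tau^2)$. None of these symbols for the matrix form appear on the slides.

```latex
L(\theta) = -\frac{1}{2}\Big\{ n \log(2\pi)
  + \log\big|\sigma^2 R(\phi) + \tau^2 I\big|
  + (y - \mu\mathbf{1})^{\top}\big(\sigma^2 R(\phi) + \tau^2 I\big)^{-1}(y - \mu\mathbf{1}) \Big\}
```

The estimate $\hat{\theta}$ is the value of $\theta$ that maximises $L(\theta)$, which in general has to be found numerically from starting values such as those suggested by the variogram.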

• identify a parametric family of correlation functions

• fit the model

• use the model for prediction

1. Plug-in: $[S \mid Y; \hat{\theta}]$

2. Bayesian: $[S \mid Y] = \int [S \mid Y; \theta]\,[\theta \mid Y]\,d\theta$

geoR code for option 1:

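# corners of the square prediction region, one (x, y) pair per row
region <- matrix(c(0, 0, 6.4, 0, 6.4, 6.4, 0, 6.4), 4, 2, byrow = TRUE)
grid <- pred_grid(region, by = 0.2)        # prediction grid with spacing 0.2
KC <- krige.control(obj.model = mlfit)     # plug in the fitted model
OC <- output.control(n.predictive = 100)   # also draw 100 predictive simulations
set.seed(24367)                            # make the simulations reproducible
predictions <- krige.conv(geodata = elevation, locations = grid,
                          borders = region, krige = KC, output = OC)
image(predictions)                         # map of the point predictions
points(elevation, add = TRUE)              # overlay the data locations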

Tropical disease prevalence mapping

• “river blindness” – an endemic disease in wet tropics

• donation programme of mass treatment with ivermectin

• approximately 50 million people treated to date (target is 80 million by 2015)

• serious adverse reactions experienced by some patients highly co-infected with Loa loa parasites

• precautionary measures put in place before mass treatment in areas of high Loa loa prevalence

http://www.who.int/pbd/blindness/onchocerciasis/en/

Diggle et al, Annals of Tropical Medicine and Parasitology, 101, 499–509.

The Loa loa prediction problem

Ground-truth survey data

• random sample of subjects in each of a number of villages

• blood-samples test positive/negative for Loa loa

Environmental data (satellite images)

• measured on regular grid to cover region of interest

• elevation, green-ness of vegetation

Objectives

• predict local prevalence throughout study-region (Cameroon)

• compute local exceedance probabilities,

P(prevalence > 0.2|data)
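
Given samples from the predictive distribution of prevalence at each location, an exceedance probability is just the proportion of samples above the threshold. The sketch below uses hypothetical Gaussian draws on the logit scale as a stand-in for real predictive samples; the five locations, their means and the standard deviation are invented for illustration.

```r
# Exceedance probabilities from predictive samples (illustrative sketch).
set.seed(4)
n_loc <- 5       # hypothetical number of prediction locations
N     <- 1000    # number of predictive samples per location

# Stand-in for samples of logit-prevalence: one row per location
eta <- matrix(rnorm(n_loc * N,
                    mean = rep(c(-3, -2, -1.4, -1, 0), N),
                    sd   = 0.5),
              nrow = n_loc)
prev <- plogis(eta)               # back-transform to prevalence in (0, 1)

# Monte Carlo estimate of P(prevalence > 0.2 | data) at each location
exceed <- rowMeans(prev > 0.2)
exceed
```

Locations whose samples straddle the threshold get intermediate probabilities, which is the information a map of point predictions alone does not show.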

Loa loa: a generalised linear model

• Latent spatial process

$S(x) \sim \mathrm{SGP}\{0, \sigma^2, \rho(u)\}$

$\rho(u) = \exp(-|u|/\phi)$

• Linear predictor

$d(x)$ = environmental variables at location $x$

$\eta(x) = d(x)'\beta + S(x)$

$p(x) = \exp\{\eta(x)\}/[1 + \exp\{\eta(x)\}]$

• Conditional distribution for positive proportion $Y_i/n_i$

$Y_i \mid S(\cdot) \sim \mathrm{Bin}\{n_i, p(x_i)\}$
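
One way to read the model is as a recipe for simulating data, sketched below in base R. The locations, the single covariate, the regression coefficients, the variance and range parameters and the sample sizes are all hypothetical values chosen for illustration.

```r
# Simulating from the binomial logistic geostatistical model (illustrative sketch).
set.seed(5)
n      <- 50
coords <- cbind(runif(n), runif(n))        # hypothetical village locations x_i
sigma2 <- 1; phi <- 0.2                    # hypothetical variance and range
beta   <- c(-2, 1.5)                       # hypothetical regression coefficients
d      <- cbind(1, runif(n))               # intercept plus one environmental covariate

# Latent process S(x): zero mean, covariance sigma2 * exp(-u / phi)
V <- sigma2 * exp(-as.matrix(dist(coords)) / phi)
S <- drop(t(chol(V)) %*% rnorm(n))

eta <- drop(d %*% beta) + S                # linear predictor eta(x_i)
p   <- exp(eta) / (1 + exp(eta))           # prevalence p(x_i)

n_i <- rep(40, n)                          # people sampled per village
Y   <- rbinom(n, size = n_i, prob = p)     # Y_i | S ~ Bin(n_i, p(x_i))

head(cbind(Y, n_i, prevalence = round(p, 3)))
```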

The modelling strategy

• use relationship between environmental variables and ground-truth prevalence to construct preliminary predictions via logistic regression

• use local deviations from regression model to estimate smooth residual spatial variation

• use fitted model for predictive inference
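
The first two steps of this strategy can be sketched with an ordinary logistic regression in base R. The data below are simulated and hypothetical (variable names `elev` and `ndvi`, coefficient values and sample sizes are assumptions), and the empirical-logit residual is one simple way to express local deviations, not necessarily the authors' method.

```r
# Preliminary logistic regression and residuals (illustrative sketch).
set.seed(6)
n    <- 100
elev <- runif(n, 0, 1500)                      # hypothetical elevation
ndvi <- runif(n, 0.65, 0.90)                   # hypothetical max greenness
n_i  <- rep(50, n)                             # people sampled per village
p    <- plogis(-6 + 0.001 * elev + 5 * ndvi)   # hypothetical true prevalence
y    <- rbinom(n, n_i, p)                      # number testing positive

# Step 1: non-spatial logistic regression on the environmental variables
fit    <- glm(cbind(y, n_i - y) ~ elev + ndvi, family = binomial)
prelim <- fitted(fit)                          # preliminary predicted prevalence

# Step 2: local deviations from the regression, on the logit scale.
# Adding 0.5 to the counts avoids infinite logits when y is 0 or n_i.
resid_logit <- qlogis((y + 0.5) / (n_i + 1)) - predict(fit, type = "link")

summary(fit)$coefficients
```

In the full model the residual spatial variation is represented by the latent process S(x) and fitted jointly with the regression, so this two-stage version is only an exploratory approximation.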

logit prevalence vs elevation

[Figure: scatterplot of logit prevalence (vertical axis, −5 to 0) against elevation (horizontal axis, 0 to 1500).]

logit prevalence vs max NDVI

[Figure: scatterplot of logit prevalence (vertical axis, −5 to 0) against maximum greenness (horizontal axis, 0.65 to 0.90).]

Comparing non-spatial and spatial predictions in Cameroon

Non-spatial

[Figure: observed prevalence (%) plotted against predicted prevalence 'without ground truth data'.]

Spatial

[Figure: observed prevalence (%) plotted against predicted prevalence 'with ground truth data' (%).]

Probabilistic prediction in Cameroon

Take-home message

• model-based approach:

– makes assumptions explicit

– makes choice of analysis strategy less subjective

– emphasises uncertainty

• exceedance probability maps are often more useful than point predictions and standard errors

• text-book linked to geoR software

Diggle, P.J. and Ribeiro, P.J. (2007). Model-based Geostatistics. New York: Springer.