A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002

Page 1: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

A Latent Approach to the Statistical Analysis of Space-time Data

Dani Gamerman
Instituto de Matemática
Universidade Federal do Rio de Janeiro
Brasil

http://acd.ufrj.br/~dani

17th International Workshop on Statistical Modelling
Chania, Crete, Greece, 8-12 July 2002

Page 2: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

World Cup Algorithm

Center of the world football: 1982

1978 and 1986: played in L. America
1974 and 1990: played in Europe
1970 and 1994: played in (L) America
1966 and 1998: one-time only home win, Europe, 25 miles apart
1962 and 2002: played in exotic place
1958 and ?2006: played in Europe

Page 3: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman


Joint work with
Marina S. Paez (IM-UFRJ)
Flavia Landim (IM-UFRJ)
Victor de Oliveira (Arkansas)
Alan Gelfand (Connecticut)
Sudipto Banerjee (Minnesota)


Page 4: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Introduction

Environmental science: data in the form of a collection of time series that are geographically referenced.

Some examples can be found in other areas.

Examples:
1) measurements of pollutants in time over a set of monitoring stations
2) selling price of properties around a neighborhood of interest
3) counts of morbidity/mortality events in time over a collection of geographic regions

Page 5: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Main Objective: spatial interpolation

Example: Pollution in Rio de Janeiro

Paez, M.S. and Gamerman, D. (2001). Technical report. Statistical Laboratory, UFRJ.

Page 6: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Example: Pollution in Rio de Janeiro

The previous picture showed mean interpolated values.

Other features of interest can be obtained, for example $\mathrm{Prob}(PM_{10} > 100\ \mu g/m^3 \mid Y^{obs})$.

Page 7: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Spatial Interpolation

$m$ = number of observations

$g$ = number of grid points

$s_1, \ldots, s_m$ = observed sites

$s_1^n, \ldots, s_g^n$ = grid points (to interpolate)

$Y_1^n, \ldots, Y_g^n$ = observations at the grid points

Page 8: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

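Interpolation: $p(Y^n \mid Y^{obs})$

$p(Y^n \mid Y^{obs}) = \int p(Y^n \mid Y^{obs}, \theta)\, p(\theta \mid Y^{obs})\, d\theta$

$\theta$ - all model parameters
$Y^{mis}$ - missing data, treated as parameters

• Obtain $p(Y^n \mid Y^{obs})$ by simulation.

Steps to generate from $Y^n \mid Y^{obs}$:

1. Frequentist inference: generate $Y^n$ from $p(Y^n \mid Y^{obs}, \theta)$, with $\theta$ held fixed at a point estimate.

If $\theta = \theta^*$ with probability 1, then $p(Y^n \mid Y^{obs}) = p(Y^n \mid Y^{obs}, \theta^*)$.

2. Bayesian inference:
i) generate $\theta$ from $p(\theta \mid Y^{obs})$
ii) generate $Y^n$ from $p(Y^n \mid Y^{obs}, \theta)$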

Page 9: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

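Gaussian Process (GP) (or Gaussian Random Field)

$S$ - region of $R^p$ (in general, $p = 2$)

$\{ w(s) : s \in S \}$ is a GP if, $\forall n$ and $\forall s_1, \ldots, s_n \in S$,

$(w(s_1), \ldots, w(s_n)) \sim N_n(\mu, \Sigma)$

where $\mu = (\mu_1, \ldots, \mu_n)$ with $\mu_i = E[w(s_i)]$ and $\Sigma = (\sigma_i \sigma_j\, \rho[w(s_i), w(s_j)])_{i,j}$

Usual simplifications:

1) Isotropy: $\rho[w(s_i), w(s_j)] = \rho(h_{ij})$ with $h_{ij} = |s_i - s_j|$

2) Homoscedasticity: $\sigma_i = \sigma$, $\forall i$

Notation: $w(\cdot) \sim GP(\mu(\cdot), \sigma^2, \rho)$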
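The definition above can be made concrete with a short simulation. The following is a minimal sketch, not the author's code: it assumes a constant mean, the isotropic exponential correlation $\rho(h) = \exp(-\lambda h)$ that appears later in the talk, and a hypothetical 5 x 5 grid of sites. The function names `exp_corr` and `sample_gp` are illustrative.

```python
import numpy as np

def exp_corr(sites, lam):
    """Isotropic exponential correlation matrix: rho(h_ij) = exp(-lam * h_ij)."""
    diff = sites[:, None, :] - sites[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise distances h_ij = |s_i - s_j|
    return np.exp(-lam * h)

def sample_gp(sites, mu, sigma2, lam, rng):
    """One draw of (w(s_1), ..., w(s_n)) ~ N_n(mu * 1, sigma2 * R)."""
    R = exp_corr(sites, lam)
    L = np.linalg.cholesky(sigma2 * R)         # homoscedastic: same sigma2 at every site
    return mu + L @ rng.standard_normal(len(sites))

rng = np.random.default_rng(0)
grid = np.linspace(0.1, 0.9, 5)                # assumed grid, for illustration only
sites = np.array([(x, y) for x in grid for y in grid])
w = sample_gp(sites, mu=0.0, sigma2=1.0, lam=0.8, rng=rng)
```

Any finite set of sites gives a multivariate normal vector, which is all the later interpolation steps rely on.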

Page 10: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Statistical Analysis

Starting point: regression models

$Y_t(s) = \mu_t(s) + e_t(s)$, where
$\mu_t(s) = \beta_0 + \beta_1 X_{t1}(s) + \ldots + \beta_p X_{tp}(s)$ and
$e_t(s) \sim N(0, \sigma_e^2)$ independent

Suppose that $X_{tj}(s)$ handles temporal autocorrelation. Otherwise, we can include a temporal component.

Usually $e_t(s)$ remains spatially correlated.

In this case, $e_t(s) = e_0(s) + e_{t1}(s)$, where
$e_0(s)$: spatially correlated errors
$e_{t1}(s)$: pure residual (white noise)
so that $\beta_0(s) = \beta_0 + e_0(s)$

Page 11: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

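Inference

• How to estimate $\beta_0(s)$?

Traditional approach: geostatistical
$\beta_0(\cdot) \sim GP(\beta_0, \sigma^2, \rho)$ or $e_0(\cdot) = \beta_0(\cdot) - \beta_0 \sim GP(0, \sigma^2, \rho)$

then $\beta_0^{obs} \sim N(\beta_0 1, \sigma^2 R)$, where $\beta_0^{obs} = (\beta_0(s_1), \ldots, \beta_0(s_m))$

Hyperparameters: $\sigma_e^2$, $\sigma^2$ and $\lambda_0$

1. At first (3 steps)

(a) $\beta_0, \beta_1, \ldots, \beta_p$ are estimated in the regression model and the residuals $r_{t0}(s) = Y_t(s) - \hat\mu_t(s)$ are constructed

(b) $\sigma_e^2$, $\sigma^2$ and $\lambda_0$ are estimated from the $r_{t0}(s)$

(c) Inference is based on $\hat\sigma_e$, $\hat\sigma$ and $\hat\lambda_0$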

Page 12: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

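Problems:
(a) $r_{t0}(s) \neq e_t(s)$
(b) $(\hat\sigma_e, \hat\sigma, \hat\lambda_0) \neq (\sigma_e, \sigma, \lambda_0)$

2) Next step:
• $\beta_0, \beta_1, \ldots, \beta_p$ and $(\sigma_e, \sigma, \lambda_0)$ estimated jointly: solves (a)
• but incorporating the uncertainty about $(\hat\sigma_e, \hat\sigma, \hat\lambda_0)$ is complicated

3) Natural solution (Kitanidis, 1986; Handcock & Stein, 1993):
• specify a prior distribution for the hyperparameters
• perform Bayesian inference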

Page 13: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Recall: $E[Y_t(s)] = \beta_0(s) + \beta_1 X_{t1}(s) + \ldots + \beta_p X_{tp}(s)$

Spatial heterogeneity doesn't have to be restricted to $\beta_0$

Model generalization

Example: site by site effect of temperature in the Rio pollution data

Page 14: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

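Previous model:
$E[Y_t(s)] = \beta_0(s) + \beta_1 X_{t1}(s) + \ldots + \beta_p X_{tp}(s)$

We can accommodate spatial variation for the other coefficients $\beta_j$, $j = 1, \ldots, p$.

Extension of the previous model:
$E[Y_t(s)] = \beta_0(s) + \beta_1(s) X_{t1}(s) + \ldots + \beta_p(s) X_{tp}(s)$

$\beta(\cdot) = (\beta_0(\cdot), \beta_1(\cdot), \ldots, \beta_p(\cdot))$

One possibility: $\beta(\cdot) \sim GP(\mu, \Sigma, \rho)$

Hyperparameters: $\theta = (\mu, \sigma_e^2, \Sigma, \lambda)$, where $\mu = (\mu_0, \mu_1, \ldots, \mu_p)$

Special cases for the $\beta_j(\cdot)$'s:

a) prior independence:
$\Sigma = \mathrm{diag}(\sigma_0^2, \ldots, \sigma_p^2)$, $\rho(\cdot) = (\rho_0(\cdot), \ldots, \rho_p(\cdot))$

b) same spatial structure and prior correlation between the $\beta_j(\cdot)$'s:
$\rho_j(\cdot) = \rho(\cdot)$, $\forall j$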

Page 15: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

How to estimate $\beta_j(s)$, $j = 0, 1, \ldots, p$?

1) Classical solution (Oehlert, 1993; Solna & Switzer, 1996):

(a) $\beta_0(s), \beta_1(s), \ldots, \beta_p(s)$ estimated by $b_0(s), b_1(s), \ldots, b_p(s)$ in the local regression model

(b) $\mu, \sigma_e, \Sigma, \lambda$ estimated from the $b_j(s)$

(c) inference based on $\hat\mu, \hat\sigma_e, \hat\Sigma, \hat\lambda$

Problems (the same as before):
(a) $b_j(s) \neq \beta_j(s)$
(b) $(\hat\mu, \hat\sigma_e, \hat\Sigma, \hat\lambda) \neq (\mu, \sigma_e, \Sigma, \lambda)$

2) Natural solution:
Specify a prior distribution for $\theta$. In general, independent and noninformative priors are used.

Page 16: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Model Summary

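Data:
$Y^{obs} = (Y_1(s_1), \ldots, Y_T(s_m))$
$X^{obs} = (X_1(s_1), \ldots, X_T(s_m))$

Parameters: $(\beta^{obs}, \theta)$, where $\theta = (\mu, \sigma_e^2, \Sigma, \lambda)$

$\mu = (\mu_0, \mu_1, \ldots, \mu_p)$

$\beta^{obs} = (\beta_0^{obs}, \ldots, \beta_p^{obs})$

$\beta_j^{obs} = (\beta_j(s_1), \ldots, \beta_j(s_m))$, $j = 0, 1, \ldots, p$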

Page 17: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Simulated data

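$Y_t(s) = \mu_t(s) + \epsilon_t(s)$, $t = 1, \ldots, 30$
$\mu_t(s) = \beta_0(s) + \beta_1(s) X_t(s)$

$\epsilon_t(s) \sim N(0, \sigma_e^2)$ independent, with $\sigma_e^2 = 1$

$\beta_0(\cdot) \sim GP(\mu_0, \sigma_0^2, \rho_0(\cdot))$
$\beta_1(\cdot) \sim GP(\mu_1, \sigma_1^2, \rho_1(\cdot))$
$X_t(\cdot) \sim GP(\mu_2, \sigma_2^2, \rho_2(\cdot))$, for all times $t$

Exponential correlation functions: $\rho_j(x) = \exp\{-\lambda_j x\}$

$\mu_0 = 100$, $\mu_1 = 5$, $\mu_2 = 0$
$\lambda_0 = 0.4$, $\lambda_1 = 0.8$, $\lambda_2 = 1.5$
$\sigma_0^2 = 0.1$, $\sigma_1^2 = 1$, $\sigma_2^2 = 0.333$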
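A data set of this kind can be generated in a few lines. This is a sketch under stated assumptions rather than the code used for the talk: the regular 5 x 5 grid with coordinates 0.1 to 0.9 is inferred from the figure axes and the later remark about m = 25 sites, the seed is arbitrary, and the helper `draw` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(0.1, 0.9, 5)                       # assumed regular grid
sites = np.array([(x, y) for x in grid for y in grid])
m, T = len(sites), 30
h = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)  # distance matrix

def draw(mu, sigma2, lam, size=None):
    """Draw a GP with constant mean mu and covariance sigma2 * exp(-lam * h) at the sites."""
    cov = sigma2 * np.exp(-lam * h)
    return rng.multivariate_normal(np.full(m, mu), cov, size=size)

beta0 = draw(100.0, 0.1, 0.4)                         # intercept surface
beta1 = draw(5.0, 1.0, 0.8)                           # slope surface
X = draw(0.0, 0.333, 1.5, size=T)                     # one covariate field per time, shape (T, m)
Y = beta0 + beta1 * X + rng.normal(0.0, 1.0, size=(T, m))  # sigma_e^2 = 1
```

The coefficient surfaces are drawn once and shared by all 30 time points, while the covariate field is redrawn independently at each time.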

Page 18: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Simulated Data

[Figure: contour maps over Coordinate 1 and Coordinate 2 (both axes from 0.1 to 0.9), with panels labelled $\beta_0$, $\beta_1 X$ and $Y$ joined by + and = signs to show how the simulated response is built up.]

Page 19: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Inference

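Parameters: $(\beta^{obs}, \theta)$, with $\theta = (\mu, \sigma_e^2, \Sigma, \lambda)$

Likelihood: $L(\beta^{obs}, \theta) = p(Y^{obs} \mid \beta^{obs}, \sigma_e^2)$

Prior: $p(\beta^{obs}, \theta) = p(\beta^{obs} \mid \theta)\, p(\mu)\, p(\sigma_e^2)\, p(\Sigma)\, p(\lambda)$

Posterior: $\pi(\beta^{obs}, \theta) \propto L(\beta^{obs}, \theta)\, p(\beta^{obs}, \theta)$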

• Many parameters

• Complicated functional form

• Solution by MCMC

Page 20: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

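Full Conditionals

(a) $[\beta^{obs} \mid \text{rest}] \sim$ Normal

(b) $[\mu \mid \text{rest}] \sim$ Normal

(c) $[\sigma_e^2 \mid \text{rest}] \sim [\sigma_e^2 \mid Y^{obs}, \beta^{obs}] \sim$ Inverse Gamma

(d) $[\Sigma \mid \text{rest}] \sim [\Sigma \mid \beta^{obs}, \mu, \lambda] \sim$ Inverse Wishart
use the $\beta_j^{obs}$ as if they were data

(e) $[\lambda \mid \text{rest}] \propto \prod_j p(\beta_j^{obs} \mid \mu_j, \lambda_j, \Sigma)\, p(\lambda)$
hard to sample: Metropolis-Hastings
again, use the $\beta_j^{obs}$ as data (geostatistical analysis)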
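Step (c) is the simplest of these updates and can be sketched directly. The slides do not give the prior, so the sketch below assumes $\sigma_e^2 \sim IG(a, b)$, for which the full conditional is $IG(a + N/2,\ b + \tfrac{1}{2}\sum (Y_t(s) - \mu_t(s))^2)$ with $N$ the number of observations. The function name and the toy data are illustrative only.

```python
import numpy as np

def sample_sigma2_e(Y, mu, a, b, rng):
    """Draw sigma_e^2 from IG(a + N/2, b + SSR/2), given the current mean surface mu.

    Assumes an IG(a, b) prior; this prior is an assumption, not taken from the slides.
    """
    resid = Y - mu
    shape = a + resid.size / 2.0
    rate = b + 0.5 * np.sum(resid ** 2)
    return 1.0 / rng.gamma(shape, 1.0 / rate)   # reciprocal of a Gamma(shape, rate) draw

rng = np.random.default_rng(2)
mu = np.zeros((30, 25))                          # toy mean surface, T = 30, m = 25
Y = mu + rng.normal(0.0, 1.0, size=mu.shape)     # toy data with true sigma_e^2 = 1
draws = np.array([sample_sigma2_e(Y, mu, 0.001, 0.001, rng) for _ in range(2000)])
```

Inside a Gibbs sampler, `mu` would be recomputed from the current $\beta^{obs}$ before each call.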

Page 21: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Results (based on a regular grid of m = 25 sites)

Histogram of the parameters

where $\phi_i = \sigma_i^{-2}$

Page 22: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Spatial Interpolation

Interpolation grid: $s_1^n, \ldots, s_g^n$

$\beta_j^n = (\beta_j(s_1^n), \ldots, \beta_j(s_g^n))$, $j = 0, 1, \ldots, p$

$\beta^n = (\beta_0^n, \ldots, \beta_p^n)$

We need to interpolate the $\beta_j$'s in order to interpolate $Y^n$

Page 23: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

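Spatial interpolation of the $\beta$'s

$\pi(\beta^n, \beta^{obs}, \theta \mid Y^{obs}) = \pi(\beta^n \mid \beta^{obs}, \theta, Y^{obs})\, \pi(\beta^{obs}, \theta \mid Y^{obs}) = \pi(\beta^n \mid \beta^{obs}, \theta)\, \pi(\beta^{obs}, \theta \mid Y^{obs})$

Simulation of $[\beta^n \mid Y^{obs}]$ in 2 steps:
(a) $[\beta^{obs}, \theta \mid Y^{obs}]$ using MCMC
(b) $[\beta^n \mid \beta^{obs}, \theta]$ using Multivariate Normal

Interpolation of the $Y$'s

$\pi(Y^n, \beta^n, \theta \mid Y^{obs}) = \pi(Y^n \mid \beta^n, \theta, Y^{obs})\, \pi(\beta^n, \theta \mid Y^{obs}) = \pi(Y^n \mid \beta^n, \theta)\, \pi(\beta^n, \theta \mid Y^{obs})$

Simulation of $[Y^n \mid Y^{obs}]$ also in 2 steps:
(a) $[\beta^n, \theta \mid Y^{obs}]$ using MCMC and spatial interpolation
(b) $[Y^n \mid \beta^n, \theta]$ using Multivariate Normal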
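Step (b) for the coefficients is a conditional multivariate normal draw. The sketch below shows it for a single coefficient surface under assumptions that go beyond the slide: a constant mean `mu`, variance `sigma2`, and exponential correlation with decay `lam`. The function name `krige_draw` is hypothetical.

```python
import numpy as np

def krige_draw(beta_obs, obs_sites, new_sites, mu, sigma2, lam, rng):
    """Draw beta^n | beta^obs, theta for one coefficient under an exponential-correlation GP.

    Returns the conditional mean and one draw from the conditional normal.
    """
    def corr(a, b):
        h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return np.exp(-lam * h)

    R_oo = corr(obs_sites, obs_sites)
    R_no = corr(new_sites, obs_sites)
    R_nn = corr(new_sites, new_sites)
    A = np.linalg.solve(R_oo, R_no.T).T          # R_no R_oo^{-1} (R_oo is symmetric)
    mean = mu + A @ (beta_obs - mu)
    cov = sigma2 * (R_nn - A @ R_no.T)
    cov = (cov + cov.T) / 2.0                    # symmetrise against round-off
    return mean, rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(3)
obs_sites = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
beta_obs = np.array([1.0, 2.0, 3.0, 4.0])
new_sites = np.array([[0.5, 0.5], [0.25, 0.75]])
mean, beta_new = krige_draw(beta_obs, obs_sites, new_sites,
                            mu=2.5, sigma2=1.0, lam=1.0, rng=rng)
```

Repeating this draw once per MCMC iteration, with that iteration's values of `beta_obs` and the hyperparameters, yields a sample from $[\beta^n \mid Y^{obs}]$.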

Page 24: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Simulated data: Interpolation of $\beta_1$

[Figure: two maps over Coordinate 1 and Coordinate 2 (both axes from 0.1 to 0.9). Panels: Simulated values; Interpolated values.]

Page 25: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Simulated data: Interpolation of $Y_{30}(\cdot)$

[Figure: two maps over Coordinate 1 and Coordinate 2 (both axes from 0.1 to 0.9). Panels: Simulated values; Interpolated values.]

Page 26: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Interpolation of X´s

These interpolations assume that the covariates $X_j$ are available at the interpolation sites, for $j = 1, \ldots, p$

Otherwise, we must interpolate them

Page 27: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

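Model may be completed with

$X(\cdot) \mid \mu_x, \sigma_x^2, \lambda_x \sim GP(\mu_x, \sigma_x^2, \rho_x(\cdot))$, with $\theta_x = (\mu_x, \sigma_x^2, \lambda_x)$

$\pi(X^n, \theta_x \mid Y^{obs}, X^{obs}) = \pi(X^n, \theta_x \mid X^{obs}) = \pi(X^n \mid \theta_x, X^{obs})\, \pi(\theta_x \mid X^{obs})$

Simulation of $[X^n \mid Y^{obs}, X^{obs}]$ in 2 steps:

(a) $[\theta_x \mid X^{obs}]$ using MCMC

(b) $[X^n \mid \theta_x, X^{obs}]$ using Multivariate Normal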

Page 28: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Results obtained by interpolating X

Histogram of the parameters

Precisions are less sparse than when X is known

Page 29: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Interpolation of $X_{30}(\cdot)$

[Figure: two maps over the study region (both axes from 0.1 to 0.9). Panels: Simulated values; Interpolated values.]

Page 30: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Interpolation of $Y_{30}(\cdot)$

[Figure: two maps over Coordinate 1 and Coordinate 2 (both axes from 0.1 to 0.9). Panels: Known X; Unknown X.]

Page 31: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

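Application to the pollution data

$Y_t(s)$ = square root of PM10 at site $s$ and time $t$

$Y_t(s) = \beta_0(s) + \beta_1(s)\, TEMP_t + \gamma' X_t + \epsilon_t(s)$

$X_t$ = (MON, TUE, WED, THU, FRI, SAT)

$\epsilon_t(s)$ independent $N(0, \sigma^2)$

$\beta_0(\cdot) \sim GP(\mu_0, \sigma_0^2, \rho_0(\cdot))$
$\beta_1(\cdot) \sim GP(\mu_1, \sigma_1^2, \rho_1(\cdot))$
$\rho_i(\cdot)$, $i = 0, 1$, are exponential correlation functions

• Now, the temperature coefficient varies in space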

Page 32: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Results for the pollution data in Rio

Histogram of the hyperparameter samples

where $\phi_i = \sigma_i^{-2}$

Page 33: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Interpolation of the coefficient

Prior for $\phi$: $G(10, 10c)$
SSDE = 0.0637

Prior for $\phi$: $G(10^{-3}, 10^{-3})$
SSDE = 0.1444

where $SSDE = \frac{1}{m} \sum_{i=1}^{m} \{ E[\beta_1(s_i) \mid Y^{obs}] - \hat\beta_{1,i}^{ols} \}^2$

$c$ obtained by exploratory analysis site by site (OLS)

Same idea can be used for $\lambda$ (exploratory geostatistical analysis)
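The SSDE criterion is a mean squared difference between the posterior means of the coefficient at the observed sites and the site-by-site OLS estimates. A minimal sketch, with made-up numbers in place of the real posterior means and OLS fits (neither is given in the slides):

```python
import numpy as np

def ssde(post_mean, ols):
    """Mean over sites of the squared gap between posterior mean and OLS estimate."""
    post_mean = np.asarray(post_mean, dtype=float)
    ols = np.asarray(ols, dtype=float)
    return np.mean((post_mean - ols) ** 2)

# Hypothetical values for three sites, for illustration only.
value = ssde([0.1, 0.2, 0.3], [0.1, 0.4, 0.2])
```

A smaller SSDE means the interpolated coefficient stays closer to the exploratory site-by-site fits, which is how the two priors above are compared.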

Page 34: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Previous model:
$Y_t(s) = \mu_t(s) + e_t(s)$, where
$\mu_t(s) = \beta_0(s) + \beta_1(s) X_{t1}(s) + \ldots + \beta_p(s) X_{tp}(s)$
$e_t(s) \sim N(0, \sigma_e^2)$ independent

We can also accommodate temporal variation of the coefficients $\beta_j$, $j = 0, \ldots, p$.

Extension of the previous model:
$Y_t(s) = \mu_t(s) + e_t(s)$, where
$\mu_t(s) = \beta_{t0}(s) + \beta_{t1}(s) X_{t1}(s) + \ldots + \beta_{tp}(s) X_{tp}(s)$
$e_t(s) \sim N(0, \sigma_e^2)$ independent

Natural specification: $\beta_t(\cdot) \mid \mu_t \sim GP(\mu_t, \Sigma, \rho)$, independent in time

The model must be completed with:
(a) prior for $(\sigma_e, \Sigma, \lambda)$, as before
(b) specification of the temporal evolution of the $\mu_t$'s

Page 35: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Suggestion: use dynamic models (SVP/TVM) (Landim & Gamerman, 2000)

$\mu_t \mid \mu_{t-1} \sim N(G_t \mu_{t-1}, W_t)$

$W_t$: unknown parameters of the evolution

Model parameters: $(\beta^{obs}, \mu, \theta)$, with $\theta = (\mu_0, \sigma_e^2, \Sigma, \lambda, W)$

where $\mu = (\mu_1, \ldots, \mu_T)$ and $\mu_t = (\mu_{t0}, \mu_{t1}, \ldots, \mu_{tp})$, $t = 1, \ldots, T$

Simulation cycle has 2 changes:
I) additional step to sample $W$
II) modified step to sample $\mu$
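The evolution equation can be simulated forward to see what kind of mean trajectories it produces. This is a sketch under simplifying assumptions that are not in the slides: $G_t = G$ and $W_t = W$ are held constant in time, and the starting value and variances are arbitrary illustrative numbers.

```python
import numpy as np

def simulate_evolution(mu0, G, W, T, rng):
    """Simulate mu_t | mu_{t-1} ~ N(G mu_{t-1}, W) for t = 1..T, with G and W constant."""
    mu = np.empty((T + 1, len(mu0)))
    mu[0] = mu0
    for t in range(1, T + 1):
        mu[t] = rng.multivariate_normal(G @ mu[t - 1], W)
    return mu

rng = np.random.default_rng(4)
mu0 = np.array([100.0, 5.0])                     # illustrative starting means
path = simulate_evolution(mu0, np.eye(2), 0.01 * np.eye(2), T=30, rng=rng)
```

With $G = I$ this is a random walk, the form used in the simulated-data application; setting $W = 0$ recovers the static model with a constant mean vector.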

Page 36: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Application to simulated data

Multivariate observations: $Y_t(s) = (Y_{t1}(s), Y_{t2}(s))$

$Y_t(s) = \beta_{t0}(s) + \beta_{t1}(s) X_{t1}(s) + \epsilon_t(s)$
$\beta_t(\cdot) \sim GP(\mu_t, \Sigma, \rho)$
$\mu_t = \mu_{t-1} + \omega_t$

same spatial correlation for $\beta_0$ and $\beta_1$

$\rho(\cdot)$: exponential correlation function with $\lambda = 1$.

Histogram of the posterior

Page 37: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Trajectory of $\mu_t$: mean and credibility limits

Page 38: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Interpolation

Samples from $y_t^n \mid y^{obs}$ are obtained through the algorithm below:

1. Sample from $\beta_t^{obs}, \theta \mid y^{obs}$: through MCMC

2. Sample from $\beta_t^n \mid \beta_t^{obs}, \theta$: through Gaussian process

3. Sample from $y_t^n \mid \beta_t^n, \theta$: independent Normal draws

Once again, $X_t^n$ must be known; otherwise, it has to be interpolated.

Page 39: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

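Spatially- and time-varying parameters (STVP)

Another possibility: temporal evolution applied directly to the $\beta_t$ processes rather than to their means

$Y_t(s) = \beta_{t0}(s) + \beta_{t1}(s) X_{t1}(s) + \epsilon_t(s)$
$\beta_t(\cdot) = \beta_{t-1}(\cdot) + w_t(\cdot)$
$w_t(\cdot) \sim GP(\omega_t, \Sigma, \rho)$, independent in time

Completed with: $\beta_0(\cdot) = \mu_0 \sim N(g_0, R)$

Marginal prior: $\beta_t(\cdot) \mid \mu_t \sim GP(\mu_t, t\Sigma, \rho)$

SVP/TVM: $\mathrm{cov}[\beta_{t_1}(s_1), \beta_{t_2}(s_2)] = [R + \min\{t_1, t_2\} W] + I(t_1 = t_2)\, \Sigma\, \rho(|s_1 - s_2|)$

STVP: $\mathrm{cov}[\beta_{t_1}(s_1), \beta_{t_2}(s_2)] = R + \min\{t_1, t_2\} [W + \Sigma\, \rho(|s_1 - s_2|)]$

Not separable at the latent level, unlike SVP/TVM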

Page 40: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Computations

The MCMC algorithm must explore the correlation structure: parameters are visited in blocks
(Landim and Gamerman, 2000; Fruhwirth-Schnatter, 1994)

Prediction

Based on the forecast distribution of $Y_{T+h} \mid Y^{obs}$, for $Y_{T+h} = (Y_{T+h}(s_1^f), \ldots, Y_{T+h}(s_F^f))$ and any collection $(s_1^f, \ldots, s_F^f)$

1. Sample from $\beta_T^{obs}, \theta \mid Y^{obs}$: through MCMC

2. Sample from $\beta_{T+h}^f \mid \beta_T^{obs}, \theta$: obtained by introduction of $\beta_{T+h}^{obs}$
$\beta_T^{obs} \rightarrow \beta_{T+h}^{obs}$ by successive evolution of the process
$\beta_{T+h}^{obs} \rightarrow \beta_{T+h}^f$ by interpolation with Gaussian process

3. Sample from $Y_{T+h} \mid \beta_{T+h}^f, \theta$: independent normal draws

Page 41: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

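Time-varying locations

Assume locations $s_t = (s_{t1}, \ldots, s_{t n_t})$ at time $t$

$\beta_t^{obs}$ is an $n_t$-dimensional vector, $t = 1, \ldots, T$

SVP/TVM: easily adapted
STVP: requires introduction of $\tilde\beta_{t-1}$ for the updated locations

$p(\beta_t^{obs} \mid \beta_{t-1}^{obs}, \theta) = \int p(\beta_t^{obs}, \tilde\beta_{t-1} \mid \beta_{t-1}^{obs}, \theta)\, d\tilde\beta_{t-1} = \int p(\beta_t^{obs} \mid \tilde\beta_{t-1}, \theta)\, p(\tilde\beta_{t-1} \mid \beta_{t-1}^{obs}, \theta)\, d\tilde\beta_{t-1}$

Both densities in the integrand are multivariate normal

The convolution of these two densities can be shown to be normal, and the required evolution equation for $\beta_t^{obs}$ can be obtained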

Page 42: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Non-Gaussian Observations

Two distinct types of non-normal data:

• Continuous:
Can be normalized after a suitable transformation $g(y)$
Example: Rio pollution data, $g(Y) = \sqrt{Y}$
The transformation can be estimated jointly with the other model parameters (de Oliveira, Kedem and Short, 1997)

• Count data:
For example, in the Bernoulli or Poisson form
standard approach: $y_t(s) \sim EF(\mu_t(s))$
spatio-temporal modeling issues: similar
computations: harder

Page 43: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Non-Gaussian Evolution

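Abrupt changes in the process: normality is not suitable

Robust alternative:
$w_t(\cdot) \sim GP(\omega_t, \Sigma, \rho)$ is replaced by
$w_t(\cdot) \mid \psi_t \sim GP(\omega_t, \psi_t^{-1} \Sigma, \rho)$ and $\psi_t \sim G(a_t, b_t)$, independent for $t = 1, \ldots, T$

Therefore, $w_t(\cdot) \sim tP(\omega_t, \Sigma, \rho)$, a Student-t process

The $\psi_t$'s control the magnitude of the evolution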
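The step from a gamma-mixed Gaussian to a Student-t distribution can be checked numerically in the scalar case. The sketch below is an illustration, not the talk's model: it assumes the common parameterisation $\psi \sim G(\nu/2, \nu/2)$ (shape and rate), zero mean, and $\nu = 10$, under which $w$ is Student-t with $\nu$ degrees of freedom and variance $\sigma^2 \nu / (\nu - 2)$.

```python
import numpy as np

rng = np.random.default_rng(5)
nu, sigma2, n = 10.0, 1.0, 200_000               # assumed values, for illustration

# psi ~ Gamma(shape = nu/2, rate = nu/2); NumPy takes a scale, so scale = 2/nu.
psi = rng.gamma(nu / 2.0, 2.0 / nu, size=n)

# w | psi ~ N(0, sigma2 / psi): a normal whose precision is scaled by psi.
w = rng.normal(0.0, np.sqrt(sigma2 / psi))

sample_var = np.var(w)
theory_var = sigma2 * nu / (nu - 2.0)            # variance of a scaled Student-t
```

Small values of $\psi_t$ inflate the variance at time $t$, which is how the mixture accommodates abrupt changes without abandoning conditional normality.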

Page 44: A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman

Final Comments

• More flexibility to accommodate variations in time and space.

• Static coefficient models: samples from the posterior were generated in the software BUGS, with interpolation made in FORTRAN

• Extension to accommodate anisotropic processes for some components of the model.

• Extensions to observations in the exponential family and estimation of the normalizing transformation.
