xu_wikle_070705


  • 7/31/2019 xu_wikle_070705

    1/33

    Estimation of Parameterized Spatio-Temporal Dynamic Models

    Ke Xu and Christopher K. Wikle

    Department of Statistics, University of Missouri-Columbia

    July 7, 2005: Original Submission July 21, 2004

    Abstract

Spatio-temporal processes are often high-dimensional, exhibiting complicated variability across space and time. Traditional state-space model approaches to such processes in the presence of uncertain data have been shown to be useful. However, estimation of state-space models in this context is often problematic since parameter vectors and matrices are of high dimension and can have complicated dependence structures. We propose a spatio-temporal dynamic model formulation with parameter matrices restricted based on prior scientific knowledge and/or common spatial models. Estimation is carried out via the expectation-maximization (EM) algorithm or general EM algorithm. Several parameterization strategies are proposed, and analytical or computational closed-form EM update equations are derived for each. We apply the methodology to a model based on an advection-diffusion partial differential equation in a simulation study, and also to a dimension-reduced model for a Palmer Drought Severity Index (PDSI) data set.

    Keywords: Dynamic, EM algorithm, General EM, state-space, time series, spatial, spatio-temporal

    Corresponding Author: Christopher K. Wikle, Department of Statistics, University of Missouri, 146 Math Science Building,

    Columbia, MO 65211; 573-882-9659; fax: 573-884-5524; [email protected]


    1 Introduction

    Spatio-temporal statistical models are essential tools for performing inference and prediction for processes

    in the physical, environmental, and biological sciences. Such processes are often complicated in that the

    dependence structure across space and time is non-trivial, often non-separable and non-stationary in space

    or time. In addition, it is often the case that the number of spatial locations at which inference is desired

    is quite large. Furthermore, data are often collected with substantial observational uncertainty and it is not

    uncommon to have missing observations at various spatial and temporal locations.

    Various approaches have been proposed to model spatio-temporal processes (e.g., see Kyriakidis and

    Journel, 1999 for a review). If one considers time as an extra dimension, then traditional spatial statistics

    techniques can be applied (Cressie, 1993). However, such approaches ignore the fundamental differences

    between space and time, principally that time is naturally ordered and space is not. Alternatively, one can

    consider the spatio-temporal problem from a multivariate geostatistical perspective which requires space-time covariance functions be specified. Traditionally this approach has been limited in that the known

    class of valid spatio-temporal covariance functions is quite small, although in recent years, several authors

    have extended this class of functions (e.g., Cressie and Huang, 1999; Gneiting, 2002). Nevertheless, this

    approach is still limited by the fact that such covariance functions are often not realistic for complicated

    dynamical processes and dimensionality can prohibit practical implementation.

    Spatio-temporal processes can also be considered from the multiple time series perspective (e.g., see

    Kyriakidis and Journel, 1999 for a review). That is, each spatial location is associated with a time series.

    Then, multivariate time series techniques can be transferred to the space-time problem. However, such

approaches ignore the fundamental differences between space and time, and one's ability to predict at

    locations for which data were not observed is limited. Such approaches do not in general explicitly account

    for uncertainty in the observed data. Perhaps more critically, such methods are difficult to implement in

    cases where the dimensionality of the state vector (i.e., the number of spatial locations) is high.

    A natural approach to spatio-temporal modeling for complex dynamical processes is a combination

of spatial and time series techniques, which is accomplished by a spatio-temporal dynamic model formulation (e.g., see Cressie and Wikle, 2002 for a brief review). However, estimation in this context can be problematic due to the high dimensionality of the state process. Several modeling strategies have been proposed to address this problem. One approach is to reduce dimensionality by projecting the state process onto some set of spectral basis functions (e.g., Mardia, Goodall, Redfern and Alonso, 1998; Wikle and Cressie,


1999). Alternatively, one might specify very simple, random walk dynamics (e.g., Stroud, Müller and Sansó, 2001; Huerta, Sansó and Stroud, 2004). Another approach is to incorporate physical or biological

    models directly into the parametrization (Wikle, Milliff, Nychka, and Berliner, 2001; Wikle, 2003). Even

    in the case of physically or biologically motivated dynamic models, it is seldom the case for statistical

problems (unlike some engineering problems) that we know the model parameters explicitly. These must be estimated, but in the presence of known constraints on the dynamical formulation. Furthermore, due

    to the high dimensionality, measurement error and process covariance matrices typically have too many

    parameters to estimate outright. Thus, these matrices must be parameterized as well.

    Estimation in the spatio-temporal dynamical model setting is best accomplished through a state-space

    framework. Given parameters, the unobserved state-process can be estimated via the Kalman filter or

    Kalman smoother (e.g., see Cressie and Wikle, 2002 for a review). However, in the more usual setting

    where model parameters are unknown, the standard approach following Shumway and Stoffer (1982) is to

    use the expectation-maximization (EM) algorithm to estimate parameters. As mentioned above, the spatio-

    temporal problem typically requires restrictions on the parameter matrices. Shumway and Stoffer (1982)

    discuss modifications to their algorithm to accommodate fully-restricted parameter matrices. However, it

    is not clear how they account for partially restricted or parameterized model matrices in this framework.

    Examination of Shumway (1988) (pp. 323-332) implies that one approach to deal with partially restricted

    parameter matrices is to set initial parameters (in the EM algorithm) to agree with the known values. Then

    in the M-step, only those parameters that require estimation are updated, so that the fixed parameters

don't change. Alternatively, one can update all parameters but then immediately impute the known values

    for the fixed parameters. Although these approaches are relatively easy to implement, it is not clear that

    they give the maximum likelihood estimates under the state-space model assumptions. Another approach,

    considered here, is to develop general EM (GEM) algorithms to account directly for the restricted or

    partially restricted model matrices.

    In this paper we describe efficient estimation approaches for spatio-temporal dynamic models in which

    the parameter matrices and/or noise covariance matrices are highly parameterized (or restricted). We

    utilize GEM algorithms to carry out this estimation. In Section 2, we give some necessary background for

spatio-temporal dynamic models and GEM algorithms. In Sections 3, 4 and 5, we propose several methods

of parameterization and derive the EM update formulas for each. In Section 6,

    we consider two examples. Finally, Section 7 contains a brief summary and conclusion.


    2 Background

    2.1 Spatio-Temporal Dynamic Model Formulation

Let y_t = (y_t(r_1), ..., y_t(r_{n_t}))′ be an n_t × 1 vector containing the data values at spatial locations r_1, ..., r_{n_t}, at time t. Let x_t = (x_t(s_1), ..., x_t(s_m))′ be an m × 1 vector for an unobservable spatio-temporal state process at some fixed network of m locations s_1, ..., s_m at time t. This state process is our primary interest. The two sets of spatial locations, {r_i} and {s_j} in D, where D is some domain, need not be the same. Write

y_t = H_t x_t + ε_t,   (1a)

x_t = M x_{t-1} + η_t,   (1b)

for t = 1, ..., T, where (1a) is called the measurement equation and (1b) the state equation. Let H_t be a known n_t × m matrix that maps the data y_t to the process x_t. The measurement noise ε_t is zero-mean, uncorrelated in time and Gaussian with covariance matrix R_t. The dynamics are described by the state equation (1b) via a first-order Markov process with the m × m transition or propagator matrix M. We also assume there are shocks η_t to the system, which are spatially colored, temporally white and Gaussian with mean zero and a common m × m covariance matrix Q. For completeness, we assume the process starts

with x_0, which is a Gaussian spatial process with mean μ_0 and covariance matrix Σ_0. Such a model

    in the spatio-temporal context is not new, nor is it the most general. However, such models have received

    a considerable amount of attention in the environmental literature in recent years and have been shown to

    be quite effective (e.g. see the review in Cressie and Wikle, 2002).

The parameters for the model (1) are Θ = {M, Q, R_t, μ_0, Σ_0}. The major challenge in fitting this

model lies in the high dimensionality of most space-time applications. For example, a model for Pacific

Sea Surface Temperature (SST) might have a very large state dimension m (Berliner, Wikle and Cressie, 2000), which

requires estimation of an m × m matrix M. Often, researchers resort to Bayesian hierarchical approaches for dealing with this dimensionality problem by considering restrictions to M, assigning priors

    and then using MCMC (Wikle, Berliner and Cressie, 1998). This paper shows that it is often possible to fit

    such models and estimate their parameters via the convenient Kalman and EM algorithms. Again, the key

is to assign a structure to the model parameters. Though still limited for many problems, such a formulation is useful in many settings. For example, in the early stages of model building, one might consider

such an implementation, since it is fast and easy to implement when compared to MCMC. In addition,

    there are situations where there is little scientific theory or previous empirical evidence to suggest prior


parameterizations for a fully-Bayesian model. In these cases, if the model is sufficiently parameterized,

    the KF/EM approach is a reasonable alternative.

    2.2 Kalman Filter and Smoother

Suppose we know the value of all parameters, Θ. Then one can use a set of recursions known as the Kalman filter and Kalman smoother to obtain the conditional mean and covariance of the state variable x_t (Kalman, 1960; Shumway and Stoffer, 1982). These recursions are well known, but we present them here to define notation and for completeness. Our overview follows Shumway and Stoffer (2000) with various notational modifications. First, define the conditional mean x_t^s = E(x_t | y_1, ..., y_s). In particular, x_t^{t-1}, x_t^t and x_t^T are called the predicted, filtered and smoothed values, respectively. Also define the conditional variance-covariance matrix, P_t^s = var(x_t | y_1, ..., y_s), and lag-one covariance matrix, P_{t,t-1}^s = cov(x_t, x_{t-1} | y_1, ..., y_s).

To get predicted and filtered values, one evaluates the following set of recursions for t = 1, ..., T, which is called the Kalman filter:

x_t^{t-1} = M x_{t-1}^{t-1},

P_t^{t-1} = M P_{t-1}^{t-1} M′ + Q,

K_t = P_t^{t-1} H_t′ (H_t P_t^{t-1} H_t′ + R_t)^{-1},

x_t^t = x_t^{t-1} + K_t (y_t − H_t x_t^{t-1}),

P_t^t = (I − K_t H_t) P_t^{t-1},

where x_0^0 = μ_0 and P_0^0 = Σ_0 are specified. To get smoothed values, one runs the following backward recursion for t = T, T−1, ..., 1, which is sometimes called the Kalman smoother:

J_{t-1} = P_{t-1}^{t-1} M′ (P_t^{t-1})^{-1},

x_{t-1}^T = x_{t-1}^{t-1} + J_{t-1} (x_t^T − x_t^{t-1}),

P_{t-1}^T = P_{t-1}^{t-1} + J_{t-1} (P_t^T − P_t^{t-1}) J_{t-1}′.

To get the smoothed lag-one covariance, one runs the backward recursion for t = T, T−1, ..., 2 on

P_{t-1,t-2}^T = P_{t-1}^{t-1} J_{t-2}′ + J_{t-1} (P_{t,t-1}^T − M P_{t-1}^{t-1}) J_{t-2}′,

where P_{T,T-1}^T = (I − K_T H_T) M P_{T-1}^{T-1}.
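The forward and backward recursions above can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation; it assumes a time-invariant mapping matrix H and propagator M, and omits the lag-one covariance recursion for brevity:

```python
import numpy as np

def kalman_filter(y, H, M, Q, R, x0, P0):
    """Forward recursions: predicted (xp, Pp) and filtered (xf, Pf) moments."""
    T = len(y)
    m = len(x0)
    xp = np.zeros((T, m)); Pp = np.zeros((T, m, m))
    xf = np.zeros((T, m)); Pf = np.zeros((T, m, m))
    x, P = x0, P0
    for t in range(T):
        # prediction from the state equation
        xp[t] = M @ x
        Pp[t] = M @ P @ M.T + Q
        # update with the innovation y_t - H xp_t
        S = H @ Pp[t] @ H.T + R                     # innovation covariance
        K = Pp[t] @ H.T @ np.linalg.inv(S)          # Kalman gain
        xf[t] = xp[t] + K @ (y[t] - H @ xp[t])
        Pf[t] = (np.eye(m) - K @ H) @ Pp[t]
        x, P = xf[t], Pf[t]
    return xp, Pp, xf, Pf

def kalman_smoother(M, xp, Pp, xf, Pf):
    """Backward recursions for the smoothed moments (xs, Ps)."""
    T, m = xf.shape
    xs = xf.copy(); Ps = Pf.copy()
    for t in range(T - 2, -1, -1):
        J = Pf[t] @ M.T @ np.linalg.inv(Pp[t + 1])  # smoother gain
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
    return xs, Ps
```

In a production setting one would replace the explicit inverses with solves of the corresponding linear systems for numerical stability.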


    2.3 EM Estimation

One can estimate the parameters by the method of moments and then plug them into (1) to implement the Kalman filter (Wikle and Cressie, 1999). Alternatively, one can run the Kalman recursions and recognize that a byproduct of the Kalman algorithm is that the likelihood can be computed from the filtered values with little extra effort. That is, if we define the innovation and its covariance as e_t = y_t − H_t x_t^{t-1} and Σ_t = H_t P_t^{t-1} H_t′ + R_t, respectively, then the log-likelihood, up to a constant, is simply (Shumway and Stoffer, 2000):

−2 ln L_Y(Θ) = Σ_{t=1}^T ln |Σ_t| + Σ_{t=1}^T e_t′ Σ_t^{-1} e_t.   (2)

    Thus, we might perform maximum likelihood estimation, either numerically (Gupta and Mehra, 1974) or

    by the EM algorithm (Shumway and Stoffer, 1982, 2000). In this paper we focus on the EM algorithm.

Consider {x_0, x_1, ..., x_T, y_1, ..., y_T} as the complete data and denote its likelihood L_{X,Y}(Θ). An EM iteration consists of two steps: an E-step and an M-step. Given the current value of the parameters, Θ^(i), the E-step computes the expected value of (minus twice) the complete-data log-likelihood, which is of the following form (for details see Shumway and Stoffer, 2000):

E[−2 ln L_{X,Y}(Θ) | y_1, ..., y_T, Θ^(i)]
  = ln|Σ_0| + tr{Σ_0^{-1}[P_0^T + (x_0^T − μ_0)(x_0^T − μ_0)′]}
  + T ln|Q| + tr{Q^{-1}[S_{11} − S_{10}M′ − M S_{10}′ + M S_{00}M′]}
  + Σ_{t=1}^T ln|R_t| + Σ_{t=1}^T tr{R_t^{-1}[(y_t − H_t x_t^T)(y_t − H_t x_t^T)′ + H_t P_t^T H_t′]},   (3)

where S_{11}, S_{10} and S_{00} are the smoothed moment matrices given in (5) below.

Note x_t^T, P_t^T and P_{t,t-1}^T depend on Θ^(i).

In the M-step, an update, Θ^(i+1), is chosen such that (3) evaluated at Θ^(i+1) is no larger than (3) evaluated at Θ^(i). This will guarantee that the likelihood increases monotonically. When the likelihood function is bounded, the iterates will eventually converge to a stationary point of the likelihood. If Θ^(i+1) is also the minimizer of (3), we have the standard EM algorithm. Otherwise, the algorithm is known as General EM (GEM) (McLachlan and Krishnan, 1997).

In the case of unrestricted parameter matrices there exists a closed-form EM update formula for all parameters. Minimizing (3) with respect to the parameters yields the M-step update formulas for our model (Shumway and Stoffer, 1982).


M^(i+1) = S_{10} S_{00}^{-1},   (4a)

Q^(i+1) = T^{-1}(S_{11} − S_{10} S_{00}^{-1} S_{10}′),   (4b)

R^(i+1) = T^{-1} Σ_{t=1}^T [(y_t − H_t x_t^T)(y_t − H_t x_t^T)′ + H_t P_t^T H_t′],   (4c)

μ_0^(i+1) = x_0^T,   (4d)

where

S_{11} = Σ_{t=1}^T (x_t^T x_t^{T′} + P_t^T),   S_{10} = Σ_{t=1}^T (x_t^T x_{t-1}^{T′} + P_{t,t-1}^T),   S_{00} = Σ_{t=1}^T (x_{t-1}^T x_{t-1}^{T′} + P_{t-1}^T).   (5)

Note that Σ_0 is not updated, since μ_0 and Σ_0 are essentially nuisance parameters and they cannot be estimated simultaneously (Shumway and Stoffer, 1982). We choose to update μ_0 rather than the covariance matrix Σ_0 since in general we do not have enough data to justify estimating a covariance matrix (we have only one observation for the initial vector).
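As a sketch of how the unrestricted M-step might be coded, the following assumes a time-invariant H and R and takes the smoothed moments (including lag-one covariances) as inputs; the names and argument layout are illustrative choices, not from the paper:

```python
import numpy as np

def em_m_step(y, H, xs, Ps, Pl, x0s, P0s):
    """One unrestricted M-step, updates (4a)-(4d), assuming time-invariant H, R.
    xs[t-1], Ps[t-1]: smoothed mean/covariance of x_t for t = 1, ..., T;
    Pl[t-1]: smoothed lag-one covariance cov(x_t, x_{t-1}) for t = 1, ..., T;
    x0s, P0s: smoothed moments of the initial state x_0."""
    T, m = xs.shape
    # stack x_0, ..., x_T so the moment matrices (5) are simple sums
    xall = np.vstack([x0s[None, :], xs])
    Pall = np.concatenate([P0s[None], Ps])
    S11 = sum(np.outer(xall[t], xall[t]) + Pall[t] for t in range(1, T + 1))
    S10 = sum(np.outer(xall[t], xall[t - 1]) + Pl[t - 1] for t in range(1, T + 1))
    S00 = sum(np.outer(xall[t], xall[t]) + Pall[t] for t in range(T))
    Mnew = S10 @ np.linalg.inv(S00)                          # (4a)
    Qnew = (S11 - Mnew @ S10.T) / T                          # (4b)
    resid = y - xs @ H.T                                     # smoothed residuals
    Rnew = sum(np.outer(resid[t], resid[t]) + H @ Ps[t] @ H.T
               for t in range(T)) / T                        # (4c)
    mu0new = x0s                                             # (4d)
    return Mnew, Qnew, Rnew, mu0new
```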

As mentioned previously, for our applications, the spatially indexed data vector, y_t, is usually high

    dimensional. As a result, our parameters are often of high dimension as well. Hence, some form of

    dimension reduction is called for. One approach is to parameterize by exploiting the special structure of

the process. We propose several approaches for specifying realistic submodels for R, Q and M, thereby

    substantially easing the burden of estimation.

    Our methods rely heavily on GEM algorithms, since we shall see later that in many cases parameters

are not separable, which means the joint best update, such as (4), is not available. It is also

    the case that sometimes the analytical closed form best update formula cannot be derived for some of

the parameters. As a result, we often must settle for a better update, which need only ensure that the

likelihood moves monotonically. The price to be paid for this generality is that it takes more iterations

    to converge than what we would experience with the traditional types of EM estimation for state-space

    models. Two of the most useful GEM algorithms are described in the following section.

Although we do not explicitly describe algorithms for obtaining standard error estimates for Θ, they

    can be computed in various ways. For example, it is sometimes possible to evaluate the Hessian matrix

    after convergence (Shumway and Stoffer, 2000). Alternatively, one may obtain estimates of the standard

    error by perturbing the likelihood function (2) and using numerical differentiation (e.g., Shumway and

    Stoffer, 2000; Tanner, 1996).


    2.4 Two General EM (GEM) Algorithms

    2.4.1 Expectation-conditional maximization (ECM) algorithm

An ECM algorithm consists of an expectation (E) step and conditional maximization (CM) steps (McLachlan and Krishnan, 1997). Sometimes the M-step update is difficult to obtain, so we replace the M-step with several simple CM-steps. As an example, suppose the parameter of interest consists of two parts, i.e., Θ = (θ_1, θ_2). An ECM algorithm updates the two sub-parameters sequentially or conditionally. That is, given the current value Θ^(i) = (θ_1^(i), θ_2^(i)), we obtain the update via two CM-steps subject to the conditionally maximizing requirement at each CM-step:

CM-step 1: update θ_1^(i) with θ_1^(i+1) such that (3) evaluated at (θ_1^(i+1), θ_2^(i)) is no larger than at (θ_1^(i), θ_2^(i)),

CM-step 2: update θ_2^(i) with θ_2^(i+1) such that (3) evaluated at (θ_1^(i+1), θ_2^(i+1)) is no larger than at (θ_1^(i+1), θ_2^(i)).

Note that (3) evaluated at the final update Θ^(i+1) = (θ_1^(i+1), θ_2^(i+1)) is then no larger than at Θ^(i), so the likelihood value increases after the final update. Clearly ECM qualifies as a GEM algorithm. If there is need to further divide the parameters into additional parts, the update simply takes more CM-steps.

    2.4.2 GEM Based on One Newton-Raphson Step

Next, consider Θ = (θ_1, θ_2), where θ_2 is a scalar parameter. This is for notational ease and illustration, as the algorithm described below also works for vector parameters. If the first two derivatives of (3) with respect to θ_2 exist in closed form, we can use a procedure called GEM based on one Newton-Raphson step to update θ_2 (McLachlan and Krishnan, 1997). The update has the form

θ_2^(i+1) = θ_2^(i) − α g′(θ_2^(i))/g″(θ_2^(i)),   0 < α ≤ 1,

where g(θ_2) denotes (3) as a function of θ_2 with the other parameters held at their current values, and g′ and g″ are its first two derivatives.

For α sufficiently small, this will guarantee that (3) does not increase, so this procedure is a GEM algorithm. In practice choosing α = 1 will suffice when near the minimum (Lange, 1999). Since this step satisfies the conditionally maximizing requirement, it works well with ECM.
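One way to implement the safeguard on α is step-halving: shrink the step until the objective does not increase. This is a minimal sketch under that assumption, with the criterion and its first two derivatives passed in as functions:

```python
def newton_gem_step(g, dg, d2g, theta, alpha=1.0):
    """One Newton-Raphson GEM update for a scalar parameter.
    g, dg, d2g: the expected complete-data criterion (to be minimized) and
    its first two derivatives, all other parameters held at current values.
    The step is halved until the criterion does not increase."""
    step = dg(theta) / d2g(theta)
    new = theta - alpha * step
    while g(new) > g(theta) and alpha > 1e-8:
        alpha /= 2.0
        new = theta - alpha * step
    return new
```

For a quadratic criterion the full Newton step (α = 1) lands exactly on the minimizer, so no halving occurs.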


    2.5 Convergence Criteria

The EM algorithm is said to converge when one of the two following conditions is met (Tanner, 1996):

max_j |θ_j^(i+1) − θ_j^(i)| < δ   or   max_j |θ_j^(i+1) − θ_j^(i)| / |θ_j^(i)| < δ,

for some small positive δ, where θ_j^(i) and θ_j^(i+1) are (scalar) elements of Θ^(i) and Θ^(i+1). Since the EM algorithm converges to a stationary point, which can be a saddle point, local minimum, or global minimum (McLachlan and Krishnan, 1997), it is advisable to check the result with several different starting values.

    2.6 Starting Values

To achieve fast convergence, one should choose reasonable starting values. One simple method is to use moment-based estimates. Suppose the data vectors y_t are of the same size n for all t. It is straightforward to calculate the sample estimates of the first two moments:

μ̂ = T^{-1} Σ_{t=1}^T y_t,   (6a)

Ĉ(0) = T^{-1} Σ_{t=1}^T (y_t − μ̂)(y_t − μ̂)′,   (6b)

Ĉ(1) = (T − 1)^{-1} Σ_{t=2}^T (y_t − μ̂)(y_{t-1} − μ̂)′,   (6c)

where Ĉ(1) denotes the lag-one covariance estimate.

As a crude guess, assume y_t follows a VAR(1) model. We then use the obtained estimates as starting values: M^(0) = Ĉ(1)Ĉ(0)^{-1} and Q^(0) = Ĉ(0) − M^(0)Ĉ(1)′. We typically specify the measurement noise covariance matrix by R^(0), where R^(0) is obtained from an assessment of the measuring instrument or from the estimate of the nugget effect from a spatial variogram (e.g., Cressie, 1993).

Note that in cases with n ≥ T, the estimate Ĉ(0) is not positive definite and one cannot get estimates of M^(0) and Q^(0) as described above. Alternatively, one can fit individual univariate autoregressive models of order 1 for each spatial location and let M^(0) and Q^(0) be diagonal matrices with estimates of the autoregressive parameters and conditional variances on the diagonal, respectively.
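A sketch of these moment-based starting values, assuming equally sized data vectors stacked in a T × n matrix (the function name and return layout are illustrative):

```python
import numpy as np

def starting_values(y):
    """Moment-based starting values via (6a)-(6c) and a crude VAR(1) fit.
    y: T x n data matrix with the same n locations at every time t."""
    T, n = y.shape
    mu = y.mean(axis=0)                            # (6a) sample mean
    yc = y - mu
    C0 = yc.T @ yc / T                             # (6b) lag-0 covariance
    C1 = yc[1:].T @ yc[:-1] / (T - 1)              # (6c) lag-1 covariance
    M0 = C1 @ np.linalg.inv(C0)                    # VAR(1) propagator guess
    Q0 = C0 - M0 @ C1.T                            # innovation covariance guess
    return mu, C0, C1, M0, Q0
```

When n ≥ T the inverse of C0 fails, which is exactly the degenerate case discussed above; the univariate AR(1) fallback would then be used instead.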


3 Algorithms for Parameterizations of the R Matrix

    3.1 White Noise

    Without site-specific information about the measurement error process, it is often realistic to assume that

    measurement error is independent and identically distributed white noise for all data locations, especially

    if the domain of interest is relatively homogeneous. For example, researchers have modeled monthly

    temperature in the U.S. corn belt with an iid measurement error for all sites (Wikle et al., 1998). In this

simple case, we reduce the error covariance matrix to a product of a scalar and the identity matrix:

R = σ_ε² I.   (7)

One can show that a closed-form M-step update formula exists for σ_ε².

Proposition 3.1. The M-step update of σ_ε² for model (7) is

σ_ε²^(i+1) = (nT)^{-1} Σ_{t=1}^T [(y_t − H_t x_t^T)′(y_t − H_t x_t^T) + tr(H_t P_t^T H_t′)].   (8)

Proof. Rewrite Equation (3) as a function of σ_ε²:

nT ln σ_ε² + σ_ε^{-2} Σ_{t=1}^T tr[(y_t − H_t x_t^T)(y_t − H_t x_t^T)′ + H_t P_t^T H_t′] + const.

Differentiating with respect to σ_ε², setting the result to zero and solving for σ_ε² gives the result. A second-derivative test shows that this is indeed the minimum.
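Assuming the notation above (time-invariant H and smoothed moments from the Kalman smoother), the closed-form update might be coded as follows; the function name is an illustrative choice:

```python
import numpy as np

def sigma2_update(y, H, xs, Ps):
    """Closed-form M-step update for R = sigma^2 I with time-invariant H.
    y: T x n data; xs, Ps: smoothed state means and covariances."""
    T, n = y.shape
    total = 0.0
    for t in range(T):
        e = y[t] - H @ xs[t]                          # smoothed residual
        total += e @ e + np.trace(H @ Ps[t] @ H.T)    # squared error + trace term
    return total / (n * T)
```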

    3.2 Truncated Basis Function Representation

    In some cases, the measurement error does depend on spatial location, so assuming iid error is no longer

    appropriate. However, if we have some knowledge about the measurement error, say from historical data

    or from a reformulation of the measurement equation, then we can incorporate that information into the

model. First, we assume the measurement error covariance is not time dependent, R_t ≡ R. Now, consider

a basis function expansion for the matrix R (Berliner et al., 2000; Harville, 1997):


R = Σ_{k=1}^P a_k B_k + c I,   (9)

where the B_k are symmetric and idempotent matrices such that B_k B_l = 0 for k ≠ l. We assume that P, the a_k and the B_k are all known and the positive scalar c is the only unknown.

This model makes use of an incomplete matrix decomposition such as an eigenvalue decomposition. In this case we know the dominant matrix bases for R and use the c I term to represent smaller-scale variability and randomness, and to ensure that R is positive definite. Estimation of c with the EM algorithm in this case involves a numerical step, as suggested by the following proposition.

Proposition 3.2. The M-step update of c for model (9) is the (positive) root of

(10)

obtained by differentiating (3) with respect to c, where the required smoothed moment quantities are given by (5).

Proof. First note that R in (9) is positive definite for any c > 0, and that its determinant and inverse have closed forms in c (Harville, 1997). From (3) we have

(11)


Taking the first derivative with respect to c gives

(12)

(13)

The last line follows from the idempotence of the B_k and from B_k B_l = 0 for k ≠ l. Collecting terms, we get (10).

In general we cannot find a closed-form solution of (10), so we must resort to numerical methods. Fortunately, most modern software packages have routines for finding the roots of a function of one variable if the user supplies an initial search bracket. However, if P = 1 or P = 2 in (9), then it can be shown that (10) is a polynomial in c of degree 3 or 5. Then, we can use standard routines for solving polynomials. In that case, estimation is fully specified. In the event of multiple positive roots, we simply evaluate (3) at each root and select the one that minimizes it. Alternatively, we could employ one Newton-Raphson step to update c.
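For the polynomial case, the root-selection step can be sketched with a standard polynomial solver; the objective passed to `best_root` stands in for (3) evaluated as a function of c (both function names are illustrative):

```python
import numpy as np

def positive_roots(coeffs):
    """Real, positive roots of a polynomial given by highest-degree-first
    coefficients, as in the degree-3 or degree-5 score equation case."""
    r = np.roots(coeffs)
    real = r[np.abs(r.imag) < 1e-10].real
    return np.sort(real[real > 0])

def best_root(coeffs, objective):
    """Among the positive roots, return the one minimizing the objective."""
    return min(positive_roots(coeffs), key=objective)
```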

4 Algorithms for Parameterization of the Q Matrix

    First let us derive the update formula for the unparameterized case since the general case yields a different

    result than (4b).

    4.1 General Case

The EM update of Q for the general case (but with parameterized M) is given by the following proposition.

Proposition 4.1. The update formula for the general Q is

Q^(i+1) = T^{-1}(S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′).   (14)

Proof. First note that for any positive definite Q and symmetric S we have ∂ ln|Q|/∂Q = Q^{-1} and ∂ tr(Q^{-1} S)/∂Q = −Q^{-1} S Q^{-1}.

Let S = S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′ and rewrite (3) as a function of Q:

T ln|Q| + tr{Q^{-1} S} + const.

Differentiating with respect to Q gives

T Q^{-1} − Q^{-1} S Q^{-1}.

Setting the above to zero and solving for Q yields the result. The second-derivative test confirms this is a minimum.

Remark 4.1. When M is also not parameterized, so that M^(i+1) is given by (4a), replacing M with S_{10} S_{00}^{-1} in Proposition 4.1 yields (4b).

    4.2 Diagonal Case

Assume that Q is a diagonal matrix with diagonal elements in the vector q = (q_1, ..., q_m)′:

Q = diag(q).   (15)

Such a model is especially appropriate when the state variable is in the spectral domain, since the state process elements are often approximately decorrelated in that setting (e.g., Wikle, 2002). The following proposition gives the update equation for q. It is simply the diagonal vector of the update for the general case as given in (14).

Proposition 4.2. The update formula of q for model (15) is

q^(i+1) = T^{-1} diag{S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′},   (16)

where diag{A} denotes the vector of diagonal elements of A.

Proof. From (3) we have, as a function of q,

T Σ_{j=1}^m ln q_j + Σ_{j=1}^m [S]_{jj}/q_j + const,

where S = S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′. Taking partial derivatives with respect to q_j gives

T/q_j − [S]_{jj}/q_j².

Setting the above to zero we obtain

q_j^(i+1) = [S]_{jj}/T,

which is true for j = 1, ..., m. The second-derivative test confirms that this is a minimum.

    4.3 Conditional Autoregressive (CAR) Model

Consider a CAR model for η_t in (1b) by assuming the following (e.g., He and Sun, 2000):

η_t(s_i) | {η_t(s_j), j ≠ i} ~ N(ρ Σ_{j≠i} c_{ij} η_t(s_j), σ_η²),   (17)

where c_{ij} = 1 if locations s_i and s_j are neighbors, and c_{ij} = 0 otherwise. Define the adjacency matrix C = (c_{ij}). It can be shown that the covariance matrix of the joint distribution is σ_η²(I − ρC)^{-1} (e.g., He and Sun, 2000). In other words, the model for η_t is

η_t ~ N(0, σ_η²(I − ρC)^{-1}).   (18)

Let λ_1 ≤ ⋯ ≤ λ_m be the eigenvalues of C. Sun et al. (2000) showed that λ_1 < 0 < λ_m, and that in order for the covariance matrix to be positive definite, λ_1^{-1} < ρ < λ_m^{-1}.

Proposition 4.3. The M-step update of ρ and σ_η² for model (18) is

ρ^(i+1) = the root of g(ρ) = 0,   (19a)

σ_η²^(i+1) = (mT)^{-1} tr{(I − ρ^(i+1) C) S},   (19b)

where

g(ρ) = Σ_{j=1}^m λ_j/(1 − ρλ_j) − m tr(CS)/[tr(S) − ρ tr(CS)]   (20)

and

S = S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′.   (21)

Proof. Starting from (3) and using the fact that ln|σ_η²(I − ρC)^{-1}| = m ln σ_η² − ln|I − ρC|, the relevant part of (3) is

Tm ln σ_η² − T ln|I − ρC| + σ_η^{-2} tr{(I − ρC) S}.

Taking the first derivative with respect to ρ and σ_η², respectively, gives

T Σ_{j=1}^m λ_j/(1 − ρλ_j) − σ_η^{-2} tr(CS),   (22a)

Tm/σ_η² − σ_η^{-4} tr{(I − ρC) S}.   (22b)

Setting (22b) to zero yields

σ_η² = (mT)^{-1} tr{(I − ρC) S}.   (23)

Substituting (23) into (22a) yields g(ρ) = 0, or (20), the root of which is ρ^(i+1). Substituting ρ^(i+1) in (23) we obtain (19b). The second-derivative test confirms that this is a minimum.

Note that to find ρ^(i+1) one needs to perform a numerical search on a line. Fortunately, the initial search bracket is always known, i.e., (λ_1^{-1}, λ_m^{-1}), since ρ has to be constrained as mentioned earlier. Therefore these update formulas are fully specified and their implementation is automatic.

    4.4 Exponential Covariogram Model

    It is common in spatial statistics to impose a parametric model on the spatial random field. We consider

    the commonly used exponential covariogram model:

Q = σ_η² Γ(φ),   (24)

where the correlation matrix Γ(φ) is governed by the exponential correlation function ρ(d; φ) = exp(−d/φ), d is the distance between two locations, and φ is the spatial dependence parameter (Cressie, 1993). It is important to recognize that there exists an analytical form for the first and second derivatives of this correlation function with respect to φ. This enables us to obtain the closed-form update formulas as given by the following proposition.

Proposition 4.4. The update formulas of σ_η² and φ for the model (24) are

σ_η²^(i+1) = (mT)^{-1} tr{Γ(φ^(i))^{-1} S},   (25a)

φ^(i+1) = φ^(i) − α g′(φ^(i))/g″(φ^(i)),   0 < α ≤ 1,   (25b)

where

g(φ) = T ln|Γ(φ)| + tr{Γ(φ)^{-1} S}/σ_η²^(i+1)

and S = S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′.

Proof. Starting from (3), the relevant part is

Tm ln σ_η² + T ln|Γ(φ)| + σ_η^{-2} tr{Γ(φ)^{-1} S}.

Taking the first derivative with respect to σ_η² yields

Tm/σ_η² − σ_η^{-4} tr{Γ(φ)^{-1} S}.

Setting the above to zero and using φ = φ^(i) yields the update formula (25a). Notice this is an ECM step. To update φ, we focus on the function g(φ) above. Then, we use the GEM algorithm based on one Newton-Raphson step to obtain (25b).

    Remark 4.2. The algorithm given by Proposition 4.4 is appropriate for any covariogram model which

    has analytical form of the first and second derivative of the correlation function with respect to the spatial

dependence parameter φ.
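The correlation matrix and the derivatives needed for the Newton-Raphson step can be computed elementwise. This sketch assumes the parameterization ρ(d; φ) = exp(−d/φ) on a matrix of pairwise distances D (the function name is an illustrative choice):

```python
import numpy as np

def exp_corr(D, phi):
    """Exponential correlation matrix and its first two derivatives in phi,
    assuming rho(d; phi) = exp(-d / phi) applied elementwise to distances D."""
    R = np.exp(-D / phi)
    dR = (D / phi**2) * R                         # elementwise d/dphi
    d2R = (D**2 / phi**4 - 2 * D / phi**3) * R    # elementwise d2/dphi2
    return R, dR, d2R
```

Any covariogram with closed-form first and second derivatives in its dependence parameter could be substituted here, as Remark 4.2 notes.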

5 Parameterization of the Transition Matrix

The transition (or propagator) matrix, M, is the most critical part of the spatio-temporal dynamic model (1), since it governs the evolution of the process. Each row of M contains essentially the location-wise weights applied to the process at the previous time for the spatial location corresponding to that row. It can be shown that the M-step update formula for the unparameterized M is Equation (4a), regardless of the parameterization of Q and R. Here we propose a simple but powerful parameterization. Assume that an entry of M is either zero or one of θ_1, ..., θ_p. The positions of the zeros and the θ_k's in the matrix are fixed. We can write

M = Σ_{k=1}^p θ_k B_k,   (26)

where θ = (θ_1, ..., θ_p)′ and each B_k is a known (0-1) incidence matrix.

In the next proposition we derive the closed-form update formula for θ.

Proposition 5.1. The iteration update of θ for model (26) is

θ^(i+1) = A^{-1} b,

where

[A]_{kl} = tr{Q^{-1} B_l S_{00} B_k′},  k, l = 1, ..., p,

and

[b]_k = tr{Q^{-1} S_{10} B_k′}.

Proof. Starting from Equation (3), the part involving M is

tr{Q^{-1}[S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′]}.

Defining M = Σ_{k=1}^p θ_k B_k and differentiating with respect to θ_k gives

−2 tr{Q^{-1} S_{10} B_k′} + 2 Σ_{l=1}^p θ_l tr{Q^{-1} B_l S_{00} B_k′}.

Setting the above to zero and using the fact that tr{Q^{-1} B_k S_{00} B_l′} = tr{Q^{-1} B_l S_{00} B_k′}, we get

Σ_{l=1}^p θ_l tr{Q^{-1} B_l S_{00} B_k′} = tr{Q^{-1} S_{10} B_k′},  k = 1, ..., p.

Therefore, we have p linear equations. Writing them in matrix form and solving for θ gives the result.

The formula in Proposition 5.1 implies that to update θ we need to know Q. This is impossible with the standard EM algorithm. Instead we employ the ECM algorithm to update the parameters sequentially (see Section 2.4). To illustrate, suppose Q is defined in the general case. Then, we can update θ and Q together as suggested by the following remark.

Remark 5.1. The ECM update for model (26) and general Q is:

1. first update θ with Proposition 5.1 by letting Q = Q^(i);

2. then update Q with Proposition 4.1 by letting M = Σ_{k=1}^p θ_k^(i+1) B_k.


    6 Illustrative Examples

    6.1 Advection-Diffusion PDE: A Simulation Study

    6.1.1 Background

    In an ecological study, researchers used a diffusion PDE to predict the spread of the house finch in the

    eastern United States with a hierarchical Bayesian model (Wikle, 2003). In cases where one does not have

    strong a priori belief that the diffusion parameter varies with space, such a model lends itself well to the

methodology we just developed. For illustration, consider a 1-dimensional advection-diffusion equation for the spatio-temporal state process x(s; t), at spatial location s and time t:

∂x(s; t)/∂t = −a ∂x(s; t)/∂s + b ∂²x(s; t)/∂s²,   (27)

where a is the advection coefficient and b is the diffusion coefficient. Following basic finite-difference approaches to the numerical solution of partial differential equations (e.g., Haberman, 1987), we can apply first-order forward differences in time (∂x/∂t ≈ [x(s; t + δ_t) − x(s; t)]/δ_t) and centered differences in space (∂x/∂s ≈ [x(s + δ_s; t) − x(s − δ_s; t)]/(2δ_s) and ∂²x/∂s² ≈ [x(s + δ_s; t) − 2x(s; t) + x(s − δ_s; t)]/δ_s²), where these centered differences are valid for any time t, to get

x(s_i; t + δ_t) = x(s_i; t) + δ_t[−a(x(s_{i+1}; t) − x(s_{i−1}; t))/(2δ_s) + b(x(s_{i+1}; t) − 2x(s_i; t) + x(s_{i−1}; t))/δ_s²] + η(s_i; t),   (28)

where δ_t and δ_s are the temporal and spatial increments, respectively, and we thus discretize the spatial domain so that s_1, ..., s_m have equal spacing, δ_s. Note η(s_i; t) was added to (28) to make up for the loss from discretization and to introduce extra stochastic forcing (Wikle, 2003). Furthermore, if we let θ_1 = δ_t(b/δ_s² + a/(2δ_s)), θ_2 = 1 − 2bδ_t/δ_s², and θ_3 = δ_t(b/δ_s² − a/(2δ_s)) (since spatial locations are equally spaced, this is convenient notationally), then (28) becomes

x(s_i; t + δ_t) = θ_1 x(s_{i−1}; t) + θ_2 x(s_i; t) + θ_3 x(s_{i+1}; t) + η(s_i; t).   (29)

Writing (29) in matrix form, we have

x_{t+1} = M x_t + M_B x_t^B + η_t,   (30)


where x_t = (x_t(s_1), ..., x_t(s_m))′ is the interior process and x_t^B = (x_t(s_0), x_t(s_{m+1}))′ is the boundary process, respectively. Furthermore, the m × 2 matrix

M_B = [θ_1 0; 0 0; ⋮ ⋮; 0 0; 0 θ_3]   (31)

is the propagator matrix for the boundary process, and

M = [θ_2 θ_3 0 ⋯ 0; θ_1 θ_2 θ_3 ⋯ 0; 0 θ_1 θ_2 ⋱ ⋮; ⋮ ⋱ ⋱ θ_3; 0 ⋯ 0 θ_1 θ_2]   (32)

is a tri-diagonal propagator matrix for the interior process. This matrix is in the form of (26) due to the structural zeros.
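Under the finite-difference weights defined above (and the sign convention assumed here for the advection term), the tridiagonal interior propagator might be constructed as follows; the function name is an illustrative choice:

```python
import numpy as np

def advection_diffusion_propagator(m, a, b, dt, ds):
    """Tridiagonal interior propagator for the discretized 1-d
    advection-diffusion equation: a = advection coefficient, b = diffusion
    coefficient, dt/ds = temporal and spatial increments."""
    theta1 = dt * (b / ds**2 + a / (2 * ds))   # weight on x_t(s_{i-1})
    theta2 = 1.0 - 2.0 * b * dt / ds**2        # weight on x_t(s_i)
    theta3 = dt * (b / ds**2 - a / (2 * ds))   # weight on x_t(s_{i+1})
    M = (np.diag(np.full(m, theta2))
         + np.diag(np.full(m - 1, theta1), -1)   # subdiagonal
         + np.diag(np.full(m - 1, theta3), +1))  # superdiagonal
    return M
```

All entries off the three central diagonals are structural zeros, so M is exactly of the form (26) with p = 3 incidence matrices.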

For simplicity, we let the boundaries in (30) be zeros (i.e., x_t^B = 0 for all t). Thus, (30) becomes the usual state equation. In addition, we specify a measurement equation similar to (1a) with noise covariance matrix (7). We have thus specified a spatio-temporal dynamic model for the advection-diffusion PDE process:

y_t = H_t x_t + ε_t,   (33a)

x_{t+1} = M x_t + η_t,   (33b)

where cov(ε_t) = σ_ε² I. We assume that the process error, η_t, has an exponential covariance matrix as described by (24), i.e., cov(η_t) = σ_η² Γ(φ). We assume both noise processes are Gaussian with zero mean.

Of course, in most real-world applications in which an advection-diffusion process would be appropriate (e.g., processes in the atmosphere, ocean, or ecological processes) one would not know the parameters a and b (and thus, θ_1, θ_2 and θ_3). Thus, we seek to estimate them. We demonstrate such estimation by simulating the process and comparing estimates to the known parameters.

    6.1.2 Simulating the Data Set

    To illustrate our estimation methodology, we simulate a data set according to the above specified model.

    Table 1 summarizes the actual value of the parameters and other simulation set-up values. As a way of


    Parameter Set-up

    SNR Missing n T

    .3 .6 .1 1 1 5 5 10% 18 20 100

    Table 1: Simulation set-up for the diffusion data used in Table 2 and Figure 1. (SNR = signal-to-noise ratio; Missing = percentage of data withheld for validation; also listed are the number of observations at each time, the spatial dimension of the state process, the spatial dependence parameter, and the parameters θ1, θ2, θ3 in the propagator matrix.)

    gauging the estimation performance, we withhold a certain amount of data for validation. This missing-data set-up is achieved easily with an incidence matrix in the measurement equation.
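    A minimal sketch of this set-up (our own illustration; the dimensions follow Table 1, but the propagator values and unit noise scales are placeholders) builds, at each time, an incidence matrix that simply deletes the rows of the withheld locations:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, T, p_miss = 20, 100, 0.10           # dimensions as in Table 1
    m = int(n * (1 - p_miss))              # 18 observed locations per time
    # assumed tri-diagonal propagator (placeholder values)
    M = 0.3 * np.eye(n) + 0.6 * np.eye(n, k=-1) + 0.1 * np.eye(n, k=1)

    u = np.zeros(n)
    z, K_list = [], []
    for t in range(T):
        u = M @ u + rng.normal(size=n)                  # state evolution
        obs = np.sort(rng.choice(n, size=m, replace=False))
        K_t = np.eye(n)[obs, :]                         # incidence matrix: keeps observed rows
        z.append(K_t @ u + rng.normal(size=m))          # noisy, partially observed data
        K_list.append(K_t)
    ```

    Each K_t is an m × n matrix of zeros and ones with a single one per row, so K_t u_t extracts exactly the observed components.
    
    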

    Figure 1(a) shows a map of the simulated spatio-temporal diffusion data. There is a noticeable pattern of propagation of spatial features to the left through time. This is the result of the special structure of the propagator matrix and the chosen parameter values. Another way of looking at the data is to examine time series plots for individual locations (see Figure 2). We cannot easily detect spatio-temporal propagation from such time series plots, but they give us information about the temporal structure and stability of the signal.

    6.1.3 ECM Estimation

    Estimation is carried out with an ECM algorithm. For this model we need to update the following parameters. Given the current iterate, five CM-steps are used to obtain the update:

    CM-step 1: update the corresponding parameter by Proposition 5.1;

    CM-step 2: update the corresponding parameter by Equation (25a) in Proposition 4.4;

    CM-step 3: update the corresponding parameter by Equation (25b) in Proposition 4.4;

    CM-step 4: update the corresponding parameter by Proposition 3.1;

    CM-step 5: update the corresponding parameter by Equation (4d);

    where each CM-step holds the remaining parameters at their current values.

    These parameters can be updated in a different order if desired. To update the parameter without a closed-form solution, we use the GEM Based on One Newton-Raphson Step algorithm discussed in Section 2.4 (McLachlan and Krishnan, 1997).
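    To convey the flavor of the E-step and the closed-form M-step updates without the notation of the propositions above, here is a self-contained sketch for a toy scalar state-space model x_t = φ x_{t-1} + w_t, y_t = x_t + v_t, in the style of Shumway and Stoffer (1982). This is our own illustration, not the authors' estimation code; the function name, initialization, and fixed initial-state moments are our choices.

    ```python
    import numpy as np

    def em_local_ar1(y, phi, q, r, m0=0.0, P0=1.0, n_iter=50):
        """EM for x_t = phi*x_{t-1} + w_t, y_t = x_t + v_t, w_t ~ N(0, q),
        v_t ~ N(0, r); returns estimates and the log-likelihood history."""
        T = len(y)
        logliks = []
        for _ in range(n_iter):
            # E-step, part 1: Kalman filter
            xp, Pp = np.zeros(T), np.zeros(T)   # one-step predictions
            xf, Pf = np.zeros(T), np.zeros(T)   # filtered moments
            K = np.zeros(T)
            ll, m, P = 0.0, m0, P0
            for t in range(T):
                xp[t], Pp[t] = phi * m, phi**2 * P + q
                S = Pp[t] + r
                K[t] = Pp[t] / S
                e = y[t] - xp[t]
                xf[t], Pf[t] = xp[t] + K[t] * e, (1 - K[t]) * Pp[t]
                ll -= 0.5 * (np.log(2 * np.pi * S) + e**2 / S)
                m, P = xf[t], Pf[t]
            logliks.append(ll)
            # E-step, part 2: Rauch-Tung-Striebel smoother
            xs, Ps = xf.copy(), Pf.copy()
            J = np.zeros(T - 1)
            for t in range(T - 2, -1, -1):
                J[t] = Pf[t] * phi / Pp[t + 1]
                xs[t] = xf[t] + J[t] * (xs[t + 1] - xp[t + 1])
                Ps[t] = Pf[t] + J[t]**2 * (Ps[t + 1] - Pp[t + 1])
            J0 = P0 * phi / Pp[0]
            x0s = m0 + J0 * (xs[0] - xp[0])
            P0s = P0 + J0**2 * (Ps[0] - Pp[0])
            # lag-one smoothed covariances: Pc[t] = Cov(x_t, x_{t-1} | all data)
            Pc = np.zeros(T)
            Pc[-1] = (1 - K[-1]) * phi * Pf[-2]
            for t in range(T - 2, 0, -1):
                Pc[t] = Pf[t] * J[t - 1] + J[t] * (Pc[t + 1] - phi * Pf[t]) * J[t - 1]
            Pc[0] = Pf[0] * J0 + J[0] * (Pc[1] - phi * Pf[0]) * J0
            # M-step: closed-form updates (the analogue of the CM-steps)
            S11 = np.sum(xs**2 + Ps)
            S00 = x0s**2 + P0s + np.sum(xs[:-1]**2 + Ps[:-1])
            S10 = xs[0] * x0s + Pc[0] + np.sum(xs[1:] * xs[:-1] + Pc[1:])
            phi = S10 / S00
            q = (S11 - phi * S10) / T
            r = np.mean((y - xs)**2 + Ps)
        return phi, q, r, logliks
    ```

    By the EM property, the recorded log-likelihoods are non-decreasing across iterations, which provides a convenient sanity check analogous to the convergence criterion in Table 2.
    
    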



    Figure 1: Simulated diffusion data and estimation: (a) simulated data (see Table 1 for true parameter values), (b) smoothed values, (c) true process, (d) prediction error (truth minus estimate), (e) standard deviation of the prediction error.


    1 13766.03 0.00 0.30 0.30 0.30 0.2000 1.00 1.00 1.00 6.66 0.00000

    2 2825.46 10940.57 0.33 0.53 0.20 1.8420 5.51 1.14 1.00 22.73 28.69474

    3 2643.62 181.83 0.33 0.59 0.14 2.6336 6.57 1.33 1.00 22.43 9.18374

    4 2612.22 31.41 0.32 0.62 0.11 2.8971 6.81 1.54 1.00 22.21 4.57350

    5 2597.93 14.28 0.31 0.63 0.09 2.9249 6.74 1.79 1.00 22.12 2.63997

    61 2492.28 0.03 0.28 0.61 0.10 1.1671 4.78 9.13 1.00 26.67 0.03425

    121 2491.71 0.00 0.28 0.61 0.10 0.9806 5.00 9.46 1.00 27.07 0.00839

    181 2491.64 0.00 0.28 0.61 0.10 0.9196 5.08 9.56 1.00 27.19 0.00328

    241 2491.62 0.00 0.28 0.60 0.10 0.8937 5.11 9.61 1.00 27.24 0.00148

    265 2491.62 0.00 0.28 0.60 0.10 0.8878 5.12 9.62 1.00 27.25 0.00110

    266 2491.62 0.00 0.28 0.60 0.10 0.8876 5.12 9.62 1.00 27.25 0.00108

    267 2491.62 0.00 0.28 0.60 0.10 0.8874 5.12 9.62 1.00 27.25 0.00107

    268 2491.62 0.00 0.28 0.60 0.10 0.8871 5.12 9.62 1.00 27.25 0.00106

    269 2491.62 0.00 0.28 0.60 0.10 0.8869 5.12 9.62 1.00 27.25 0.00105

    270 2491.62 0.00 0.28 0.60 0.10 0.8867 5.12 9.62 1.00 27.25 0.00103

    271 2491.62 0.00 0.28 0.60 0.10 0.8865 5.12 9.62 1.00 27.25 0.00102

    272 2491.62 0.00 0.28 0.60 0.10 0.8863 5.12 9.62 1.00 27.25 0.00101

    Truth 0.30 0.60 0.10 1.0000 5.00 10.00

    Table 2: ECM iterates for the simulated diffusion data, including the parameter iterates, the Newton-Raphson step size used in the GEM update, and the convergence criterion (final column); iteration stops when the criterion falls below a small threshold. See Figure 1 for a plot of the data.


    estimates. As shown in Table 3, the estimates are generally centered around the true values with small deviations. Several findings are worth noting. First, the estimates of the θs are not sensitive to the amount of missing data, yet the variance parameter estimates are more uncertain if the amount of missing data is large. Second, and not surprisingly, large SNR values yield more accurate estimates for most of the parameters, except for the measurement noise variance. The findings from this simulation study do not necessarily generalize. However, they do give us a picture of how this estimation procedure might perform in practice for a process that is very realistic in many environmental applications.

    6.2 Palmer Drought Severity Index (PDSI)

    6.2.1 Background

    Drought poses a serious problem for every society. One measure of drought is the Palmer Drought Severity Index (PDSI), which is typically a monthly-valued index (Heim, 2002). PDSI values typically range from -6 to +6, with negative values denoting dry spells and positive values denoting wet spells.

    We obtain the monthly PDSI for 107 locations in the central U.S. from January 1900 to December

    1997. Figure 3 displays the data for two typical months. We can see that there is significant spatial

    correlation in the data. Indeed, dry and wet spells occur with substantial spatial coherence across the

    region. Therefore there is no need to model 107 stations individually. A more concise representation

    should suffice for this data set. Thus, we consider a dimension reduced spatio-temporal approach to model

    PDSI.

    6.2.2 Dimension Reduction

    First, we introduce the idea of spatio-temporal dimension reduction. The key is to recast the state vector in a much lower-dimensional space by using a spectral basis (Wikle and Cressie, 1999). Let

    u_t = Φ a_t + ν_t,   (34)

    where Φ is an n × K matrix of spectral basis functions, K << n, and ν_t contains the residual process induced by the truncation. We treat a_t as our new state vector, which follows a first-order Markov process if u_t follows such a process. Typically, ν_t is a non-dynamic (uncorrelated in time) spatial process. Now, rewrite the spatio-temporal model (1) in light of the dimension reduction,

    (35a)


    Figure 3: PDSI data (left column) and predictions (right column) for July 1988 and July 1993. Dark (open) circles correspond to negative (positive) PDSI values. The size of each circle is proportional to the magnitude of the PDSI value.


    (35b)

    where the measurement error term contains measurement error as well as truncation error, and, as is typical, we assume that the measurement and model error processes are mean-zero Gaussian and temporally independent. This model is in the same form as (1), with slightly different notation. We proceed to specify the covariance of the measurement error after taking into account the truncation error:

    (36)

    This is a simplified version of a formulation by Berliner et al. (2000). By using the next r basis functions beyond the truncation point, this formulation amounts to a second dimension reduction. If the basis functions are orthonormal, then it is evident that (36) is an example of model (9). We assume the covariance matrix of the model error process is diagonal. This is reasonable since the spectral decomposition typically leads to decorrelation in spectral space.
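    A plausible sketch of such a truncation-adjusted measurement covariance, in the spirit of the Wikle and Cressie (1999) construction (the function name, the symbols Psi and lam, and the exact algebraic form are our assumptions, not a transcription of (36)), adds to the measurement-error variance the covariance contributed by the next r basis functions:

    ```python
    import numpy as np

    def truncation_adjusted_R(sigma2_eps, Psi, lam):
        """Covariance of the form sigma2_eps * I + Psi diag(lam) Psi', where
        Psi (n x r) holds the next r orthonormal basis functions and lam their
        variances. Assumed form, for illustration only."""
        n = Psi.shape[0]
        # (Psi * lam) scales each column of Psi by the corresponding variance
        return sigma2_eps * np.eye(n) + (Psi * lam) @ Psi.T
    ```

    Because Psi diag(lam) Psi' is positive semi-definite, the resulting matrix is positive definite whenever sigma2_eps is positive.
    
    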

    If one is interested in predicting the process at locations for which one does not have data, then it is important to consider the truncation residual explicitly, as suggested by Wikle and Cressie (1999) and Cressie and Wikle (2002). However, if one is primarily interested in the dynamic process and/or its parameters, then it is simpler to account for the residual through its marginal covariance. This is directly analogous to traditional mixed models: if one is interested in inference on the fixed effects, then one integrates out the random effects and considers the so-called marginal formulation; if one is interested in predicting the random effects, then one considers the random effects directly (the so-called conditional specification). In this application, we are interested in forecasting the dynamic component, so it is reasonable to consider the effects of the truncation residual marginally, as indicated above.

    Although we could use any set of orthonormal basis functions (e.g., Fourier, wavelets, empirical), we choose to use EOFs (empirical orthogonal functions) for this example, since they are widely used in meteorological studies. EOFs are meteorologists' name for the familiar principal components analysis for spatio-temporal data (see Wikle, 1996, for an overview). We obtain the EOFs by performing an eigenvalue decomposition of the estimated spatial covariance matrix of the data. Figure 4 shows the percent variability accounted for by each basis function. Notice the steep decline up to the 10th EOF. The

    remaining EOFs explain very little of the variability in the data. Therefore, we fix the truncation parameter at 10. Instead of modeling a 107-dimensional state vector, we now model a 10-dimensional state vector, which is a much easier task both statistically and computationally. Finally, we take the second truncation level in (36) to be 20, since the next 20 EOFs account for about 10% of the variability and adding more EOFs does not add much spatial structure.

    Figure 4: Percentages (solid line) and cumulative percentages (dashed line) of total variability accounted for by the EOFs for the PDSI data.
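    The EOF computation described above can be sketched as follows (our own illustration; the data-matrix layout, with rows indexing months and columns indexing stations, is an assumption):

    ```python
    import numpy as np

    def eof_decomposition(Z):
        """Z is a T x n data matrix (rows = time points, columns = stations).
        Returns the EOFs (columns, ordered by decreasing variance) and the
        percent of total variability accounted for by each."""
        Zc = Z - Z.mean(axis=0)                # remove the station means
        C = np.cov(Zc, rowvar=False)           # estimated spatial covariance
        vals, vecs = np.linalg.eigh(C)         # eigen-decomposition (ascending)
        order = np.argsort(vals)[::-1]         # reorder to descending variance
        vals, vecs = vals[order], vecs[:, order]
        pct = 100.0 * vals / vals.sum()
        return vecs, pct
    ```

    The reduced basis is then the leading columns (e.g., the first 10 EOFs for the PDSI example), and the reduced state is obtained by projecting the data onto those columns.
    
    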

    6.2.3 EM Estimation

    The parameters for model (35) are updated by the M-step formulas in Equation (4a), Proposition 4.2, Equation (4d), and Proposition 3.2, respectively. The EM iteration history is shown in Table 4. The algorithm converges much faster than in the ECM example discussed in Section 6.1.

    To assess the fit of the model, we examine plots of the predicted values of the state vector mapped back to the original space. Figure 3 shows the one-month-ahead prediction along with the observed data for each of two months. As we can see, the predictions capture the major spatial patterns fairly well. Figure 6 shows time series plots for three stations for a 10-year period around the major Midwest flood year of 1993. In general, the model does a reasonable job of predicting the next month's PDSI values, even amidst the unusual flood event. The prediction standard error map (Figure 5) shows that the prediction error is roughly of the same order across the spatial domain and for different time periods.


    Figure 6: PDSI data (solid line) and prediction (dashed line) for three stations (lon 102.30, lat 48.50; lon 95.90, lat 39.60; lon 92.30, lat 31.20).

    7 Summary and Conclusion

    We have proposed several parameterizations for a spatio-temporal dynamic model. The strategy is to make

    use of valid spatial statistical models and to make simple physical assumptions about the process which

    lead to partial restrictions on the transition matrix and/or the covariance matrices. We also derive the

    relevant (General) EM update formulas for these restrictions. We demonstrate this methodology with a

    simulation study in which the true state-process follows an advection-diffusion process. In addition, we

    apply the methodology to the problem of spatio-temporal modeling of monthly Palmer Drought Severity

    Index values over the central U.S.

    It is important to point out that although this development was motivated by spatio-temporal problems,

    the parameterization/GEM approach is quite flexible and the ideas contained in this paper can be used for

    other parameterizations as well. In particular, the state-space framework is useful for many multivariate

    time series problems (e.g., see applications in Shumway, 1988). However, the fact that we prefer closed-form update formulas does put a limit on our choices of parameterizations. In addition, since there could be different parameterizations for the same data that are reasonable from a scientific perspective, it is desirable to have a consistent way of performing model selection (e.g., Bengtsson, 2000).


    It is reasonable to ask what are the advantages and disadvantages of the EM/GEM approach presented here compared to a fully Bayesian approach (e.g., Wikle, 2003; Berliner et al., 2000). In many respects the EM/GEM approach can be thought of as an empirical Bayesian approach, whereby the data are used to estimate the lower-level parameters in a hierarchical Bayesian model. In the spatio-temporal dynamical model framework, the fully Bayesian approach is most useful when one has some prior understanding (either from scientific theory or from previous empirical studies) about the process dynamics, particularly the evolution operator (propagator). For example, Wikle (2003) considered a discretized PDE-based model analogous to the advection-diffusion example in Section 6. However, in that case the ecological problem at hand suggested that the dynamics were largely controlled by spatially varying (yet unknown) diffusion coefficients that corresponded to population spread. In that context, given the relationship between heterogeneous population spread and habitat, it was appropriate for the diffusion parameters to be spatially dependent and thus to have a spatial prior distribution. However, for a process such as the example in Section 6.1, in which one does not expect the advection and diffusion coefficients to be spatially varying, it is certainly reasonable to assume no spatial dependence and thus to estimate the relatively few parameters empirically through the EM/GEM approach outlined here. In summary, when the model complexity increases, and/or when one has significant prior knowledge about the dynamics, one should use a fully Bayesian approach. When one has a relatively simple model and little prior knowledge, the EM/GEM approach is preferable. However, the EM/GEM approach is of limited utility if the process is high-dimensional; thus, one must parameterize the dynamics in this case to reduce the effective number of parameters. This is the approach we have outlined here.

    References

    Bengtsson, T. (2000). Time series discrimination, signal comparison testing, and model selection in the state-space framework. Ph.D. thesis, University of Missouri-Columbia.

    Berliner, L. M., Wikle, C. K., and Cressie, N. (2000). Long-lead prediction of Pacific SST via Bayesian dynamic modeling. Journal of Climate, 13, 3953–3968.

    Cressie, N. and Huang, H.-C. (1999). Classes of nonseparable, spatio-temporal stationary covariance functions. Journal of the American Statistical Association, 94(448), 1330–1340.


    Cressie, N. and Wikle, C. (2002). Space-time Kalman filter. In A. El-Shaarawi and W. Piegorsch, editors, Encyclopedia of Environmetrics, volume 4, pages 2045–2049, New York. Wiley.

    Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York, revised edition.

    Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data. Journal of the American Statistical Association, 97, 590–600.

    Gupta, N. and Mehra, R. (1974). Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations. IEEE Transactions on Automatic Control, 19, 774–783.

    Haberman, R. (1987). Elementary Applied Partial Differential Equations, second edition. Prentice-Hall, New Jersey.

    Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer, New York.

    He, Z. and Sun, D. (2000). Hierarchical Bayes estimation of hunting success rates with spatial correlations. Biometrics, 56, 360–367.

    Heim Jr., R. R. (2002). A review of twentieth-century drought indices used in the United States. Bulletin of the American Meteorological Society, 83, 1149–1165.

    Huerta, G., Sansó, B., and Stroud, J. (2004). A spatiotemporal model for Mexico City ozone levels. Journal of the Royal Statistical Society, Series C, 53, 231–248.

    Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(D), 35–45.

    Kyriakidis, P. C. and Journel, A. G. (1999). Geostatistical space-time models: a review. Mathematical Geology, 31(6), 651–684.

    Lange, K. (1999). Numerical Analysis for Statisticians. Springer, New York.

    Mardia, K., Goodall, C., Redfern, E., and Alonso, F. (1998). The kriged Kalman filter (with discussion). Test, 7, 217–285.

    McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley, New York.

    Shumway, R. (1988). Applied Statistical Time Series Analysis. Prentice Hall, Englewood Cliffs, NJ.


    Shumway, R. H. and Stoffer, D. S. (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4), 253–264.

    Shumway, R. H. and Stoffer, D. S. (2000). Time Series Analysis and Its Applications. Springer, New York.

    Stroud, J., Müller, P., and Sansó, B. (2001). Dynamic models for spatio-temporal data. Journal of the Royal Statistical Society, Series B, 63, 673–689.

    Sun, D., Tsutakawa, R. K., and Speckman, P. L. (2000). Bayesian inference for CAR(1) models with noninformative priors. Biometrika, 86, 341–350.

    Tanner, M. A. (1996). Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer, New York.

    Wikle, C. K. (1996). Spatio-temporal statistical models with applications to atmospheric processes. Ph.D. thesis, Iowa State University.

    Wikle, C. K. (2002). Spatial modeling of count data: A case study in modelling breeding bird survey data on large spatial domains. In A. B. Lawson and D. G. T. Denison, editors, Spatial Cluster Modelling, pages 199–209. Chapman and Hall.

    Wikle, C. K. (2003). Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology, 84, 1382–1394.

    Wikle, C. K. and Cressie, N. (1999). A dimension-reduced approach to space-time Kalman filtering. Biometrika, 86(4), 815–829.

    Wikle, C. K., Berliner, L. M., and Cressie, N. (1998). Hierarchical Bayesian space-time models. Journal of Environmental and Ecological Statistics, 5, 117–154.

    Wikle, C. K., Milliff, R., Nychka, D., and Berliner, L. (2001). Spatiotemporal hierarchical Bayesian modeling: Tropical ocean surface winds. Journal of the American Statistical Association, 96, 382–397.
