7/31/2019 xu_wikle_070705
1/33
Estimation of Parameterized Spatio-Temporal Dynamic Models
Ke Xu and Christopher K. Wikle
Department of Statistics, University of Missouri-Columbia
July 7, 2005: Original Submission July 21, 2004
Abstract
Spatio-temporal processes are often high-dimensional, exhibiting complicated variability across
space and time. Traditional state-space model approaches to such processes in the presence of uncertain data have been shown to be useful. However, estimation of state-space models in this context is often problematic since parameter vectors and matrices are of high dimension and can have complicated
dependence structures. We propose a spatio-temporal dynamic model formulation with parameter
matrices restricted based on prior scientific knowledge and/or common spatial models. Estimation is
carried out via the expectation-maximization (EM) algorithm or a general EM algorithm. Several parameterization strategies are proposed, and analytical or computational closed-form EM update equations
are derived for each. We apply the methodology to a model based on an advection-diffusion partial
differential equation in a simulation study and also to a dimension-reduced model for a Palmer Drought
Severity Index (PDSI) data set.
Keywords: Dynamic, EM algorithm, General EM, state-space, time series, spatial, spatio-temporal
Corresponding Author: Christopher K. Wikle, Department of Statistics, University of Missouri, 146 Math Science Building,
Columbia, MO 65211; 573-882-9659; fax: 573-884-5524; [email protected]
1 Introduction
Spatio-temporal statistical models are essential tools for performing inference and prediction for processes
in the physical, environmental, and biological sciences. Such processes are often complicated in that the
dependence structure across space and time is non-trivial, often non-separable and non-stationary in space
or time. In addition, it is often the case that the number of spatial locations at which inference is desired
is quite large. Furthermore, data are often collected with substantial observational uncertainty and it is not
uncommon to have missing observations at various spatial and temporal locations.
Various approaches have been proposed to model spatio-temporal processes (e.g., see Kyriakidis and
Journel, 1999 for a review). If one considers time as an extra dimension, then traditional spatial statistics
techniques can be applied (Cressie, 1993). However, such approaches ignore the fundamental differences
between space and time, principally that time is naturally ordered and space is not. Alternatively, one can
consider the spatio-temporal problem from a multivariate geostatistical perspective, which requires that space-time covariance functions be specified. Traditionally, this approach has been limited in that the known
class of valid spatio-temporal covariance functions is quite small, although in recent years, several authors
have extended this class of functions (e.g., Cressie and Huang, 1999; Gneiting, 2002). Nevertheless, this
approach is still limited by the fact that such covariance functions are often not realistic for complicated
dynamical processes and dimensionality can prohibit practical implementation.
Spatio-temporal processes can also be considered from the multiple time series perspective (e.g., see
Kyriakidis and Journel, 1999 for a review). That is, each spatial location is associated with a time series.
Then, multivariate time series techniques can be transferred to the space-time problem. However, such
approaches ignore the fundamental differences between space and time, and one's ability to predict at
locations for which data were not observed is limited. Such approaches do not in general explicitly account
for uncertainty in the observed data. Perhaps more critically, such methods are difficult to implement in
cases where the dimensionality of the state vector (i.e., the number of spatial locations) is high.
A natural approach to spatio-temporal modeling for complex dynamical processes is a combination
of spatial and time series techniques, which is accomplished by a spatio-temporal dynamic model formulation (e.g., see Cressie and Wikle, 2002 for a brief review). However, estimation in this context can be problematic due to the high dimensionality of the state process. Several modeling strategies have been proposed to address this problem. One approach is to reduce dimensionality by projecting the state process on
some set of spectral basis functions (e.g., Mardia, Goodall, Redfern and Alonso, 1998; Wikle and Cressie,
1999). Alternatively, one might specify very simple, random walk dynamics (e.g., Stroud, Mueller and
Sanso, 2001; Huerta, Sanso and Stroud, 2004). Another approach is to incorporate physical or biological
models directly into the parametrization (Wikle, Milliff, Nychka, and Berliner, 2001; Wikle, 2003). Even
in the case of physically or biologically motivated dynamic models, it is seldom the case for statistical
problems (unlike some engineering problems) that we know the model parameters explicitly. These must be estimated, but in the presence of known constraints on the dynamical formulation. Furthermore, due
to the high dimensionality, measurement error and process covariance matrices typically have too many
parameters to estimate outright. Thus, these matrices must be parameterized as well.
Estimation in the spatio-temporal dynamical model setting is best accomplished through a state-space
framework. Given parameters, the unobserved state-process can be estimated via the Kalman filter or
Kalman smoother (e.g., see Cressie and Wikle, 2002 for a review). However, in the more usual setting
where model parameters are unknown, the standard approach following Shumway and Stoffer (1982) is to
use the expectation-maximization (EM) algorithm to estimate parameters. As mentioned above, the spatio-
temporal problem typically requires restrictions on the parameter matrices. Shumway and Stoffer (1982)
discuss modifications to their algorithm to accommodate fully-restricted parameter matrices. However, it
is not clear how they account for partially restricted or parameterized model matrices in this framework.
Examination of Shumway (1988) (pp. 323-332) implies that one approach to deal with partially restricted
parameter matrices is to set initial parameters (in the EM algorithm) to agree with the known values. Then
in the M-step, only those parameters that require estimation are updated, so that the fixed parameters
don't change. Alternatively, one can update all parameters but then immediately impute the known values
for the fixed parameters. Although these approaches are relatively easy to implement, it is not clear that
they give the maximum likelihood estimates under the state-space model assumptions. Another approach,
considered here, is to develop general EM (GEM) algorithms to account directly for the restricted or
partially restricted model matrices.
In this paper we describe efficient estimation approaches for spatio-temporal dynamic models in which
the parameter matrices and/or noise covariance matrices are highly parameterized (or restricted). We
utilize GEM algorithms to carry out this estimation. In Section 2, we give some necessary background for
spatio-temporal dynamic models and GEM algorithms. In Sections 3, 4, and 5, we propose several methods of parameterization and derive the EM update formula for each. In Section 6,
we consider two examples. Finally, Section 7 contains a brief summary and conclusion.
2 Background
2.1 Spatio-Temporal Dynamic Model Formulation
Let be an vector containing the data values at spatial locations,
, at time . Let be an vector for an unobservable spatio-temporal state
process at some fixed network of locations at time . This state process is our primary interest.
The two sets of spatial locations, , where is some domain in , need not be the same. Write
(1a)
(1b)
for , where (1a) is called the measurement equation and (1b) the state equation. Let be a
known matrix that maps the data to the process . The measurement noise is zero-mean,
uncorrelated in time and Gaussian with covariance matrix . The dynamics are described by the
state equation (1b) via a first-order Markov process with the transition or propagator matrix . We also
assume there are shocks to the system, which are spatially colored, temporally white and Gaussian with
mean zero and a common covariance matrix . For completeness, we assume the process starts
with , which is a Gaussian spatial process with mean and covariance matrix . Such a model
in the spatio-temporal context is not new, nor is it the most general. However, such models have received
a considerable amount of attention in the environmental literature in recent years and have been shown to
be quite effective (e.g. see the review in Cressie and Wikle, 2002).
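For concreteness, the measurement and state equations (1a)-(1b) can be simulated directly. The sketch below uses hypothetical names (H for the known mapping matrix, M for the propagator, R and Q for the measurement and process noise covariances) and arbitrary dimensions; it is an illustration of the model class, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 5, 8, 50                       # state dim, data dim, time steps

H = rng.normal(size=(m, n))              # known mapping matrix, cf. (1a)
M = 0.5 * np.eye(n)                      # propagator matrix, cf. (1b)
R = 0.1 * np.eye(m)                      # measurement noise covariance
Q = 0.2 * np.eye(n)                      # process (shock) covariance

z = np.zeros((T + 1, n))                 # state process; z[0] is the initial state
y = np.zeros((T, m))                     # data
for t in range(1, T + 1):
    z[t] = M @ z[t - 1] + rng.multivariate_normal(np.zeros(n), Q)
    y[t - 1] = H @ z[t] + rng.multivariate_normal(np.zeros(m), R)
```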
The parameters for the model (1) are The major challenge in fitting this
model lies in the high dimensionality of most space-time applications. For example, a model for Pacific
Sea Surface Temperature (SST) might have (Berliner, Wikle and Cressie, 2000), which
requires estimation of a matrix . Often, researchers resort to Bayesian hierarchical approaches for dealing with this dimensionality problem by considering restrictions to , assigning priors
and then using MCMC (Wikle, Berliner and Cressie, 1998). This paper shows that it is often possible to fit
such models and estimate their parameters via the convenient Kalman and EM algorithms. Again, the key
is to assign a structure to the model parameters. Though still limited for many problems, such a formulation is useful in many settings. For example, in the early stages of model building, one might consider such an implementation, since it is fast and easy to implement compared to MCMC. In addition,
there are situations where there is little scientific theory or previous empirical evidence to suggest prior
parameterizations for a fully-Bayesian model. In these cases, if the model is sufficiently parameterized,
the KF/EM approach is a reasonable alternative.
2.2 Kalman Filter and Smoother
Suppose we know the values of all parameters. Then one can use a set of recursions known as the
Kalman Filter and Kalman Smoother to obtain the conditional mean and covariance of the state variable,
(Kalman, 1960; Shumway and Stoffer, 1982). These recursions are well-known but we present them
here to define notation and for completeness. Our overview follows Shumway and Stoffer (2000) with
various notational modifications. First, define the conditional mean . In particular, , and are called the predicted, filtered, and smoothed values, respectively. Also define the conditional variance-covariance matrix, var , and the lag-one covariance matrix, cov .
To get predicted and filtered values, one evaluates the following set of recursions for ,
which is called the Kalman Filter:
and where and are specified. To get smoothed values, one runs the following backward recursion
for , which is sometimes called the Kalman Smoother:
To get the smoothed lag-one covariance, one runs the backward recursion for on
where
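A minimal implementation of these forward (filter) and backward (smoother) recursions might look as follows; variable names are illustrative, and the lag-one recursion is omitted for brevity.

```python
import numpy as np

def kalman_smoother(y, H, M, R, Q, x0, P0):
    """Kalman filter followed by the fixed-interval (RTS) smoother.
    y: T x m data; x0, P0: initial state mean and covariance."""
    T, n = len(y), len(x0)
    xp = np.zeros((T, n)); Pp = np.zeros((T, n, n))   # predicted moments
    xf = np.zeros((T, n)); Pf = np.zeros((T, n, n))   # filtered moments
    x, P = np.asarray(x0, float), np.asarray(P0, float)
    for t in range(T):
        x, P = M @ x, M @ P @ M.T + Q                 # predict
        xp[t], Pp[t] = x, P
        S = H @ P @ H.T + R                           # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
        x = x + K @ (y[t] - H @ x)                    # filter update
        P = (np.eye(n) - K @ H) @ P
        xf[t], Pf[t] = x, P
    xs, Ps = xf.copy(), Pf.copy()                     # smoothed moments
    for t in range(T - 2, -1, -1):                    # backward recursion
        J = Pf[t] @ M.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
    return xs, Ps
```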
2.3 EM Estimation
One can estimate the parameters by the method of moments and then plug them into (1) to implement the
Kalman filter (Wikle and Cressie, 1999). Alternatively, one can run the Kalman recursion and recognize
that a byproduct of the Kalman algorithm is that the likelihood can be computed from the filtered values
with little extra effort. That is, define an innovation and its covariance as and , respectively. Then the log-likelihood, up to a constant, is simply (Shumway and Stoffer, 2000):
(2)
Thus, we might perform maximum likelihood estimation, either numerically (Gupta and Mehra, 1974) or
by the EM algorithm (Shumway and Stoffer, 1982, 2000). In this paper we focus on the EM algorithm.
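A sketch of the innovations form of (2), computed in a single filter pass; names are illustrative and the additive constant is dropped.

```python
import numpy as np

def neg2_loglik(y, H, M, R, Q, x0, P0):
    """-2 x innovations log-likelihood (up to a constant), one filter pass."""
    x, P = np.asarray(x0, float), np.asarray(P0, float)
    out = 0.0
    for yt in y:
        x, P = M @ x, M @ P @ M.T + Q                    # one-step prediction
        e = yt - H @ x                                   # innovation
        S = H @ P @ H.T + R                              # innovation covariance
        out += np.linalg.slogdet(S)[1] + e @ np.linalg.solve(S, e)
        K = P @ H.T @ np.linalg.solve(S, np.eye(len(S))) # gain for filter update
        x = x + K @ e
        P = P - K @ S @ K.T
    return out
```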
Consider as the complete data and denote its likelihood . An EM
iteration consists of two steps: an E-step and an M-step. Given the current value of the parameters, ,
the E-step computes the expected value of the complete data likelihood, which is of the following form
(for details see Shumway and Stoffer, 2000):
(3)
where and
Note and depend on .
In the M-step, an update, is chosen such that . This will
guarantee that the likelihood increases monotonically. When the likelihood function is bounded, the iterates will eventually converge to the MLE. If also minimizes (3), we have the standard EM
algorithm. Otherwise, the algorithm is known as General EM (GEM) (McLachlan and Krishnan, 1997).
In the case of there exists a closed form EM update formula for all parameters. Minimizing
(3) with respect to the parameters yields the M-step update formula for our model (Shumway and Stoffer,
1982).
(4a)
(4b)
(4c)
(4d)
where
(5)
Note that is not updated, since and are essentially nuisance parameters and they cannot be
estimated simultaneously (Shumway and Stoffer, 1982). We choose to update rather than the covariance
matrix since in general we do not have enough data to justify estimating a covariance matrix (we have only one observation of the initial vector).
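Assuming the standard Shumway-Stoffer form of the updates (4a)-(4d), the M-step can be sketched as below, given smoothed means, covariances, and lag-one covariances from the Kalman smoother. Names are illustrative.

```python
import numpy as np

def em_mstep(y, H, xs, Ps, Pl):
    """Closed-form M-step updates in the style of (4a)-(4d).
    xs[t], Ps[t]: smoothed means/covariances for t = 0..T (t = 0 is initial);
    Pl[t]: smoothed lag-one covariance cov(z_t, z_{t-1} | data), t = 1..T."""
    T = len(y)
    S11 = sum(np.outer(xs[t], xs[t]) + Ps[t] for t in range(1, T + 1))
    S10 = sum(np.outer(xs[t], xs[t - 1]) + Pl[t] for t in range(1, T + 1))
    S00 = sum(np.outer(xs[t - 1], xs[t - 1]) + Ps[t - 1] for t in range(1, T + 1))
    M_new = S10 @ np.linalg.inv(S00)                     # cf. (4a)
    Q_new = (S11 - M_new @ S10.T) / T                    # cf. (4b)
    R_new = sum(np.outer(y[t - 1] - H @ xs[t], y[t - 1] - H @ xs[t])
                + H @ Ps[t] @ H.T for t in range(1, T + 1)) / T   # cf. (4c)
    mu0_new = xs[0]                                      # cf. (4d)
    return M_new, Q_new, R_new, mu0_new
```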
As mentioned previously, for our applications, the spatially indexed data vector, , is usually high
dimensional. As a result, our parameters are often of high dimension as well. Hence, some form of
dimension reduction is called for. One approach is to parameterize by exploiting the special structure of
the process. We propose several approaches for specifying realistic submodels for and , thereby
substantially easing the burden of estimation.
Our methods rely heavily on GEM algorithms, since we shall see later that in many cases parameters
are not separable, which means the joint best update , such as (4), is not available. It is also
the case that sometimes the analytical closed form best update formula cannot be derived for some of
the parameters. As a result, we often must settle for a better update, which need only ensure that the likelihood moves monotonically. The price to be paid for this generality is that it may take more iterations
to converge than what we would experience with the traditional types of EM estimation for state-space
models. Two of the most useful GEM algorithms are described in the following section.
Although we do not explicitly describe algorithms for obtaining standard error estimates for , they
can be computed in various ways. For example, it is sometimes possible to evaluate the Hessian matrix
after convergence (Shumway and Stoffer, 2000). Alternatively, one may obtain estimates of the standard
error by perturbing the likelihood function (2) and using numerical differentiation (e.g., Shumway and
Stoffer, 2000; Tanner, 1996).
2.4 Two General EM (GEM) Algorithms
2.4.1 Expectation-conditional maximization (ECM) algorithm
An ECM algorithm consists of an expectation (E) step and conditional maximization (CM) steps (McLachlan and Krishnan, 1997). Sometimes the M-step update is difficult to obtain, so we replace the M-step
with several simple CM-steps. As an example, suppose the parameter of interest consists of two parts, i.e.,
. An ECM algorithm updates the two sub-parameters sequentially or conditionally. That
is, given the current value , we obtain the update via two CM-steps subject to the conditionally
maximizing requirement at each CM-step:
CM-step 1: update with such that ,
CM-step 2: update with such that .
Note . That is, the final update satisfies
, so the likelihood value increases after the final update. Clearly ECM qualifies as a
GEM algorithm. If there is need to further divide the parameters into additional parts, the update simply
takes more CM-steps.
2.4.2 GEM Based on One Newton-Raphson Step
Next, consider , where is a scalar parameter. This is for notational ease and illustration, as
the algorithm described below also works for vector parameters. If the first two derivatives of with
respect to exist in closed form, we can use a procedure called GEM Based on One Newton-Raphson
Step to update (McLachlan and Krishnan, 1997). The update has the form
where and
For sufficiently small, this will guarantee that , so this procedure is a
GEM algorithm. In practice choosing will suffice when near the the minimum (Lange, 1999).
Since this step satisfies the conditionally maximizing requirement, it works well with ECM.
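The step itself is a one-line damped Newton-Raphson update. The toy check below uses a quadratic criterion, for which a full step (step size 1) lands exactly at the minimizer; the function names are illustrative.

```python
def newton_gem_step(theta, dq, d2q, alpha=1.0):
    """One damped Newton-Raphson step on the expected complete-data criterion.
    dq, d2q: callables giving the first and second derivatives at theta;
    alpha in (0, 1] is the step size, reduced if needed to guarantee a
    'better' (GEM) update."""
    return theta - alpha * dq(theta) / d2q(theta)

# Toy check on Q(theta) = (theta - 2)^2: a full step lands at theta = 2.0.
theta1 = newton_gem_step(5.0, lambda t: 2.0 * (t - 2.0), lambda t: 2.0)
```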
2.5 Convergence Criteria
The EM algorithm is said to converge when one of the two following conditions is met (Tanner, 1996):
or
for some small positive , where and are (scalar) elements of . Since the EM
algorithm converges to a stationary point, which can be a saddle point, local minimum, or global minimum
(McLachlan and Krishnan, 1997), it is advisable to check the result with several different starting values.
2.6 Starting Values
To achieve fast convergence, one should choose reasonable starting values. One simple method is to use
moment-based estimates. Suppose the data vectors are of the same size for all and . It is straightforward to calculate the sample estimates of the first two moments:
(6a)
(6b)
(6c)
where denotes the lag-one covariance estimate.
As a crude guess, assume follows a VAR(1) model. We then use the obtained estimates as starting
values: and . We typically specify the measurement
noise covariance matrix by R, where R is obtained from an assessment of the measuring instrument or from the estimate of the nugget effect from a spatial variogram (e.g., Cressie, 1993).
Note that in cases with , the estimates and are not positive definite and one cannot
get an estimate of and as described above. Alternatively, one can fit individual univariate
autoregressive models of order 1 for each spatial location and let and be diagonal matrices with
estimates of the autoregressive parameters and conditional variances on the diagonal, respectively.
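The moment-based VAR(1) starting values described above can be sketched as follows (full-rank case only; the diagonal AR(1) fallback is analogous). Names are illustrative.

```python
import numpy as np

def var1_start(Y):
    """Moment-based starting values assuming the data follow a VAR(1) model.
    Y: T x n matrix (rows are time points). Returns (M0, Q0)."""
    Yc = Y - Y.mean(axis=0)
    T = len(Yc)
    C0 = Yc.T @ Yc / T                      # lag-0 sample covariance
    C1 = Yc[1:].T @ Yc[:-1] / (T - 1)       # lag-1 sample covariance
    M0 = C1 @ np.linalg.inv(C0)             # VAR(1) propagator estimate
    Q0 = C0 - M0 @ C1.T                     # implied shock covariance
    return M0, Q0
```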
3 Algorithms for Parameterizations of the Matrix
3.1 White Noise
Without site-specific information about the measurement error process, it is often realistic to assume that
measurement error is independent and identically distributed white noise for all data locations, especially
if the domain of interest is relatively homogeneous. For example, researchers have modeled monthly
temperature in the U.S. corn belt with an iid measurement error for all sites (Wikle et al., 1998). In this
simple case, we reduce the error matrix to a product of a scalar and the identity matrix:
(7)
One can show that a closed form M-step update formula exists for .
Proposition 3.1. The M-step update of for model (7) is
(8)
Proof. Rewrite Equation (3) as
Differentiating with respect to
Setting the above to zero and solving for gives the result. A second-derivative test shows that this is
indeed the minimum.
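Assuming the usual trace form of this update, (8) can be sketched as follows, given smoother output; names are illustrative.

```python
import numpy as np

def update_sigma2(y, H, xs, Ps):
    """M-step update for R = sigma^2 I, in the style of (8).
    xs[t], Ps[t]: smoothed means/covariances, indexed t = 0..T."""
    T, m = len(y), len(y[0])
    total = 0.0
    for t in range(1, T + 1):
        r = y[t - 1] - H @ xs[t]                 # smoothed residual
        total += r @ r + np.trace(H @ Ps[t] @ H.T)
    return total / (T * m)
```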
3.2 Truncated Basis Function Representation
In some cases, the measurement error does depend on spatial location, so assuming iid error is no longer
appropriate. However, if we have some knowledge about the measurement error, say from historical data
or from a reformulation of the measurement equation, then we can incorporate that information into the
model. First, we assume the measurement error covariance is not time dependent, . Now, consider
a basis function expansion for the matrix (Berliner et al., 2000; Harville, 1997):
(9)
where are symmetric and idempotent matrices such that and . We
assume that and are all known and that the positive scalar c is the only unknown.
This model makes use of an incomplete matrix decomposition such as an eigenvalue decomposition.
In this case we know the dominant matrix bases for and use the term to represent smaller scale
variability and randomness, to ensure that is positive definite. Estimation of with the EM algorithm in this case involves a numerical step, as suggested by the following proposition.
Proposition 3.2. The M-step update of c for model (9) is
the (positive) root of
where
(10)
and is given by (5).
Proof. First note that is positive definite for any and (Harville, 1997)
where , and From (3) we have
(11)
Taking the first derivative with respect to gives
(12)
(13)
The last line follows from and . Collecting terms, we get (10).
In general we cannot find a closed-form solution of (10), so we must resort to numerical methods.
Fortunately, most modern software packages have routines for finding the roots of a function of one vari-
able if the user supplies an initial search bracket. However, if , or in (9), then it can be shown
that (10) is a polynomial in c of degree 3 or 5. Then, we can use standard routines for solving polynomials.
In that case, estimation is fully specified. In the event of multiple positive roots, we simply evaluate (10)
and select the minimum. Alternatively, we could employ one Newton-Raphson step to update .
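When no polynomial shortcut applies, any bracketed one-dimensional root finder suffices; a self-contained bisection sketch (the score function g and the bracket limits are user-supplied assumptions):

```python
def positive_root(g, lo=1e-8, hi=1e3, tol=1e-10):
    """Bracketed bisection for the positive root of a scalar score function g,
    as needed for the update of c when (10) has no closed form."""
    glo, ghi = g(lo), g(hi)
    assert glo * ghi < 0, "initial bracket must straddle the root"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) * glo <= 0:      # sign change in [lo, mid]
            hi = mid
        else:                      # root remains in [mid, hi]
            lo, glo = mid, g(mid)
    return 0.5 * (lo + hi)
```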
4 Algorithms for Parameterization of the Matrix
First let us derive the update formula for the unparameterized case, since the general case yields a different result from (4b).
4.1 General Case
The EM update for the general case (but with parameterized ) is given by the following proposition.
Proposition 4.1. The update formula for the general is
(14)
Proof. First note that for any and we have
Let and rewrite (3) as a function of
Differentiating with respect to gives
Setting the above to zero and solving for yields the result. The second-derivative test confirms this is a
minimum.
Remark 4.1. When is also not parameterized, so that its update is given by (4a), then substituting that update into Proposition 4.1 yields (4b).
4.2 Diagonal Case
Assume that is a diagonal matrix with diagonal elements in the vector :
(15)
Such a model is especially appropriate when the state variable is in the spectral domain, since the state
process elements are often approximately decorrelated in that setting (e.g., Wikle, 2002). The following
proposition gives the update equation for . It is simply the diagonal vector of for the general case as
given in (14).
Proposition 4.2. The update formula of for model (15) is
(16)
where is the diagonal of .
Proof. From (3) we have
Taking partial derivatives with respect to gives
Setting the above to zero we obtain
which is true for . The second derivative test confirms that this is a minimum.
4.3 Conditional Autoregressive (CAR) Model
Consider a CAR model for in (1b) by assuming the following (e.g., He and Sun, 2000):
(17)
where if location and are neighbors, and otherwise. Define the adjacency matrix
. It can be shown that the covariance matrix of the joint distribution is (e.g.,
He and Sun, 2000). In other words, the model for is
(18)
Let be the eigenvalues of . Sun et al. (2000) showed that , and that in
order for the covariance matrix to be positive definite, .
Proposition 4.3. The M-step update of and for model (18) is
the root of (19a)
(19b)
where
(20)
and
(21)
Proof. Starting from (3) and using the fact that
Taking the first derivative with respect to and , respectively, gives
(22a)
(22b)
Setting (22b) to zero yields
(23)
Substituting (23) into (22a) yields , or (20), the root of which is . Substituting in (23) we
obtain (19b). The second derivative test confirms that this is a minimum.
Note that to find one needs to perform a numerical search on a line. Fortunately, the initial search
bracket is always known, i.e., since has to be constrained as mentioned earlier. Therefore
these update formulas are fully specified and their implementation is automatic.
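The search bracket follows directly from the eigenvalues of the adjacency matrix; a sketch:

```python
import numpy as np

def car_rho_bracket(C):
    """Valid interval for the CAR dependence parameter:
    (1/lambda_min, 1/lambda_max), where lambda are the eigenvalues
    of the (symmetric) adjacency matrix C."""
    lam = np.linalg.eigvalsh(C)
    return 1.0 / lam.min(), 1.0 / lam.max()
```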
4.4 Exponential Covariogram Model
It is common in spatial statistics to impose a parametric model on the spatial random field. We consider
the commonly used exponential covariogram model:
(24)
where the correlation matrix is governed by the exponential correlation function ,
and is the distance between two locations and is the spatial dependence parameter (Cressie, 1993). It
is important to recognize that there exist analytical forms for the first and second derivatives of this correlation function with respect to . This enables us to obtain the closed-form update formula as
given by the following proposition.
Proposition 4.4. The update formula of and for the model (24) is
(25a)
(25b)
where
and .
Proof. Starting from (3)
Taking the first derivative with respect to yields
Setting the above to zero and using for yields the update formula (25a). Notice this is an ECM
step. To update , we focus on the following function:
Then, we use the GEM algorithm based on one Newton-Raphson Step to obtain (25b).
Remark 4.2. The algorithm given by Proposition 4.4 is appropriate for any covariogram model that has an analytical form for the first and second derivatives of the correlation function with respect to the spatial dependence parameter .
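For the exponential case, one common parameterization is r(d) = exp(-theta d) (an assumption; the paper's exact form is given in (24)). Its first two derivatives in theta are available in closed form, as the Newton-Raphson GEM step requires:

```python
import numpy as np

def exp_corr_and_derivs(D, theta):
    """Exponential correlation r(d) = exp(-theta d) on a distance matrix D,
    with its first and second derivatives in theta (all elementwise)."""
    R = np.exp(-theta * D)
    return R, -D * R, D**2 * R
```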
5 Parameterization of the Transition Matrix
The transition (or propagator) matrix, , is the most critical part of the spatio-temporal dynamic model (1),
since it governs the evolution of the process. Each row of contains essentially the location-wise weights
applied to the process at the previous time for the spatial location corresponding to that row. It can be shown that the M-step update formula for the unparameterized is Equation (4a) regardless of the parameterization
of and . Here we propose a simple but powerful parameterization. Assume that an entry of is
either zero or , . The positions of the zero and nonzero entries of the matrix are fixed. We can write
(26)
where and is known.
In the next proposition we derive the closed-form update formula for .
Proposition 5.1. The iteration update of for model (26) is
where
and
Proof. Starting from Equation (3),
Defining and differentiating with respect to gives
Setting the above to zero and using the fact that , we get
Therefore, we have linear equations. Writing these in matrix form and solving for gives the result.
The formula in Proposition 5.1 implies that to update we need to know . This is impossible with the standard EM algorithm. Instead, we employ the ECM algorithm to update the parameters sequentially (see Section 2.4). To illustrate, suppose is defined in the general case. Then, we can update and
together as suggested by the following remark.
Remark 5.1. The ECM update for model (26) and general is
1. first update with Proposition 5.1 by letting
2. then update with Proposition 4.1 by letting .
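The structured propagator (26) is a linear combination of known indicator matrices; a sketch with hypothetical basis matrices:

```python
import numpy as np

def build_M(theta, B):
    """Propagator with fixed sparsity in the style of (26): M = sum_i theta_i * B_i,
    where each B_i is a known 0/1 indicator matrix marking theta_i's positions."""
    return sum(t * Bi for t, Bi in zip(theta, B))
```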
6 Illustrative Examples
6.1 Advection-Diffusion PDE: A Simulation Study
6.1.1 Background
In an ecological study, researchers used a diffusion PDE to predict the spread of the house finch in the
eastern United States with a hierarchical Bayesian model (Wikle, 2003). In cases where one does not have
strong a priori belief that the diffusion parameter varies with space, such a model lends itself well to the
methodology we just developed. For illustration, consider a 1-dimensional advection-diffusion equation
for the spatio-temporal state process , at spatial location and time :
(27)
where is the advection coefficient and is the diffusion coefficient. Following basic finite difference
approaches to the numerical solution of partial differential equations (e.g., Haberman, 1987), we can apply
first-order forward differences in time ( ) and centered differences in space
( and ),
where these centered differences are valid for any time , to get
(28)
where and are the temporal and spatial increments, respectively, and we thus discretize the spatial
domain so that have equal spacing, . Note was added to (28) to make up for
the loss from discretization and to introduce extra stochastic forcing (Wikle, 2003). Furthermore, if we let
, (since spatial locations are equally
spaced, this is notationally convenient) and , then (28) becomes
(29)
Writing in matrix form, we have
(30)
where is the interior process and is the boundary process. Furthermore,
(31)
is the propagator matrix for the boundary process, and
(32)
is a tri-diagonal propagator matrix for the interior process. This matrix is in the form of (26) due to the
structural zeros.
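A sketch of the discretized propagator for the interior process, under one common sign convention for the advection term (an assumption; the paper's exact coefficients follow from (28)-(29)):

```python
import numpy as np

def diffusion_propagator(n, alpha, gamma, dt, ds):
    """Tridiagonal propagator from first-order-in-time, centered-in-space
    finite differences of the 1-D advection-diffusion equation.
    alpha: diffusion coefficient; gamma: advection coefficient (hypothetical names)."""
    d = alpha * dt / ds**2              # diffusion number
    a = gamma * dt / (2.0 * ds)         # advection number
    M = np.zeros((n, n))
    for i in range(n):
        M[i, i] = 1.0 - 2.0 * d         # weight on the site itself
        if i > 0:
            M[i, i - 1] = d + a         # weight on the left neighbor
        if i < n - 1:
            M[i, i + 1] = d - a         # weight on the right neighbor
    return M
```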
For simplicity, we let the boundaries in (30) be zero (i.e., for all ). Thus, (30) becomes the usual state equation. In addition, we specify a measurement equation similar to (1a) with noise
covariance matrix (7). We have specified a spatio-temporal dynamic model for the diffusion PDE process:
(33a)
(33b)
where cov . We assume that the process error, , has an exponential covariance matrix
as described by (24), i.e., cov . We assume both noise processes are Gaussian with zero mean.
Of course, in most real-world applications in which an advection-diffusion process would be appropriate (e.g., processes in the atmosphere, ocean, or ecology), one would not know the parameters and (and thus , and ). Thus, we seek to estimate them. We demonstrate such
estimation by simulating the process and comparing estimates to the known parameters.
6.1.2 Simulating the Data Set
To illustrate our estimation methodology, we simulate a data set according to the above specified model.
Table 1 summarizes the actual values of the parameters and other simulation set-up values. As a way of
Parameter Set-up
SNR Missing n T
.3 .6 .1 1 1 5 5 10 % 18 20 100
Table 1: Simulation setup for diffusion data used in Table 2 and Figure 1. (SNR = signal-to-noise ratio; = number of observations at time ; = spatial dimension of the state process; = spatial dependence parameter; = parameters in the propagator matrix.)
gauging the estimation performance, we withhold a certain amount of data for validation. This missing
data set-up is achieved easily with an incidence matrix in the measurement equation.
Figure 1(a) shows a map of the simulated spatio-temporal diffusion data. There is a noticeable
pattern of propagation of spatial features to the left through time. This is the result of the special structure
of the propagation matrix and the chosen value of . Another way of looking at the data is to
examine the time series plots for some locations (see Figure 2). We cannot easily detect spatio-temporal
propagation from such time series plots, but they give us information about the temporal structure and
stability of the signal.
6.1.3 ECM estimation
Estimation is carried out with an ECM algorithm. For this model we need to update the following parameters: . Given the current iterate , five CM-steps are used to obtain the
update :
CM-step 1: update by Proposition 5.1 with , and,
CM-step 2: update by Equation (25a) in Proposition 4.4 with , , and ,
CM-step 3: update by Equation (25b) in Proposition 4.4 with , ,
and ,
CM-step 4: update by Proposition 3.1 with , and ,
CM-step 5: update by Equation (4d) with , and .
These parameters can be updated in a different order if desired. To update , we use the GEM Based on One Newton-Raphson Step algorithm discussed in Section 2.4 (McLachlan and Krishnan, 1997).
Figure 1: Simulated diffusion data and estimation: (a) simulated data ; see Table 1 for true parameter values; (b) smoothed value ; (c) true process ; (d) prediction error ; (e) standard deviation of the prediction error .
1 13766.03 0.00 0.30 0.30 0.30 0.2000 1.00 1.00 1.00 6.66 0.00000
2 2825.46 10940.57 0.33 0.53 0.20 1.8420 5.51 1.14 1.00 22.73 28.69474
3 2643.62 181.83 0.33 0.59 0.14 2.6336 6.57 1.33 1.00 22.43 9.18374
4 2612.22 31.41 0.32 0.62 0.11 2.8971 6.81 1.54 1.00 22.21 4.57350
5 2597.93 14.28 0.31 0.63 0.09 2.9249 6.74 1.79 1.00 22.12 2.63997
61 2492.28 0.03 0.28 0.61 0.10 1.1671 4.78 9.13 1.00 26.67 0.03425
121 2491.71 0.00 0.28 0.61 0.10 0.9806 5.00 9.46 1.00 27.07 0.00839
181 2491.64 0.00 0.28 0.61 0.10 0.9196 5.08 9.56 1.00 27.19 0.00328
241 2491.62 0.00 0.28 0.60 0.10 0.8937 5.11 9.61 1.00 27.24 0.00148
265 2491.62 0.00 0.28 0.60 0.10 0.8878 5.12 9.62 1.00 27.25 0.00110
266 2491.62 0.00 0.28 0.60 0.10 0.8876 5.12 9.62 1.00 27.25 0.00108
267 2491.62 0.00 0.28 0.60 0.10 0.8874 5.12 9.62 1.00 27.25 0.00107
268 2491.62 0.00 0.28 0.60 0.10 0.8871 5.12 9.62 1.00 27.25 0.00106
269 2491.62 0.00 0.28 0.60 0.10 0.8869 5.12 9.62 1.00 27.25 0.00105
270 2491.62 0.00 0.28 0.60 0.10 0.8867 5.12 9.62 1.00 27.25 0.00103
271 2491.62 0.00 0.28 0.60 0.10 0.8865 5.12 9.62 1.00 27.25 0.00102
272 2491.62 0.00 0.28 0.60 0.10 0.8863 5.12 9.62 1.00 27.25 0.00101
Truth 0.30 0.60 0.10 1.0000 5.00 10.00
Table 2: ECM iterates for the simulated diffusion data (selected iterations), together with the Newton-Raphson
step size used in the GEM update and the value of the stopping criterion; the last row gives the true
parameter values. See Figure 1 for a plot of the data.
estimates. As shown in Table 3, the estimates are generally centered around the true values with small
deviations. Several findings are worth noting. First, the estimates of the dynamical parameters are not
sensitive to the amount of missing data, yet the variance parameter estimates become more uncertain as
the amount of missing data grows. Second, and not surprisingly, large SNR values yield more accurate
estimates for most of the parameters, the exception being the measurement noise variance.
The findings from this simulation study do not necessarily generalize. However, they do give us a
picture of how this estimation procedure might perform in practice for a process representative of
many environmental applications.
6.2 Palmer Drought Severity Index (PDSI)
6.2.1 Background
Drought poses a serious problem for every society. One measure of drought is the Palmer Drought Severity
Index (PDSI), typically reported as a monthly index (Heim, 2002). PDSI values typically range
from -6 to +6, with negative values denoting dry spells and positive values denoting wet spells.
We obtain the monthly PDSI for 107 locations in the central U.S. from January 1900 to December
1997. Figure 3 displays the data for two typical months. There is significant spatial correlation in the
data; indeed, dry and wet spells occur with substantial spatial coherence across the region. There is
thus little need to model the 107 stations individually, and a more concise representation should suffice
for this data set. We therefore consider a dimension-reduced spatio-temporal approach to modeling
PDSI.
6.2.2 Dimension reduction
First, we introduce the idea of spatio-temporal dimension reduction. The key is to recast the state vector
in a much lower-dimensional space by using a spectral basis (Wikle and Cressie, 1999). Let

x_t = Φ a_t + ν_t,    (34)

where Φ is an n × p matrix of spectral basis functions with p much smaller than n, a_t is the reduced
p-dimensional state, and ν_t contains the residual process induced by the truncation. We treat a_t as our
new state vector; it follows a first-order Markov process if x_t follows such a process. Typically, ν_t is a
non-dynamic (uncorrelated in time) spatial process.
Now, rewrite the spatio-temporal model (1) in light of the dimension reduction:

z_t = H_t Φ a_t + ε_t,    (35a)
[Figure 3: four map panels over the central U.S. (longitude 105W to 90W, latitude 30N to 50N):
PDSI 7/1988, PRED 7/1988, PDSI 7/1993, PRED 7/1993.]
Figure 3: PDSI data (left column) and predictions (right column) for two months. Dark (open) circles
correspond to negative (positive) PDSI values; the size of each circle is proportional to the magnitude of
the PDSI value.
a_t = M_a a_{t-1} + η_t,    (35b)

where the measurement error term ε_t in (35a) contains measurement error as well as truncation error,
and, as is typical, we assume that ε_t and η_t are mean-zero Gaussian processes that are temporally
independent. This model is in the same form as (1) with slightly different notation. We proceed to specify
the covariance of ε_t after taking the truncation error into account:

var(ε_t) = Φ₂ D Φ₂' + σ² I,    (36)

where Φ₂ is an n × q matrix holding the next q basis functions, D is a q × q diagonal matrix, and σ² is
the measurement-error variance. This is a simplified version of a formulation by Berliner et al. (2000). By
using the next q basis functions, this formulation amounts to a second dimension reduction. If the columns
of Φ₂ are orthonormal, then it is evident that (36) is an example of the parameterized measurement-error
covariance of model (9). We assume the covariance matrix of the model error process η_t is diagonal. This
is reasonable since the spectral decomposition typically leads to decorrelation in spectral space.
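Under these assumptions the marginal measurement-error covariance is a low-rank term in the secondary basis functions plus white noise. A sketch of its assembly (the names and toy dimensions are our own, and the exact reduced-rank decomposition here is an assumption patterned on the form discussed above):

```python
import numpy as np

def marginal_error_cov(phi2, d, sigma2):
    """Reduced-rank marginal covariance: a low-rank contribution from
    the next q basis functions (columns of phi2) with diagonal
    weights d, plus white measurement noise of variance sigma2."""
    n = phi2.shape[0]
    return phi2 @ np.diag(d) @ phi2.T + sigma2 * np.eye(n)

# Toy example: n = 4 locations, q = 2 secondary basis functions.
rng = np.random.default_rng(0)
phi2, _ = np.linalg.qr(rng.standard_normal((4, 2)))  # orthonormal columns
R = marginal_error_cov(phi2, d=np.array([2.0, 1.0]), sigma2=0.5)
```

Because the columns are orthonormal, the low-rank term contributes exactly trace(D) to the total variance, which is what makes this second dimension reduction cheap to work with.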
If one is interested in predicting the process at locations for which one does not have data, then it
is important to model the truncation residual explicitly, as suggested by Wikle and Cressie (1999) and
Cressie and Wikle (2002). However, if one is primarily interested in the dynamic process and/or its
parameters, then it is simpler to account for the residual through its marginal covariance, as in (36). This
is directly analogous to traditional mixed models: if one is interested in inference on the fixed effects, one
integrates out the random effects and considers the so-called marginal formulation; if one is interested in
predicting the random effects, one considers them directly (the so-called conditional specification). In this
application, we are interested in forecasting the dynamic component, so it is reasonable to treat the
truncation residual marginally, as indicated above.
Although we could use any set of orthonormal basis functions (e.g., Fourier, wavelets, empirical),
we choose to use EOFs (empirical orthogonal functions) for this example, since they are widely used in
meteorological studies. EOFs are the meteorologists' name for the familiar principal components analysis
of spatio-temporal data (see Wikle, 1996, for an overview). We obtain the EOFs by performing an
eigenvalue decomposition of the estimated spatial covariance matrix of the data. Figure 4 shows the
percent variability accounted for by each basis function. Notice the steep decline up to the 10th EOF; the
remaining EOFs explain very little of the variability in the data. Therefore, we fix the truncation parameter
at 10. Instead of modeling a 107-dimensional state vector, we now model a 10-dimensional state vector,
which is a much easier task both statistically and computationally. Finally, we choose the second
truncation level in (36) to be 20, since the next 20 EOFs account for about 10% of the variability and
adding further EOFs contributes little additional spatial structure.

Figure 4: Percent (solid line) and cumulative percent (dashed line) of total variability accounted
for by each EOF of the PDSI data.
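The EOF computation just described is an eigenvalue decomposition of the estimated spatial covariance matrix, with the eigenvalues giving the percent variability per basis function. A sketch with synthetic data standing in for the 107-station PDSI matrix (all names here are ours):

```python
import numpy as np

def eofs(data):
    """EOFs of a (time x space) data matrix: eigenvectors of the
    estimated spatial covariance, ordered by variance explained.
    Returns the basis matrix and the percent variability per EOF."""
    anomalies = data - data.mean(axis=0)     # remove the temporal mean
    cov = np.cov(anomalies, rowvar=False)    # spatial covariance estimate
    vals, vecs = np.linalg.eigh(cov)         # ascending eigenvalues
    order = np.argsort(vals)[::-1]           # reorder: largest first
    vals, vecs = vals[order], vecs[:, order]
    return vecs, 100.0 * vals / vals.sum()

# Synthetic stand-in: 1176 "months" at 10 "stations", rank-2 signal.
rng = np.random.default_rng(1)
signal = rng.standard_normal((1176, 2)) @ rng.standard_normal((2, 10))
data = signal + 0.1 * rng.standard_normal((1176, 10))
basis, pct = eofs(data)
phi = basis[:, :2]   # truncate at the leading EOFs, as in the text
```

Plotting pct reproduces the kind of scree curve shown in Figure 4; the truncation level is read off where the curve flattens.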
6.2.3 EM Estimation
The M-step update formulas for the parameters of model (35) are given by Equation (4a), Proposition
4.2, Equation (4d), and Proposition 3.2, respectively. The EM iteration history is shown in Table 4. The
algorithm converges much faster than in the ECM example discussed in Section 6.1.
To assess the fit of the model, we examine plots of the predicted state vector mapped back to the
original space through the basis matrix. Figure 3 shows the one-month-ahead predictions along with the
observed data for each of two months. As we can see, the predictions capture the major spatial patterns
fairly well. Figure 6 shows time series plots for three stations over a 10-year period around the major
Midwest flood year of 1993. In general, the model does a reasonable job of predicting the next month's
PDSI values, even amidst the unusual flood event. The prediction standard error map (Figure 5) shows
that the prediction error is roughly of the same order across the spatial domain and across time periods.
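The one-month-ahead predictions come from the Kalman filter forecast step applied in the reduced space and then mapped back to the station locations through the basis matrix. A sketch in generic linear-Gaussian notation (not the paper's symbols):

```python
import numpy as np

def one_step_ahead(a_filt, P_filt, M, Q, phi):
    """Forecast the reduced state one step ahead and map the forecast
    (mean and pointwise variance) back to the n spatial locations.

    a_filt, P_filt : filtered reduced-state mean and covariance
    M, Q           : transition matrix and model-error covariance
    phi            : (n x p) spectral basis matrix
    """
    a_pred = M @ a_filt                    # forecast mean, reduced space
    P_pred = M @ P_filt @ M.T + Q          # forecast covariance
    x_pred = phi @ a_pred                  # back to spatial locations
    x_var = np.diag(phi @ P_pred @ phi.T)  # pointwise forecast variances
    return x_pred, x_var

# Tiny example: p = 2 reduced states, n = 3 locations.
M = np.array([[0.9, 0.0], [0.0, 0.5]])
Q = 0.1 * np.eye(2)
phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_pred, x_var = one_step_ahead(np.array([1.0, 2.0]), 0.2 * np.eye(2),
                               M, Q, phi)
```

A prediction standard error map like Figure 5 plots the square roots of these pointwise variances.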
[Figure 6: three time series panels, one per station, at (lon 102.30, lat 48.50), (lon 95.90, lat 39.60),
and (lon 92.30, lat 31.20); the horizontal axis is month (1060 to 1180) and the vertical axis is PDSI
(-5 to 5).]
Figure 6: PDSI data (solid line) and one-month-ahead predictions (dashed line) for three stations.
7 Summary and Conclusion
We have proposed several parameterizations for a spatio-temporal dynamic model. The strategy is to make
use of valid spatial statistical models and simple physical assumptions about the process, which lead
to partial restrictions on the transition matrix and/or the covariance matrices. We also derive the
relevant (general) EM update formulas for these restrictions. We demonstrate this methodology with a
simulation study in which the true state process follows an advection-diffusion process. In addition, we
apply the methodology to the problem of spatio-temporal modeling of monthly Palmer Drought Severity
Index values over the central U.S.
It is important to point out that although this development was motivated by spatio-temporal problems,
the parameterization/GEM approach is quite flexible, and the ideas contained in this paper can be used for
other parameterizations as well. In particular, the state-space framework is useful for many multivariate
time series problems (e.g., see the applications in Shumway, 1988). However, our preference for closed-form
update formulas does limit the choice of parameterizations. In addition, since different parameterizations
of the same data may each be reasonable from a scientific perspective, it is desirable to have a consistent
way of performing model selection (e.g., Bengtsson, 2000).
It is reasonable to ask what the advantages and disadvantages of the EM/GEM approach presented
here are, compared to a fully Bayesian approach (e.g., Wikle, 2003; Berliner et al., 2000). In many respects
the EM/GEM approach can be thought of as an empirical Bayesian approach, whereby the data are used
to estimate the lower-level parameters in a hierarchical Bayesian model. In the spatio-temporal dynamical
model framework, the fully Bayesian approach is most useful when one has some prior understanding
(either from scientific theory or from previous empirical studies) of the process dynamics, particularly the
evolution operator (propagator). For example, Wikle (2003) considered a discretized PDE-based model
analogous to the advection-diffusion example in Section 6. In that case, however, the ecological problem at
hand suggested that the dynamics were largely controlled by spatially-varying (yet unknown) diffusion
coefficients corresponding to population spread. In that context, given the relationship between heterogeneous
population spread and habitat, it was appropriate for the diffusion parameters to be spatially dependent
and thus to have a spatial prior distribution. For a process such as that in Section 6.1, however, in which
one does not expect the advection and diffusion coefficients to be spatially varying, it is certainly reasonable
to assume no spatial dependence and thus estimate the relatively few parameters empirically through
the EM/GEM approach outlined here. In summary, when the model complexity increases, and/or when
one has significant prior knowledge about the dynamics, one should use a fully Bayesian approach. When
one has a relatively simple model and little prior knowledge, the EM/GEM approach suffices. However,
the EM/GEM approach is of limited utility if the process is high-dimensional; in that case one must
parameterize the dynamics to reduce the effective number of parameters, which is the approach we have
outlined here.
References
Bengtsson, T. (2000). Time series discrimination, signal comparison testing, and model selection in the
state-space framework. Ph.D. thesis, University of Missouri-Columbia.

Berliner, L. M., Wikle, C. K., and Cressie, N. (2000). Long-lead prediction of Pacific SST via Bayesian
dynamic modeling. Journal of Climate, 13, 3953-3968.

Cressie, N. and Huang, H.-C. (1999). Classes of nonseparable, spatio-temporal stationary covariance
functions. Journal of the American Statistical Association, 94(448), 1330-1340.

Cressie, N. and Wikle, C. (2002). Space-time Kalman filter. In A. El Shaarawi and W. Piegorsch, editors,
Encyclopedia of Environmetrics, volume 4, pages 2045-2049. Wiley, New York.

Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York, revised edition.

Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data. Journal of the
American Statistical Association, 97, 590-600.

Gupta, N. and Mehra, R. (1974). Computational aspects of maximum likelihood estimation and reduction
in sensitivity function calculations. IEEE Transactions on Automatic Control, 19, 774-783.

Haberman, R. (1987). Elementary Applied Partial Differential Equations, second edition. Prentice-Hall,
New Jersey.

Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer, New York.

He, Z. and Sun, D. (2000). Hierarchical Bayes estimation of hunting success rates with spatial correlations.
Biometrics, 56, 360-367.

Heim Jr., R. R. (2002). A review of twentieth-century drought indices used in the United States. Bulletin
of the American Meteorological Society, 83, 1149-1165.

Huerta, G., Sanso, B., and Stroud, J. (2004). A spatiotemporal model for Mexico City ozone levels. Journal
of the Royal Statistical Society, Series C, 53, 231-248.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic
Engineering, 82(D), 35-45.

Kyriakidis, P. C. and Journel, A. G. (1999). Geostatistical space-time models: a review. Mathematical
Geology, 31(6), 651-684.

Lange, K. (1999). Numerical Analysis for Statisticians. Springer, New York.

Mardia, K., Goodall, C., Redfern, E., and Alonso, F. (1998). The kriged Kalman filter (with discussion).
Test, 7, 217-285.

McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley, New York.

Shumway, R. (1988). Applied Statistical Time Series Analysis. Prentice Hall, Englewood Cliffs, NJ.

Shumway, R. H. and Stoffer, D. S. (1982). An approach to time series smoothing and forecasting using
the EM algorithm. Journal of Time Series Analysis, 3(4), 253-264.

Shumway, R. H. and Stoffer, D. S. (2000). Time Series Analysis and Its Applications. Springer, New York.

Stroud, J., Mueller, P., and Sanso, B. (2001). Dynamic models for spatio-temporal data. Journal of the
Royal Statistical Society, Series B, 63, 673-689.

Sun, D., Tsutakawa, R. K., and Speckman, P. L. (2000). Bayesian inference for CAR(1) models with
noninformative priors. Biometrika, 86, 341-350.

Tanner, M. A. (1996). Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions
and Likelihood Functions. Springer, New York.

Wikle, C. K. (1996). Spatio-temporal statistical models with applications to atmospheric processes. Ph.D.
thesis, Iowa State University.

Wikle, C. K. (2002). Spatial modeling of count data: A case study in modelling breeding bird survey data
on large spatial domains. In A. B. Lawson and D. G. T. Denison, editors, Spatial Cluster Modelling,
pages 199-209. Chapman and Hall.

Wikle, C. K. (2003). Hierarchical Bayesian models for predicting the spread of ecological processes.
Ecology, 84, 1382-1394.

Wikle, C. K. and Cressie, N. (1999). A dimension-reduced approach to space-time Kalman filtering.
Biometrika, 86(4), 815-829.

Wikle, C. K., Berliner, L. M., and Cressie, N. (1998). Hierarchical Bayesian space-time models. Journal
of Environmental and Ecological Statistics, 5, 117-154.

Wikle, C. K., Milliff, R., Nychka, D., and Berliner, L. (2001). Spatiotemporal hierarchical Bayesian
modeling: Tropical ocean surface winds. Journal of the American Statistical Association, 96, 382-397.