xu_wikle_070705


  • 7/31/2019 xu_wikle_070705

    1/33

    Estimation of Parameterized Spatio-Temporal Dynamic Models

    Ke Xu and Christopher K. Wikle

    Department of Statistics, University of Missouri-Columbia

    July 7, 2005: Original Submission July 21, 2004

    Abstract

Spatio-temporal processes are often high-dimensional, exhibiting complicated variability across space and time. Traditional state-space model approaches to such processes in the presence of uncertain data have been shown to be useful. However, estimation of state-space models in this context is often problematic since parameter vectors and matrices are of high dimension and can have complicated dependence structures. We propose a spatio-temporal dynamic model formulation with parameter matrices restricted based on prior scientific knowledge and/or common spatial models. Estimation is carried out via the expectation-maximization (EM) algorithm or general EM algorithm. Several parameterization strategies are proposed, and analytical or computational closed-form EM update equations are derived for each. We apply the methodology to a model based on an advection-diffusion partial differential equation in a simulation study, and also to a dimension-reduced model for a Palmer Drought Severity Index (PDSI) data set.

    Keywords: Dynamic, EM algorithm, General EM, state-space, time series, spatial, spatio-temporal

    Corresponding Author: Christopher K. Wikle, Department of Statistics, University of Missouri, 146 Math Science Building,

    Columbia, MO 65211; 573-882-9659; fax: 573-884-5524; [email protected]


    1 Introduction

    Spatio-temporal statistical models are essential tools for performing inference and prediction for processes

    in the physical, environmental, and biological sciences. Such processes are often complicated in that the

    dependence structure across space and time is non-trivial, often non-separable and non-stationary in space

    or time. In addition, it is often the case that the number of spatial locations at which inference is desired

    is quite large. Furthermore, data are often collected with substantial observational uncertainty and it is not

    uncommon to have missing observations at various spatial and temporal locations.

    Various approaches have been proposed to model spatio-temporal processes (e.g., see Kyriakidis and

    Journel, 1999 for a review). If one considers time as an extra dimension, then traditional spatial statistics

    techniques can be applied (Cressie, 1993). However, such approaches ignore the fundamental differences

    between space and time, principally that time is naturally ordered and space is not. Alternatively, one can

    consider the spatio-temporal problem from a multivariate geostatistical perspective which requires space-time covariance functions be specified. Traditionally this approach has been limited in that the known

    class of valid spatio-temporal covariance functions is quite small, although in recent years, several authors

    have extended this class of functions (e.g., Cressie and Huang, 1999; Gneiting, 2002). Nevertheless, this

    approach is still limited by the fact that such covariance functions are often not realistic for complicated

    dynamical processes and dimensionality can prohibit practical implementation.

    Spatio-temporal processes can also be considered from the multiple time series perspective (e.g., see

    Kyriakidis and Journel, 1999 for a review). That is, each spatial location is associated with a time series.

    Then, multivariate time series techniques can be transferred to the space-time problem. However, such

approaches ignore the fundamental differences between space and time, and one's ability to predict at

    locations for which data were not observed is limited. Such approaches do not in general explicitly account

    for uncertainty in the observed data. Perhaps more critically, such methods are difficult to implement in

    cases where the dimensionality of the state vector (i.e., the number of spatial locations) is high.

    A natural approach to spatio-temporal modeling for complex dynamical processes is a combination

of spatial and time series techniques, which is accomplished by a spatio-temporal dynamic model formulation (e.g., see Cressie and Wikle, 2002 for a brief review). However, estimation in this context can be problematic due to the high dimensionality of the state process. Several modeling strategies have been proposed to address this problem. One approach is to reduce dimensionality by projecting the state process onto some set of spectral basis functions (e.g., Mardia, Goodall, Redfern and Alonso, 1998; Wikle and Cressie,


1999). Alternatively, one might specify very simple, random walk dynamics (e.g., Stroud, Müller and Sansó, 2001; Huerta, Sansó and Stroud, 2004). Another approach is to incorporate physical or biological

    models directly into the parametrization (Wikle, Milliff, Nychka, and Berliner, 2001; Wikle, 2003). Even

    in the case of physically or biologically motivated dynamic models, it is seldom the case for statistical

problems (unlike some engineering problems) that we know the model parameters explicitly. These must be estimated, but in the presence of known constraints on the dynamical formulation. Furthermore, due

    to the high dimensionality, measurement error and process covariance matrices typically have too many

    parameters to estimate outright. Thus, these matrices must be parameterized as well.

    Estimation in the spatio-temporal dynamical model setting is best accomplished through a state-space

    framework. Given parameters, the unobserved state-process can be estimated via the Kalman filter or

    Kalman smoother (e.g., see Cressie and Wikle, 2002 for a review). However, in the more usual setting

    where model parameters are unknown, the standard approach following Shumway and Stoffer (1982) is to

    use the expectation-maximization (EM) algorithm to estimate parameters. As mentioned above, the spatio-

    temporal problem typically requires restrictions on the parameter matrices. Shumway and Stoffer (1982)

    discuss modifications to their algorithm to accommodate fully-restricted parameter matrices. However, it

    is not clear how they account for partially restricted or parameterized model matrices in this framework.

    Examination of Shumway (1988) (pp. 323-332) implies that one approach to deal with partially restricted

    parameter matrices is to set initial parameters (in the EM algorithm) to agree with the known values. Then

    in the M-step, only those parameters that require estimation are updated, so that the fixed parameters

don't change. Alternatively, one can update all parameters but then immediately impute the known values

    for the fixed parameters. Although these approaches are relatively easy to implement, it is not clear that

    they give the maximum likelihood estimates under the state-space model assumptions. Another approach,

    considered here, is to develop general EM (GEM) algorithms to account directly for the restricted or

    partially restricted model matrices.

    In this paper we describe efficient estimation approaches for spatio-temporal dynamic models in which

    the parameter matrices and/or noise covariance matrices are highly parameterized (or restricted). We

    utilize GEM algorithms to carry out this estimation. In Section 2, we give some necessary background for

spatio-temporal dynamic models and GEM algorithms. In Sections 3, 4 and 5, we propose several methods

of parameterization and derive the EM update formulas for each. In Section 6,

    we consider two examples. Finally, Section 7 contains a brief summary and conclusion.


    2 Background

    2.1 Spatio-Temporal Dynamic Model Formulation

Let y_t = (y_t(r_1), ..., y_t(r_{n_t}))′ be an n_t × 1 vector containing the data values at spatial locations r_1, ..., r_{n_t}, at time t. Let x_t = (x_t(s_1), ..., x_t(s_m))′ be an m × 1 vector for an unobservable spatio-temporal state process at some fixed network of m locations s_1, ..., s_m at time t. This state process is our primary interest. The two sets of spatial locations, {r_i} and {s_j} in D, where D is some domain, need not be the same. Write

y_t = H_t x_t + ε_t,   (1a)

x_t = M x_{t-1} + η_t,   (1b)

for t = 1, ..., T, where (1a) is called the measurement equation and (1b) the state equation. Let H_t be a known n_t × m matrix that maps the data y_t to the process x_t. The measurement noise ε_t is zero-mean, uncorrelated in time and Gaussian with covariance matrix R_t. The dynamics are described by the state equation (1b) via a first-order Markov process with the m × m transition or propagator matrix M. We also assume there are shocks η_t to the system, which are spatially colored, temporally white and Gaussian with mean zero and a common m × m covariance matrix Q. For completeness, we assume the process starts

with x_0, which is a Gaussian spatial process with mean μ_0 and covariance matrix Σ_0. Such a model

    in the spatio-temporal context is not new, nor is it the most general. However, such models have received

    a considerable amount of attention in the environmental literature in recent years and have been shown to

    be quite effective (e.g. see the review in Cressie and Wikle, 2002).

The parameters for the model (1) are Θ = {M, Q, R_t, μ_0, Σ_0}. The major challenge in fitting this

model lies in the high dimensionality of most space-time applications. For example, a model for Pacific

Sea Surface Temperature (SST) might have a very large state dimension m (Berliner, Wikle and Cressie, 2000), which

requires estimation of an m × m matrix M. Often, researchers resort to Bayesian hierarchical approaches for dealing with this dimensionality problem by considering restrictions to M, assigning priors

    and then using MCMC (Wikle, Berliner and Cressie, 1998). This paper shows that it is often possible to fit

    such models and estimate their parameters via the convenient Kalman and EM algorithms. Again, the key

is to assign a structure to the model parameters. Though still limited for many problems, such a formulation is useful in many settings. For example, in the early stages of model building, one might consider

such an implementation, since it is fast and easy to implement when compared to MCMC. In addition,

    there are situations where there is little scientific theory or previous empirical evidence to suggest prior


parameterizations for a fully-Bayesian model. In these cases, if the model is sufficiently parameterized,

    the KF/EM approach is a reasonable alternative.

    2.2 Kalman Filter and Smoother

Suppose we know the value of all parameters, Θ. Then one can use a set of recursions known as the Kalman filter and Kalman smoother to obtain the conditional mean and covariance of the state variable x_t (Kalman, 1960; Shumway and Stoffer, 1982). These recursions are well known, but we present them here to define notation and for completeness. Our overview follows Shumway and Stoffer (2000) with various notational modifications. First, define the conditional mean x_t^s = E(x_t | y_1, ..., y_s). In particular, x_t^{t-1}, x_t^t and x_t^T are called the predicted, filtered and smoothed values, respectively. Also define the conditional variance-covariance matrix, P_t^s = var(x_t | y_1, ..., y_s), and lag-one covariance matrix, P_{t,t-1}^s = cov(x_t, x_{t-1} | y_1, ..., y_s).

To get predicted and filtered values, one evaluates the following set of recursions for t = 1, ..., T, which is called the Kalman filter:

x_t^{t-1} = M x_{t-1}^{t-1},

P_t^{t-1} = M P_{t-1}^{t-1} M′ + Q,

K_t = P_t^{t-1} H_t′ (H_t P_t^{t-1} H_t′ + R_t)^{-1},

x_t^t = x_t^{t-1} + K_t (y_t − H_t x_t^{t-1}),

P_t^t = (I − K_t H_t) P_t^{t-1},

where x_0^0 = μ_0 and P_0^0 = Σ_0 are specified. To get smoothed values, one runs the following backward recursion for t = T, T−1, ..., 1, which is sometimes called the Kalman smoother:

J_{t-1} = P_{t-1}^{t-1} M′ (P_t^{t-1})^{-1},

x_{t-1}^T = x_{t-1}^{t-1} + J_{t-1} (x_t^T − x_t^{t-1}),

P_{t-1}^T = P_{t-1}^{t-1} + J_{t-1} (P_t^T − P_t^{t-1}) J_{t-1}′.

To get the smoothed lag-one covariance, one runs the backward recursion for t = T, T−1, ..., 2 on

P_{t-1,t-2}^T = P_{t-1}^{t-1} J_{t-2}′ + J_{t-1} (P_{t,t-1}^T − M P_{t-1}^{t-1}) J_{t-2}′,

where P_{T,T-1}^T = (I − K_T H_T) M P_{T-1}^{T-1}.
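The forward and backward recursions above can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation; it assumes a time-invariant mapping matrix H and propagator M, and omits the lag-one covariance recursion for brevity:

```python
import numpy as np

def kalman_filter(y, H, M, Q, R, x0, P0):
    """Forward recursions: predicted (xp, Pp) and filtered (xf, Pf) moments."""
    T = len(y)
    m = len(x0)
    xp = np.zeros((T, m)); Pp = np.zeros((T, m, m))
    xf = np.zeros((T, m)); Pf = np.zeros((T, m, m))
    x, P = x0, P0
    for t in range(T):
        # prediction from the state equation
        xp[t] = M @ x
        Pp[t] = M @ P @ M.T + Q
        # update with the innovation y_t - H xp_t
        S = H @ Pp[t] @ H.T + R                     # innovation covariance
        K = Pp[t] @ H.T @ np.linalg.inv(S)          # Kalman gain
        xf[t] = xp[t] + K @ (y[t] - H @ xp[t])
        Pf[t] = (np.eye(m) - K @ H) @ Pp[t]
        x, P = xf[t], Pf[t]
    return xp, Pp, xf, Pf

def kalman_smoother(M, xp, Pp, xf, Pf):
    """Backward recursions for the smoothed moments (xs, Ps)."""
    T, m = xf.shape
    xs = xf.copy(); Ps = Pf.copy()
    for t in range(T - 2, -1, -1):
        J = Pf[t] @ M.T @ np.linalg.inv(Pp[t + 1])  # smoother gain
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
    return xs, Ps
```

In a production setting one would replace the explicit inverses with solves of the corresponding linear systems for numerical stability.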


    2.3 EM Estimation

One can estimate the parameters by the method of moments and then plug them into (1) to implement the Kalman filter (Wikle and Cressie, 1999). Alternatively, one can run the Kalman recursions and recognize that a byproduct of the Kalman algorithm is that the likelihood can be computed from the filtered values with little extra effort. That is, if we define the innovation and its covariance as e_t = y_t − H_t x_t^{t-1} and Σ_t = H_t P_t^{t-1} H_t′ + R_t, respectively, then the log-likelihood, up to a constant, is simply (Shumway and Stoffer, 2000):

−2 ln L_Y(Θ) = Σ_{t=1}^T ln |Σ_t| + Σ_{t=1}^T e_t′ Σ_t^{-1} e_t.   (2)

    Thus, we might perform maximum likelihood estimation, either numerically (Gupta and Mehra, 1974) or

    by the EM algorithm (Shumway and Stoffer, 1982, 2000). In this paper we focus on the EM algorithm.

Consider {x_0, x_1, ..., x_T, y_1, ..., y_T} as the complete data and denote its likelihood L_{X,Y}(Θ). An EM iteration consists of two steps: an E-step and an M-step. Given the current value of the parameters, Θ^(i), the E-step computes the expected value of (minus twice) the complete-data log-likelihood, which is of the following form (for details see Shumway and Stoffer, 2000):

E[−2 ln L_{X,Y}(Θ) | y_1, ..., y_T, Θ^(i)]
  = ln|Σ_0| + tr{Σ_0^{-1}[P_0^T + (x_0^T − μ_0)(x_0^T − μ_0)′]}
  + T ln|Q| + tr{Q^{-1}[S_{11} − S_{10}M′ − M S_{10}′ + M S_{00}M′]}
  + Σ_{t=1}^T ln|R_t| + Σ_{t=1}^T tr{R_t^{-1}[(y_t − H_t x_t^T)(y_t − H_t x_t^T)′ + H_t P_t^T H_t′]},   (3)

where S_{11}, S_{10} and S_{00} are the smoothed moment matrices given in (5) below.

Note x_t^T, P_t^T and P_{t,t-1}^T depend on Θ^(i).

In the M-step, an update, Θ^(i+1), is chosen such that (3) evaluated at Θ^(i+1) is no larger than (3) evaluated at Θ^(i). This will guarantee that the likelihood increases monotonically. When the likelihood function is bounded, the iterates will eventually converge to a stationary point of the likelihood. If Θ^(i+1) is also the minimizer of (3), we have the standard EM algorithm. Otherwise, the algorithm is known as General EM (GEM) (McLachlan and Krishnan, 1997).

In the case of unrestricted parameter matrices there exists a closed-form EM update formula for all parameters. Minimizing (3) with respect to the parameters yields the M-step update formulas for our model (Shumway and Stoffer, 1982).


M^(i+1) = S_{10} S_{00}^{-1},   (4a)

Q^(i+1) = T^{-1}(S_{11} − S_{10} S_{00}^{-1} S_{10}′),   (4b)

R^(i+1) = T^{-1} Σ_{t=1}^T [(y_t − H_t x_t^T)(y_t − H_t x_t^T)′ + H_t P_t^T H_t′],   (4c)

μ_0^(i+1) = x_0^T,   (4d)

where

S_{11} = Σ_{t=1}^T (x_t^T x_t^{T′} + P_t^T),   S_{10} = Σ_{t=1}^T (x_t^T x_{t-1}^{T′} + P_{t,t-1}^T),   S_{00} = Σ_{t=1}^T (x_{t-1}^T x_{t-1}^{T′} + P_{t-1}^T).   (5)

Note that Σ_0 is not updated, since μ_0 and Σ_0 are essentially nuisance parameters and they cannot be estimated simultaneously (Shumway and Stoffer, 1982). We choose to update μ_0 rather than the covariance matrix Σ_0 since in general we do not have enough data to justify estimating a covariance matrix (we have only one observation for the initial vector).
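As a sketch of how the unrestricted M-step might be coded, the following assumes a time-invariant H and R and takes the smoothed moments (including lag-one covariances) as inputs; the names and argument layout are illustrative choices, not from the paper:

```python
import numpy as np

def em_m_step(y, H, xs, Ps, Pl, x0s, P0s):
    """One unrestricted M-step, updates (4a)-(4d), assuming time-invariant H, R.
    xs[t-1], Ps[t-1]: smoothed mean/covariance of x_t for t = 1, ..., T;
    Pl[t-1]: smoothed lag-one covariance cov(x_t, x_{t-1}) for t = 1, ..., T;
    x0s, P0s: smoothed moments of the initial state x_0."""
    T, m = xs.shape
    # stack x_0, ..., x_T so the moment matrices (5) are simple sums
    xall = np.vstack([x0s[None, :], xs])
    Pall = np.concatenate([P0s[None], Ps])
    S11 = sum(np.outer(xall[t], xall[t]) + Pall[t] for t in range(1, T + 1))
    S10 = sum(np.outer(xall[t], xall[t - 1]) + Pl[t - 1] for t in range(1, T + 1))
    S00 = sum(np.outer(xall[t], xall[t]) + Pall[t] for t in range(T))
    Mnew = S10 @ np.linalg.inv(S00)                          # (4a)
    Qnew = (S11 - Mnew @ S10.T) / T                          # (4b)
    resid = y - xs @ H.T                                     # smoothed residuals
    Rnew = sum(np.outer(resid[t], resid[t]) + H @ Ps[t] @ H.T
               for t in range(T)) / T                        # (4c)
    mu0new = x0s                                             # (4d)
    return Mnew, Qnew, Rnew, mu0new
```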

As mentioned previously, for our applications, the spatially indexed data vector, y_t, is usually high

    dimensional. As a result, our parameters are often of high dimension as well. Hence, some form of

    dimension reduction is called for. One approach is to parameterize by exploiting the special structure of

the process. We propose several approaches for specifying realistic submodels for R, Q and M, thereby

    substantially easing the burden of estimation.

    Our methods rely heavily on GEM algorithms, since we shall see later that in many cases parameters

are not separable, which means the joint best update, such as (4), is not available. It is also

    the case that sometimes the analytical closed form best update formula cannot be derived for some of

the parameters. As a result, we often must settle for a better update, which need only ensure that the

likelihood moves monotonically. The price to be paid for this generality is that it takes more iterations

    to converge than what we would experience with the traditional types of EM estimation for state-space

    models. Two of the most useful GEM algorithms are described in the following section.

Although we do not explicitly describe algorithms for obtaining standard error estimates for Θ, they

    can be computed in various ways. For example, it is sometimes possible to evaluate the Hessian matrix

    after convergence (Shumway and Stoffer, 2000). Alternatively, one may obtain estimates of the standard

    error by perturbing the likelihood function (2) and using numerical differentiation (e.g., Shumway and

    Stoffer, 2000; Tanner, 1996).


    2.4 Two General EM (GEM) Algorithms

    2.4.1 Expectation-conditional maximization (ECM) algorithm

An ECM algorithm consists of an expectation (E) step and conditional maximization (CM) steps (McLachlan and Krishnan, 1997). Sometimes the M-step update is difficult to obtain, so we replace the M-step with several simple CM-steps. As an example, suppose the parameter of interest consists of two parts, i.e., Θ = (θ_1, θ_2). An ECM algorithm updates the two sub-parameters sequentially or conditionally. That is, given the current value Θ^(i) = (θ_1^(i), θ_2^(i)), we obtain the update via two CM-steps subject to the conditionally maximizing requirement at each CM-step:

CM-step 1: update θ_1^(i) with θ_1^(i+1) such that (3) evaluated at (θ_1^(i+1), θ_2^(i)) is no larger than at (θ_1^(i), θ_2^(i)),

CM-step 2: update θ_2^(i) with θ_2^(i+1) such that (3) evaluated at (θ_1^(i+1), θ_2^(i+1)) is no larger than at (θ_1^(i+1), θ_2^(i)).

Note that (3) evaluated at the final update Θ^(i+1) = (θ_1^(i+1), θ_2^(i+1)) is then no larger than at Θ^(i), so the likelihood value increases after the final update. Clearly ECM qualifies as a GEM algorithm. If there is need to further divide the parameters into additional parts, the update simply takes more CM-steps.

    2.4.2 GEM Based on One Newton-Raphson Step

Next, consider Θ = (θ_1, θ_2), where θ_2 is a scalar parameter. This is for notational ease and illustration, as the algorithm described below also works for vector parameters. If the first two derivatives of (3) with respect to θ_2 exist in closed form, we can use a procedure called GEM based on one Newton-Raphson step to update θ_2 (McLachlan and Krishnan, 1997). The update has the form

θ_2^(i+1) = θ_2^(i) − α g′(θ_2^(i))/g″(θ_2^(i)),   0 < α ≤ 1,

where g(θ_2) denotes (3) as a function of θ_2 with the other parameters held at their current values, and g′ and g″ are its first two derivatives.

For α sufficiently small, this will guarantee that (3) does not increase, so this procedure is a GEM algorithm. In practice choosing α = 1 will suffice when near the minimum (Lange, 1999). Since this step satisfies the conditionally maximizing requirement, it works well with ECM.
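One way to implement the safeguard on α is step-halving: shrink the step until the objective does not increase. This is a minimal sketch under that assumption, with the criterion and its first two derivatives passed in as functions:

```python
def newton_gem_step(g, dg, d2g, theta, alpha=1.0):
    """One Newton-Raphson GEM update for a scalar parameter.
    g, dg, d2g: the expected complete-data criterion (to be minimized) and
    its first two derivatives, all other parameters held at current values.
    The step is halved until the criterion does not increase."""
    step = dg(theta) / d2g(theta)
    new = theta - alpha * step
    while g(new) > g(theta) and alpha > 1e-8:
        alpha /= 2.0
        new = theta - alpha * step
    return new
```

For a quadratic criterion the full Newton step (α = 1) lands exactly on the minimizer, so no halving occurs.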


    2.5 Convergence Criteria

The EM algorithm is said to converge when one of the two following conditions is met (Tanner, 1996):

max_j |θ_j^(i+1) − θ_j^(i)| < δ   or   max_j |θ_j^(i+1) − θ_j^(i)| / |θ_j^(i)| < δ,

for some small positive δ, where θ_j^(i) and θ_j^(i+1) are (scalar) elements of Θ^(i) and Θ^(i+1). Since the EM algorithm converges to a stationary point, which can be a saddle point, local minimum, or global minimum (McLachlan and Krishnan, 1997), it is advisable to check the result with several different starting values.

    2.6 Starting Values

To achieve fast convergence, one should choose reasonable starting values. One simple method is to use moment-based estimates. Suppose the data vectors y_t are of the same size n for all t. It is straightforward to calculate the sample estimates of the first two moments:

μ̂ = T^{-1} Σ_{t=1}^T y_t,   (6a)

Ĉ(0) = T^{-1} Σ_{t=1}^T (y_t − μ̂)(y_t − μ̂)′,   (6b)

Ĉ(1) = (T − 1)^{-1} Σ_{t=2}^T (y_t − μ̂)(y_{t-1} − μ̂)′,   (6c)

where Ĉ(1) denotes the lag-one covariance estimate.

As a crude guess, assume y_t follows a VAR(1) model. We then use the obtained estimates as starting values: M^(0) = Ĉ(1)Ĉ(0)^{-1} and Q^(0) = Ĉ(0) − M^(0)Ĉ(1)′. We typically specify the measurement noise covariance matrix by R^(0), where R^(0) is obtained from an assessment of the measuring instrument or from the estimate of the nugget effect from a spatial variogram (e.g., Cressie, 1993).

Note that in cases with n ≥ T, the estimate Ĉ(0) is not positive definite and one cannot get estimates of M^(0) and Q^(0) as described above. Alternatively, one can fit individual univariate autoregressive models of order 1 for each spatial location and let M^(0) and Q^(0) be diagonal matrices with estimates of the autoregressive parameters and conditional variances on the diagonal, respectively.
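A sketch of these moment-based starting values, assuming equally sized data vectors stacked in a T × n matrix (the function name and return layout are illustrative):

```python
import numpy as np

def starting_values(y):
    """Moment-based starting values via (6a)-(6c) and a crude VAR(1) fit.
    y: T x n data matrix with the same n locations at every time t."""
    T, n = y.shape
    mu = y.mean(axis=0)                            # (6a) sample mean
    yc = y - mu
    C0 = yc.T @ yc / T                             # (6b) lag-0 covariance
    C1 = yc[1:].T @ yc[:-1] / (T - 1)              # (6c) lag-1 covariance
    M0 = C1 @ np.linalg.inv(C0)                    # VAR(1) propagator guess
    Q0 = C0 - M0 @ C1.T                            # innovation covariance guess
    return mu, C0, C1, M0, Q0
```

When n ≥ T the inverse of C0 fails, which is exactly the degenerate case discussed above; the univariate AR(1) fallback would then be used instead.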


3 Algorithms for Parameterizations of the R Matrix

    3.1 White Noise

    Without site-specific information about the measurement error process, it is often realistic to assume that

    measurement error is independent and identically distributed white noise for all data locations, especially

    if the domain of interest is relatively homogeneous. For example, researchers have modeled monthly

    temperature in the U.S. corn belt with an iid measurement error for all sites (Wikle et al., 1998). In this

simple case, we reduce the error covariance matrix to a product of a scalar and the identity matrix:

R = σ_ε² I.   (7)

One can show that a closed-form M-step update formula exists for σ_ε².

Proposition 3.1. The M-step update of σ_ε² for model (7) is

σ_ε²^(i+1) = (nT)^{-1} Σ_{t=1}^T [(y_t − H_t x_t^T)′(y_t − H_t x_t^T) + tr(H_t P_t^T H_t′)].   (8)

Proof. Rewrite Equation (3) as a function of σ_ε²:

nT ln σ_ε² + σ_ε^{-2} Σ_{t=1}^T tr[(y_t − H_t x_t^T)(y_t − H_t x_t^T)′ + H_t P_t^T H_t′] + const.

Differentiating with respect to σ_ε², setting the result to zero and solving for σ_ε² gives the result. A second-derivative test shows that this is indeed the minimum.
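Assuming the notation above (time-invariant H and smoothed moments from the Kalman smoother), the closed-form update might be coded as follows; the function name is an illustrative choice:

```python
import numpy as np

def sigma2_update(y, H, xs, Ps):
    """Closed-form M-step update for R = sigma^2 I with time-invariant H.
    y: T x n data; xs, Ps: smoothed state means and covariances."""
    T, n = y.shape
    total = 0.0
    for t in range(T):
        e = y[t] - H @ xs[t]                          # smoothed residual
        total += e @ e + np.trace(H @ Ps[t] @ H.T)    # squared error + trace term
    return total / (n * T)
```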

    3.2 Truncated Basis Function Representation

    In some cases, the measurement error does depend on spatial location, so assuming iid error is no longer

    appropriate. However, if we have some knowledge about the measurement error, say from historical data

    or from a reformulation of the measurement equation, then we can incorporate that information into the

model. First, we assume the measurement error covariance is not time dependent, R_t ≡ R. Now, consider

a basis function expansion for the matrix R (Berliner et al., 2000; Harville, 1997):


R = Σ_{k=1}^P a_k B_k + c I,   (9)

where the B_k are symmetric and idempotent matrices such that B_k B_l = 0 for k ≠ l. We assume that P, the a_k and the B_k are all known and the positive scalar c is the only unknown.

This model makes use of an incomplete matrix decomposition such as an eigenvalue decomposition. In this case we know the dominant matrix bases for R and use the c I term to represent smaller-scale variability and randomness, and to ensure that R is positive definite. Estimation of c with the EM algorithm in this case involves a numerical step, as suggested by the following proposition.

Proposition 3.2. The M-step update of c for model (9) is the (positive) root of

(10)

obtained by differentiating (3) with respect to c, where the required smoothed moment quantities are given by (5).

Proof. First note that R in (9) is positive definite for any c > 0, and that its determinant and inverse have closed forms in c (Harville, 1997). From (3) we have

(11)


Taking the first derivative with respect to c gives

(12)

(13)

The last line follows from the idempotence of the B_k and from B_k B_l = 0 for k ≠ l. Collecting terms, we get (10).

In general we cannot find a closed-form solution of (10), so we must resort to numerical methods. Fortunately, most modern software packages have routines for finding the roots of a function of one variable if the user supplies an initial search bracket. However, if P = 1 or P = 2 in (9), then it can be shown that (10) is a polynomial in c of degree 3 or 5. Then, we can use standard routines for solving polynomials. In that case, estimation is fully specified. In the event of multiple positive roots, we simply evaluate (3) at each root and select the one that minimizes it. Alternatively, we could employ one Newton-Raphson step to update c.
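For the polynomial case, the root-selection step can be sketched with a standard polynomial solver; the objective passed to `best_root` stands in for (3) evaluated as a function of c (both function names are illustrative):

```python
import numpy as np

def positive_roots(coeffs):
    """Real, positive roots of a polynomial given by highest-degree-first
    coefficients, as in the degree-3 or degree-5 score equation case."""
    r = np.roots(coeffs)
    real = r[np.abs(r.imag) < 1e-10].real
    return np.sort(real[real > 0])

def best_root(coeffs, objective):
    """Among the positive roots, return the one minimizing the objective."""
    return min(positive_roots(coeffs), key=objective)
```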

4 Algorithms for Parameterization of the Q Matrix

    First let us derive the update formula for the unparameterized case since the general case yields a different

    result than (4b).

    4.1 General Case

The EM update of Q for the general case (but with parameterized M) is given by the following proposition.

Proposition 4.1. The update formula for the general Q is

Q^(i+1) = T^{-1}(S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′).   (14)

Proof. First note that for any positive definite Q and symmetric S we have ∂ ln|Q|/∂Q = Q^{-1} and ∂ tr(Q^{-1} S)/∂Q = −Q^{-1} S Q^{-1}.

Let S = S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′ and rewrite (3) as a function of Q:

T ln|Q| + tr{Q^{-1} S} + const.

Differentiating with respect to Q gives

T Q^{-1} − Q^{-1} S Q^{-1}.

Setting the above to zero and solving for Q yields the result. The second-derivative test confirms this is a minimum.

Remark 4.1. When M is also not parameterized, so that M^(i+1) is given by (4a), replacing M with S_{10} S_{00}^{-1} in Proposition 4.1 yields (4b).

    4.2 Diagonal Case

Assume that Q is a diagonal matrix with diagonal elements in the vector q = (q_1, ..., q_m)′:

Q = diag(q).   (15)

Such a model is especially appropriate when the state variable is in the spectral domain, since the state process elements are often approximately decorrelated in that setting (e.g., Wikle, 2002). The following proposition gives the update equation for q. It is simply the diagonal vector of the update for the general case as given in (14).

Proposition 4.2. The update formula of q for model (15) is

q^(i+1) = T^{-1} diag{S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′},   (16)

where diag{A} denotes the vector of diagonal elements of A.

Proof. From (3) we have, as a function of q,

T Σ_{j=1}^m ln q_j + Σ_{j=1}^m [S]_{jj}/q_j + const,

where S = S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′. Taking partial derivatives with respect to q_j gives

T/q_j − [S]_{jj}/q_j².

Setting the above to zero we obtain

q_j^(i+1) = [S]_{jj}/T,

which is true for j = 1, ..., m. The second-derivative test confirms that this is a minimum.

    4.3 Conditional Autoregressive (CAR) Model

Consider a CAR model for η_t in (1b) by assuming the following (e.g., He and Sun, 2000):

η_t(s_i) | {η_t(s_j), j ≠ i} ~ N(ρ Σ_{j≠i} c_{ij} η_t(s_j), σ_η²),   (17)

where c_{ij} = 1 if locations s_i and s_j are neighbors, and c_{ij} = 0 otherwise. Define the adjacency matrix C = (c_{ij}). It can be shown that the covariance matrix of the joint distribution is σ_η²(I − ρC)^{-1} (e.g., He and Sun, 2000). In other words, the model for η_t is

η_t ~ N(0, σ_η²(I − ρC)^{-1}).   (18)

Let λ_1 ≤ ⋯ ≤ λ_m be the eigenvalues of C. Sun et al. (2000) showed that λ_1 < 0 < λ_m, and that in order for the covariance matrix to be positive definite, λ_1^{-1} < ρ < λ_m^{-1}.

Proposition 4.3. The M-step update of ρ and σ_η² for model (18) is

ρ^(i+1) = the root of g(ρ) = 0,   (19a)

σ_η²^(i+1) = (mT)^{-1} tr{(I − ρ^(i+1) C) S},   (19b)

where

g(ρ) = Σ_{j=1}^m λ_j/(1 − ρλ_j) − m tr(CS)/[tr(S) − ρ tr(CS)]   (20)

and

S = S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′.   (21)

Proof. Starting from (3) and using the fact that ln|σ_η²(I − ρC)^{-1}| = m ln σ_η² − ln|I − ρC|, the relevant part of (3) is

Tm ln σ_η² − T ln|I − ρC| + σ_η^{-2} tr{(I − ρC) S}.

Taking the first derivative with respect to ρ and σ_η², respectively, gives

T Σ_{j=1}^m λ_j/(1 − ρλ_j) − σ_η^{-2} tr(CS),   (22a)

Tm/σ_η² − σ_η^{-4} tr{(I − ρC) S}.   (22b)

Setting (22b) to zero yields

σ_η² = (mT)^{-1} tr{(I − ρC) S}.   (23)

Substituting (23) into (22a) yields g(ρ) = 0, or (20), the root of which is ρ^(i+1). Substituting ρ^(i+1) in (23) we obtain (19b). The second-derivative test confirms that this is a minimum.

Note that to find ρ^(i+1) one needs to perform a numerical search on a line. Fortunately, the initial search bracket is always known, i.e., (λ_1^{-1}, λ_m^{-1}), since ρ has to be constrained as mentioned earlier. Therefore these update formulas are fully specified and their implementation is automatic.

    4.4 Exponential Covariogram Model

    It is common in spatial statistics to impose a parametric model on the spatial random field. We consider

    the commonly used exponential covariogram model:

Q = σ_η² Γ(φ),   (24)

where the correlation matrix Γ(φ) is governed by the exponential correlation function ρ(d; φ) = exp(−d/φ), d is the distance between two locations, and φ is the spatial dependence parameter (Cressie, 1993). It is important to recognize that there exists an analytical form for the first and second derivatives of this correlation function with respect to φ. This enables us to obtain the closed-form update formulas as given by the following proposition.

Proposition 4.4. The update formulas of σ_η² and φ for the model (24) are

σ_η²^(i+1) = (mT)^{-1} tr{Γ(φ^(i))^{-1} S},   (25a)

φ^(i+1) = φ^(i) − α g′(φ^(i))/g″(φ^(i)),   0 < α ≤ 1,   (25b)

where

g(φ) = T ln|Γ(φ)| + tr{Γ(φ)^{-1} S}/σ_η²^(i+1)

and S = S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′.

Proof. Starting from (3), the relevant part is

Tm ln σ_η² + T ln|Γ(φ)| + σ_η^{-2} tr{Γ(φ)^{-1} S}.

Taking the first derivative with respect to σ_η² yields

Tm/σ_η² − σ_η^{-4} tr{Γ(φ)^{-1} S}.

Setting the above to zero and using φ = φ^(i) yields the update formula (25a). Notice this is an ECM step. To update φ, we focus on the function g(φ) above. Then, we use the GEM algorithm based on one Newton-Raphson step to obtain (25b).

    Remark 4.2. The algorithm given by Proposition 4.4 is appropriate for any covariogram model which

    has analytical form of the first and second derivative of the correlation function with respect to the spatial

dependence parameter φ.
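The correlation matrix and the derivatives needed for the Newton-Raphson step can be computed elementwise. This sketch assumes the parameterization ρ(d; φ) = exp(−d/φ) on a matrix of pairwise distances D (the function name is an illustrative choice):

```python
import numpy as np

def exp_corr(D, phi):
    """Exponential correlation matrix and its first two derivatives in phi,
    assuming rho(d; phi) = exp(-d / phi) applied elementwise to distances D."""
    R = np.exp(-D / phi)
    dR = (D / phi**2) * R                         # elementwise d/dphi
    d2R = (D**2 / phi**4 - 2 * D / phi**3) * R    # elementwise d2/dphi2
    return R, dR, d2R
```

Any covariogram with closed-form first and second derivatives in its dependence parameter could be substituted here, as Remark 4.2 notes.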

5 Parameterization of the Transition Matrix

The transition (or propagator) matrix, M, is the most critical part of the spatio-temporal dynamic model (1), since it governs the evolution of the process. Each row of M contains essentially the location-wise weights applied to the process at the previous time for the spatial location corresponding to that row. It can be shown that the M-step update formula for the unparameterized M is Equation (4a), regardless of the parameterization of Q and R. Here we propose a simple but powerful parameterization. Assume that an entry of M is either zero or one of θ_1, ..., θ_p. The positions of the zeros and the θ_k's in the matrix are fixed. We can write

M = Σ_{k=1}^p θ_k B_k,   (26)

where θ = (θ_1, ..., θ_p)′ and each B_k is a known (0-1) incidence matrix.

In the next proposition we derive the closed-form update formula for θ.

Proposition 5.1. The iteration update of θ for model (26) is

θ^(i+1) = A^{-1} b,

where

[A]_{kl} = tr{Q^{-1} B_l S_{00} B_k′},  k, l = 1, ..., p,

and

[b]_k = tr{Q^{-1} S_{10} B_k′}.

Proof. Starting from Equation (3), the part involving M is

tr{Q^{-1}[S_{11} − S_{10} M′ − M S_{10}′ + M S_{00} M′]}.

Defining M = Σ_{k=1}^p θ_k B_k and differentiating with respect to θ_k gives

−2 tr{Q^{-1} S_{10} B_k′} + 2 Σ_{l=1}^p θ_l tr{Q^{-1} B_l S_{00} B_k′}.

Setting the above to zero and using the fact that tr{Q^{-1} B_k S_{00} B_l′} = tr{Q^{-1} B_l S_{00} B_k′}, we get

Σ_{l=1}^p θ_l tr{Q^{-1} B_l S_{00} B_k′} = tr{Q^{-1} S_{10} B_k′},  k = 1, ..., p.

Therefore, we have p linear equations. Writing them in matrix form and solving for θ gives the result.

The formula in Proposition 5.1 implies that to update θ we need to know Q. This is impossible with the standard EM algorithm. Instead we employ the ECM algorithm to update the parameters sequentially (see Section 2.4). To illustrate, suppose Q is defined in the general case. Then, we can update θ and Q together as suggested by the following remark.

Remark 5.1. The ECM update for model (26) and general Q is:

1. first update θ with Proposition 5.1 by letting Q = Q^(i);

2. then update Q with Proposition 4.1 by letting M = Σ_{k=1}^p θ_k^(i+1) B_k.


    6 Illustrative Examples

    6.1 Advection-Diffusion PDE: A Simulation Study

    6.1.1 Background

    In an ecological study, researchers used a diffusion PDE to predict the spread of the house finch in the

    eastern United States with a hierarchical Bayesian model (Wikle, 2003). In cases where one does not have

    strong a priori belief that the diffusion parameter varies with space, such a model lends itself well to the

methodology we just developed. For illustration, consider a 1-dimensional advection-diffusion equation for the spatio-temporal state process x(s; t), at spatial location s and time t:

∂x(s; t)/∂t = −a ∂x(s; t)/∂s + b ∂²x(s; t)/∂s²,   (27)

where a is the advection coefficient and b is the diffusion coefficient. Following basic finite-difference approaches to the numerical solution of partial differential equations (e.g., Haberman, 1987), we can apply first-order forward differences in time (∂x/∂t ≈ [x(s; t + δ_t) − x(s; t)]/δ_t) and centered differences in space (∂x/∂s ≈ [x(s + δ_s; t) − x(s − δ_s; t)]/(2δ_s) and ∂²x/∂s² ≈ [x(s + δ_s; t) − 2x(s; t) + x(s − δ_s; t)]/δ_s²), where these centered differences are valid for any time t, to get

x(s_i; t + δ_t) = x(s_i; t) + δ_t[−a(x(s_{i+1}; t) − x(s_{i−1}; t))/(2δ_s) + b(x(s_{i+1}; t) − 2x(s_i; t) + x(s_{i−1}; t))/δ_s²] + η(s_i; t),   (28)

where δ_t and δ_s are the temporal and spatial increments, respectively, and we thus discretize the spatial domain so that s_1, ..., s_m have equal spacing, δ_s. Note η(s_i; t) was added to (28) to make up for the loss from discretization and to introduce extra stochastic forcing (Wikle, 2003). Furthermore, if we let θ_1 = δ_t(b/δ_s² + a/(2δ_s)), θ_2 = 1 − 2bδ_t/δ_s², and θ_3 = δ_t(b/δ_s² − a/(2δ_s)) (since spatial locations are equally spaced, this is convenient notationally), then (28) becomes

x(s_i; t + δ_t) = θ_1 x(s_{i−1}; t) + θ_2 x(s_i; t) + θ_3 x(s_{i+1}; t) + η(s_i; t).   (29)

Writing (29) in matrix form, we have

x_{t+1} = M x_t + M_B x_t^B + η_t,   (30)


where x_t = (x_t(s_1), ..., x_t(s_m))′ is the interior process and x_t^B = (x_t(s_0), x_t(s_{m+1}))′ is the boundary process, respectively. Furthermore, the m × 2 matrix

M_B = [θ_1 0; 0 0; ⋮ ⋮; 0 0; 0 θ_3]   (31)

is the propagator matrix for the boundary process, and

M = [θ_2 θ_3 0 ⋯ 0; θ_1 θ_2 θ_3 ⋯ 0; 0 θ_1 θ_2 ⋱ ⋮; ⋮ ⋱ ⋱ θ_3; 0 ⋯ 0 θ_1 θ_2]   (32)

is a tri-diagonal propagator matrix for the interior process. This matrix is in the form of (26) due to the structural zeros.
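Under the finite-difference weights defined above (and the sign convention assumed here for the advection term), the tridiagonal interior propagator might be constructed as follows; the function name is an illustrative choice:

```python
import numpy as np

def advection_diffusion_propagator(m, a, b, dt, ds):
    """Tridiagonal interior propagator for the discretized 1-d
    advection-diffusion equation: a = advection coefficient, b = diffusion
    coefficient, dt/ds = temporal and spatial increments."""
    theta1 = dt * (b / ds**2 + a / (2 * ds))   # weight on x_t(s_{i-1})
    theta2 = 1.0 - 2.0 * b * dt / ds**2        # weight on x_t(s_i)
    theta3 = dt * (b / ds**2 - a / (2 * ds))   # weight on x_t(s_{i+1})
    M = (np.diag(np.full(m, theta2))
         + np.diag(np.full(m - 1, theta1), -1)   # subdiagonal
         + np.diag(np.full(m - 1, theta3), +1))  # superdiagonal
    return M
```

All entries off the three central diagonals are structural zeros, so M is exactly of the form (26) with p = 3 incidence matrices.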

For simplicity, we let the boundaries in (30) be zeros (i.e., x_t^B = 0 for all t). Thus, (30) becomes the usual state equation. In addition, we specify a measurement equation similar to (1a) with noise covariance matrix (7). We have thus specified a spatio-temporal dynamic model for the advection-diffusion PDE process:

y_t = H_t x_t + ε_t,   (33a)

x_{t+1} = M x_t + η_t,   (33b)

where cov(ε_t) = σ_ε² I. We assume that the process error, η_t, has an exponential covariance matrix as described by (24), i.e., cov(η_t) = σ_η² Γ(φ). We assume both noise processes are Gaussian with zero mean.

Of course, in most real-world applications in which an advection-diffusion process would be appropriate (e.g., processes in the atmosphere, ocean, or ecological processes) one would not know the parameters a and b (and thus, θ_1, θ_2 and θ_3). Thus, we seek to estimate them. We demonstrate such estimation by simulating the process and comparing estimates to the known parameters.

    6.1.2 Simulating the Data Set

    To illustrate our estimation methodology, we simulate a data set according to the above specified model.

    Table 1 summarizes the actual value of the parameters and other simulation set-up values. As a way of


    Parameter Set-up

    SNR Missing n T

    .3 .6 .1 1 1 5 5 10% 18 20 100

    Table 1: Simulation set-up for the diffusion data used in Table 2 and Figure 1. (SNR = signal-to-noise ratio; Missing = percentage of data withheld for validation; also listed are the number of observations at each time, the spatial dimension of the state process, the spatial dependence parameter, and the parameters θ1, θ2, θ3 in the propagator matrix.)

    gauging the estimation performance, we withhold a certain amount of data for validation. This missing-data set-up is achieved easily with an incidence matrix in the measurement equation.
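    A minimal sketch of this set-up (our own illustration; the dimensions follow Table 1, but the propagator values and unit noise scales are placeholders) builds, at each time, an incidence matrix that simply deletes the rows of the withheld locations:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, T, p_miss = 20, 100, 0.10           # dimensions as in Table 1
    m = int(n * (1 - p_miss))              # 18 observed locations per time
    # assumed tri-diagonal propagator (placeholder values)
    M = 0.3 * np.eye(n) + 0.6 * np.eye(n, k=-1) + 0.1 * np.eye(n, k=1)

    u = np.zeros(n)
    z, K_list = [], []
    for t in range(T):
        u = M @ u + rng.normal(size=n)                  # state evolution
        obs = np.sort(rng.choice(n, size=m, replace=False))
        K_t = np.eye(n)[obs, :]                         # incidence matrix: keeps observed rows
        z.append(K_t @ u + rng.normal(size=m))          # noisy, partially observed data
        K_list.append(K_t)
    ```

    Each K_t is an m × n matrix of zeros and ones with a single one per row, so K_t u_t extracts exactly the observed components.
    
    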

    Figure 1(a) shows a map of the simulated spatio-temporal diffusion data. There is a noticeable pattern of propagation of spatial features to the left through time. This is the result of the special structure of the propagator matrix and the chosen parameter values. Another way of looking at the data is to examine time series plots for individual locations (see Figure 2). We cannot easily detect spatio-temporal propagation from such time series plots, but they give us information about the temporal structure and stability of the signal.

    6.1.3 ECM Estimation

    Estimation is carried out with an ECM algorithm. For this model we need to update the following parameters. Given the current iterate, five CM-steps are used to obtain the update:

    CM-step 1: update the corresponding parameter by Proposition 5.1;

    CM-step 2: update the corresponding parameter by Equation (25a) in Proposition 4.4;

    CM-step 3: update the corresponding parameter by Equation (25b) in Proposition 4.4;

    CM-step 4: update the corresponding parameter by Proposition 3.1;

    CM-step 5: update the corresponding parameter by Equation (4d);

    where each CM-step holds the remaining parameters at their current values.

    These parameters can be updated in a different order if desired. To update the parameter without a closed-form solution, we use the GEM Based on One Newton-Raphson Step algorithm discussed in Section 2.4 (McLachlan and Krishnan, 1997).
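    To convey the flavor of the E-step and the closed-form M-step updates without the notation of the propositions above, here is a self-contained sketch for a toy scalar state-space model x_t = φ x_{t-1} + w_t, y_t = x_t + v_t, in the style of Shumway and Stoffer (1982). This is our own illustration, not the authors' estimation code; the function name, initialization, and fixed initial-state moments are our choices.

    ```python
    import numpy as np

    def em_local_ar1(y, phi, q, r, m0=0.0, P0=1.0, n_iter=50):
        """EM for x_t = phi*x_{t-1} + w_t, y_t = x_t + v_t, w_t ~ N(0, q),
        v_t ~ N(0, r); returns estimates and the log-likelihood history."""
        T = len(y)
        logliks = []
        for _ in range(n_iter):
            # E-step, part 1: Kalman filter
            xp, Pp = np.zeros(T), np.zeros(T)   # one-step predictions
            xf, Pf = np.zeros(T), np.zeros(T)   # filtered moments
            K = np.zeros(T)
            ll, m, P = 0.0, m0, P0
            for t in range(T):
                xp[t], Pp[t] = phi * m, phi**2 * P + q
                S = Pp[t] + r
                K[t] = Pp[t] / S
                e = y[t] - xp[t]
                xf[t], Pf[t] = xp[t] + K[t] * e, (1 - K[t]) * Pp[t]
                ll -= 0.5 * (np.log(2 * np.pi * S) + e**2 / S)
                m, P = xf[t], Pf[t]
            logliks.append(ll)
            # E-step, part 2: Rauch-Tung-Striebel smoother
            xs, Ps = xf.copy(), Pf.copy()
            J = np.zeros(T - 1)
            for t in range(T - 2, -1, -1):
                J[t] = Pf[t] * phi / Pp[t + 1]
                xs[t] = xf[t] + J[t] * (xs[t + 1] - xp[t + 1])
                Ps[t] = Pf[t] + J[t]**2 * (Ps[t + 1] - Pp[t + 1])
            J0 = P0 * phi / Pp[0]
            x0s = m0 + J0 * (xs[0] - xp[0])
            P0s = P0 + J0**2 * (Ps[0] - Pp[0])
            # lag-one smoothed covariances: Pc[t] = Cov(x_t, x_{t-1} | all data)
            Pc = np.zeros(T)
            Pc[-1] = (1 - K[-1]) * phi * Pf[-2]
            for t in range(T - 2, 0, -1):
                Pc[t] = Pf[t] * J[t - 1] + J[t] * (Pc[t + 1] - phi * Pf[t]) * J[t - 1]
            Pc[0] = Pf[0] * J0 + J[0] * (Pc[1] - phi * Pf[0]) * J0
            # M-step: closed-form updates (the analogue of the CM-steps)
            S11 = np.sum(xs**2 + Ps)
            S00 = x0s**2 + P0s + np.sum(xs[:-1]**2 + Ps[:-1])
            S10 = xs[0] * x0s + Pc[0] + np.sum(xs[1:] * xs[:-1] + Pc[1:])
            phi = S10 / S00
            q = (S11 - phi * S10) / T
            r = np.mean((y - xs)**2 + Ps)
        return phi, q, r, logliks
    ```

    By the EM property, the recorded log-likelihoods are non-decreasing across iterations, which provides a convenient sanity check analogous to the convergence criterion in Table 2.
    
    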



    Figure 1: Simulated diffusion data and estimation: (a) simulated data (see Table 1 for true parameter values), (b) smoothed values, (c) true process, (d) prediction error (truth minus estimate), (e) standard deviation of the prediction error.


    1 13766.03 0.00 0.30 0.30 0.30 0.2000 1.00 1.00 1.00 6.66 0.00000

    2 2825.46 10940.57 0.33 0.53 0.20 1.8420 5.51 1.14 1.00 22.73 28.69474

    3 2643.62 181.83 0.33 0.59 0.14 2.6336 6.57 1.33 1.00 22.43 9.18374

    4 2612.22 31.41 0.32 0.62 0.11 2.8971 6.81 1.54 1.00 22.21 4.57350

    5 2597.93 14.28 0.31 0.63 0.09 2.9249 6.74 1.79 1.00 22.12 2.63997

    61 2492.28 0.03 0.28 0.61 0.10 1.1671 4.78 9.13 1.00 26.67 0.03425

    121 2491.71 0.00 0.28 0.61 0.10 0.9806 5.00 9.46 1.00 27.07 0.00839

    181 2491.64 0.00 0.28 0.61 0.10 0.9196 5.08 9.56 1.00 27.19 0.00328

    241 2491.62 0.00 0.28 0.60 0.10 0.8937 5.11 9.61 1.00 27.24 0.00148

    265 2491.62 0.00 0.28 0.60 0.10 0.8878 5.12 9.62 1.00 27.25 0.00110

    266 2491.62 0.00 0.28 0.60 0.10 0.8876 5.12 9.62 1.00 27.25 0.00108

    267 2491.62 0.00 0.28 0.60 0.10 0.8874 5.12 9.62 1.00 27.25 0.00107

    268 2491.62 0.00 0.28 0.60 0.10 0.8871 5.12 9.62 1.00 27.25 0.00106

    269 2491.62 0.00 0.28 0.60 0.10 0.8869 5.12 9.62 1.00 27.25 0.00105

    270 2491.62 0.00 0.28 0.60 0.10 0.8867 5.12 9.62 1.00 27.25 0.00103

    271 2491.62 0.00 0.28 0.60 0.10 0.8865 5.12 9.62 1.00 27.25 0.00102

    272 2491.62 0.00 0.28 0.60 0.10 0.8863 5.12 9.62 1.00 27.25 0.00101

    Truth 0.30 0.60 0.10 1.0000 5.00 10.00

    Table 2: ECM iterates for the simulated diffusion data, including the parameter iterates, the Newton-Raphson step size used in the GEM update, and the convergence criterion (final column); iteration stops when the criterion falls below a small threshold. See Figure 1 for a plot of the data.


    estimates. As shown in Table 3, the estimates are generally centered around the true values with small deviations. Several findings are worth noting. First, the estimates of the θs are not sensitive to the amount of missing data, yet the variance parameter estimates are more uncertain if the amount of missing data is large. Second, and not surprisingly, large SNR values yield more accurate estimates for most of the parameters, except for the measurement noise variance. The findings from this simulation study do not necessarily generalize. However, they do give us a picture of how this estimation procedure might perform in practice for a process that is very realistic in many environmental applications.

    6.2 Palmer Drought Severity Index (PDSI)

    6.2.1 Background

    Drought poses a serious problem for every society. One measure of drought is the Palmer Drought Severity Index (PDSI), which is typically a monthly-valued index (Heim, 2002). PDSI values typically range from -6 to +6, with negative values denoting dry spells and positive values denoting wet spells.

    We obtain the monthly PDSI for 107 locations in the central U.S. from January 1900 to December

    1997. Figure 3 displays the data for two typical months. We can see that there is significant spatial

    correlation in the data. Indeed, dry and wet spells occur with substantial spatial coherence across the

    region. Therefore there is no need to model 107 stations individually. A more concise representation

    should suffice for this data set. Thus, we consider a dimension reduced spatio-temporal approach to model

    PDSI.

    6.2.2 Dimension Reduction

    First, we introduce the idea of spatio-temporal dimension reduction. The key is to recast the state vector in a much lower-dimensional space by using a spectral basis (Wikle and Cressie, 1999). Let

    u_t = Φ a_t + ν_t,   (34)

    where Φ is an n × K matrix of spectral basis functions, K << n, and ν_t contains the residual process induced by the truncation. We treat a_t as our new state vector, which follows a first-order Markov process if u_t follows such a process. Typically, ν_t is a non-dynamic (uncorrelated in time) spatial process. Now, rewrite the spatio-temporal model (1) in light of the dimension reduction,

    (35a)


    Figure 3: PDSI data (left column) and predictions (right column) for July 1988 and July 1993. Dark (open) circles correspond to negative (positive) PDSI values. The size of each circle is proportional to the magnitude of the PDSI value.


    (35b)

    where the measurement error term contains measurement error as well as truncation error, and, as is typical, we assume that the measurement and model error processes are mean-zero Gaussian and temporally independent. This model is in the same form as (1), with slightly different notation. We proceed to specify the covariance of the measurement error after taking into account the truncation error:

    (36)

    This is a simplified version of a formulation by Berliner et al. (2000). By using the next r basis functions beyond the truncation point, this formulation amounts to a second dimension reduction. If the basis functions are orthonormal, then it is evident that (36) is an example of model (9). We assume the covariance matrix of the model error process is diagonal. This is reasonable since the spectral decomposition typically leads to decorrelation in spectral space.
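    A plausible sketch of such a truncation-adjusted measurement covariance, in the spirit of the Wikle and Cressie (1999) construction (the function name, the symbols Psi and lam, and the exact algebraic form are our assumptions, not a transcription of (36)), adds to the measurement-error variance the covariance contributed by the next r basis functions:

    ```python
    import numpy as np

    def truncation_adjusted_R(sigma2_eps, Psi, lam):
        """Covariance of the form sigma2_eps * I + Psi diag(lam) Psi', where
        Psi (n x r) holds the next r orthonormal basis functions and lam their
        variances. Assumed form, for illustration only."""
        n = Psi.shape[0]
        # (Psi * lam) scales each column of Psi by the corresponding variance
        return sigma2_eps * np.eye(n) + (Psi * lam) @ Psi.T
    ```

    Because Psi diag(lam) Psi' is positive semi-definite, the resulting matrix is positive definite whenever sigma2_eps is positive.
    
    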

    If one is interested in predicting the process at locations for which one does not have data, then it is important to consider the truncation residual explicitly, as suggested by Wikle and Cressie (1999) and Cressie and Wikle (2002). However, if one is primarily interested in the dynamic process and/or its parameters, then it is simpler to account for the residual through its marginal covariance. This is directly analogous to traditional mixed models: if one is interested in inference on the fixed effects, then one integrates out the random effects and considers the so-called marginal formulation; if one is interested in predicting the random effects, then one considers the random effects directly (the so-called conditional specification). In this application, we are interested in forecasting the dynamic component, so it is reasonable to consider the effects of the truncation residual marginally, as indicated above.

    Although we could use any set of orthonormal basis functions (e.g., Fourier, wavelets, empirical), we choose to use EOFs (empirical orthogonal functions) for this example, since they are widely used in meteorological studies. EOFs are meteorologists' name for the familiar principal components analysis for spatio-temporal data (see Wikle, 1996, for an overview). We obtain the EOFs by performing an eigenvalue decomposition of the estimated spatial covariance matrix of the data. Figure 4 shows the percent variability accounted for by each basis function. Notice the steep decline up to the 10th EOF. The

    remaining EOFs explain very little of the variability in the data. Therefore, we fix the truncation parameter at 10. Instead of modeling a 107-dimensional state vector, we now model a 10-dimensional state vector, which is a much easier task both statistically and computationally. Finally, we take the second truncation level in (36) to be 20, since the next 20 EOFs account for about 10% of the variability and adding more EOFs does not add much spatial structure.

    Figure 4: Percentages (solid line) and cumulative percentages (dashed line) of total variability accounted for by the EOFs for the PDSI data.
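    The EOF computation described above can be sketched as follows (our own illustration; the data-matrix layout, with rows indexing months and columns indexing stations, is an assumption):

    ```python
    import numpy as np

    def eof_decomposition(Z):
        """Z is a T x n data matrix (rows = time points, columns = stations).
        Returns the EOFs (columns, ordered by decreasing variance) and the
        percent of total variability accounted for by each."""
        Zc = Z - Z.mean(axis=0)                # remove the station means
        C = np.cov(Zc, rowvar=False)           # estimated spatial covariance
        vals, vecs = np.linalg.eigh(C)         # eigen-decomposition (ascending)
        order = np.argsort(vals)[::-1]         # reorder to descending variance
        vals, vecs = vals[order], vecs[:, order]
        pct = 100.0 * vals / vals.sum()
        return vecs, pct
    ```

    The reduced basis is then the leading columns (e.g., the first 10 EOFs for the PDSI example), and the reduced state is obtained by projecting the data onto those columns.
    
    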

    6.2.3 EM Estimation

    The parameters for model (35) are updated by the M-step formulas in Equation (4a), Proposition 4.2, Equation (4d), and Proposition 3.2, respectively. The EM iteration history is shown in Table 4. The algorithm converges much faster than in the ECM example discussed in Section 6.1.

    To assess the fit of the model, we examine plots of the predicted values of the state vector mapped back to the original space. Figure 3 shows the one-month-ahead prediction along with the observed data for each of two months. As we can see, the predictions capture the major spatial patterns fairly well. Figure 6 shows time series plots for three stations for a 10-year period around the major Midwest flood year of 1993. In general, the model does a reasonable job of predicting the next month's PDSI values, even amidst the unusual flood event. The prediction standard error map (Figure 5) shows that the prediction error is roughly of the same order across the spatial domain and for different time periods.


    Figure 6: PDSI data (solid line) and prediction (dashed line) for three stations (lon 102.30, lat 48.50; lon 95.90, lat 39.60; lon 92.30, lat 31.20).

    7 Summary and Conclusion

    We have proposed several parameterizations for a spatio-temporal dynamic model. The strategy is to make

    use of valid spatial statistical models and to make simple physical assumptions about the process which

    lead to partial restrictions on the transition matrix and/or the covariance matrices. We also derive the

    relevant (General) EM update formulas for these restrictions. We demonstrate this methodology with a

    simulation study in which the true state-process follows an advection-diffusion process. In addition, we

    apply the methodology to the problem of spatio-temporal modeling of monthly Palmer Drought Severity

    Index values over the central U.S.

    It is important to point out that although this development was motivated by spatio-temporal problems,

    the parameterization/GEM approach is quite flexible and the ideas contained in this paper can be used for

    other parameterizations as well. In particular, the state-space framework is useful for many multivariate

    time series problems (e.g., see applications in Shumway, 1988). However, the fact that we prefer closed-form update formulas does put a limit on our choices of parameterizations. In addition, since there could be different parameterizations for the same data that are reasonable from a scientific perspective, it is desirable to have a consistent way of performing model selection (e.g., Bengtsson, 2000).


    It is reasonable to ask what are the advantages and disadvantages of the EM/GEM approach presented here compared to a fully Bayesian approach (e.g., Wikle, 2003; Berliner et al., 2000). In many respects the EM/GEM approach can be thought of as an empirical Bayesian approach, whereby the data are used to estimate the lower-level parameters in a hierarchical Bayesian model. In the spatio-temporal dynamical model framework, the fully Bayesian approach is most useful when one has some prior understanding (either from scientific theory or from previous empirical studies) about the process dynamics, particularly the evolution operator (propagator). For example, Wikle (2003) considered a discretized PDE-based model analogous to the advection-diffusion example in Section 6. However, in that case the ecological problem at hand suggested that the dynamics were largely controlled by spatially varying (yet unknown) diffusion coefficients that corresponded to population spread. In that context, given the relationship between heterogeneous population spread and habitat, it was appropriate for the diffusion parameters to be spatially dependent and thus to have a spatial prior distribution. However, for a process such as the example in Section 6.1, in which one does not expect the advection and diffusion coefficients to be spatially varying, it is certainly reasonable to assume no spatial dependence and thus to estimate the relatively few parameters empirically through the EM/GEM approach outlined here. In summary, when the model complexity increases, and/or when one has significant prior knowledge about the dynamics, one should use a fully Bayesian approach. When one has a relatively simple model and little prior knowledge, the EM/GEM approach is preferable. However, the EM/GEM approach is of limited utility if the process is high-dimensional; thus, one must parameterize the dynamics in this case to reduce the effective number of parameters. This is the approach we have outlined here.

    References

    Bengtsson, T. (2000). Time series discrimination, signal comparison testing, and model selection in the state-space framework. Ph.D. thesis, University of Missouri-Columbia.

    Berliner, L. M., Wikle, C. K., and Cressie, N. (2000). Long-lead prediction of Pacific SST via Bayesian dynamic modeling. Journal of Climate, 13, 3953–3968.

    Cressie, N. and Huang, H.-C. (1999). Classes of nonseparable, spatio-temporal stationary covariance functions. Journal of the American Statistical Association, 94(448), 1330–1340.


    Cressie, N. and Wikle, C. (2002). Space-time Kalman filter. In A. El-Shaarawi and W. Piegorsch, editors, Encyclopedia of Environmetrics, volume 4, pages 2045–2049, New York. Wiley.

    Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York, revised edition.

    Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data. Journal of the American Statistical Association, 97, 590–600.

    Gupta, N. and Mehra, R. (1974). Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations. IEEE Transactions on Automatic Control, 19, 774–783.

    Haberman, R. (1987). Elementary Applied Partial Differential Equations, second edition. Prentice-Hall, New Jersey.

    Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer, New York.

    He, Z. and Sun, D. (2000). Hierarchical Bayes estimation of hunting success rates with spatial correlations. Biometrics, 56, 360–367.

    Heim Jr., R. R. (2002). A review of twentieth-century drought indices used in the United States. Bulletin of the American Meteorological Society, 83, 1149–1165.

    Huerta, G., Sansó, B., and Stroud, J. (2004). A spatiotemporal model for Mexico City ozone levels. Journal of the Royal Statistical Society, Series C, 53, 231–248.

    Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(D), 35–45.

    Kyriakidis, P. C. and Journel, A. G. (1999). Geostatistical space-time models: a review. Mathematical Geology, 31(6), 651–684.

    Lange, K. (1999). Numerical Analysis for Statisticians. Springer, New York.

    Mardia, K., Goodall, C., Redfern, E., and Alonso, F. (1998). The kriged Kalman filter (with discussion). Test, 7, 217–285.

    McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley, New York.

    Shumway, R. (1988). Applied Statistical Time Series Analysis. Prentice Hall, Englewood Cliffs, NJ.


    Shumway, R. H. and Stoffer, D. S. (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4), 253–264.

    Shumway, R. H. and Stoffer, D. S. (2000). Time Series Analysis and Its Applications. Springer, New York.

    Stroud, J., Müller, P., and Sansó, B. (2001). Dynamic models for spatio-temporal data. Journal of the Royal Statistical Society, Series B, 63, 673–689.

    Sun, D., Tsutakawa, R. K., and Speckman, P. L. (2000). Bayesian inference for CAR(1) models with noninformative priors. Biometrika, 86, 341–350.

    Tanner, M. A. (1996). Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer, New York.

    Wikle, C. K. (1996). Spatio-temporal statistical models with applications to atmospheric processes. Ph.D. thesis, Iowa State University.

    Wikle, C. K. (2002). Spatial modeling of count data: A case study in modelling breeding bird survey data on large spatial domains. In A. B. Lawson and D. G. T. Denison, editors, Spatial Cluster Modelling, pages 199–209. Chapman and Hall.

    Wikle, C. K. (2003). Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology, 84, 1382–1394.

    Wikle, C. K. and Cressie, N. (1999). A dimension-reduced approach to space-time Kalman filtering. Biometrika, 86(4), 815–829.

    Wikle, C. K., Berliner, L. M., and Cressie, N. (1998). Hierarchical Bayesian space-time models. Journal of Environmental and Ecological Statistics, 5, 117–154.

    Wikle, C. K., Milliff, R., Nychka, D., and Berliner, L. (2001). Spatiotemporal hierarchical Bayesian modeling: Tropical ocean surface winds. Journal of the American Statistical Association, 96, 382–397.
