

An Ensemble Multiscale Filter for Large Nonlinear Data Assimilation Problems

YUHUA ZHOU, DENNIS MCLAUGHLIN, DARA ENTEKHABI, AND GENE-HUA CRYSTAL NG

Ralph Parsons Laboratory, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts

(Manuscript received 24 October 2006, in final form 10 April 2007)

ABSTRACT

Operational data assimilation problems tend to be very large, both in terms of the number of unknowns to be estimated and the number of measurements to be processed. This poses significant computational challenges, especially for ensemble methods, which are critically dependent on the number of replicates used to derive sample covariances and other statistics. Most efforts to deal with the related problems of computational effort and sampling error in ensemble estimation have focused on spatial localization. The ensemble multiscale Kalman filter described here offers an alternative approach that effectively replaces, at each update time, the prior (or background) sample covariance with a multiscale tree. The tree is composed of nodes distributed over a relatively small number of discrete scales. Global correlations between variables at different locations are described in terms of local relationships between nodes at adjacent scales (parents and children). The Kalman updating process can be carried out very efficiently on such a tree, especially if the update calculations exploit the tree's parallel structure. In fact, the resulting savings in effort far exceeds the additional work required to construct the tree. The tree-identification process offers possibilities for introducing localization in scale, which can be used instead of or in addition to localization in space. The multiscale filter is able to continually adapt to changing problem scales through associated changes in the tree structure. This is illustrated with a large ($10^6$ unknown) turbulent fluid flow example that generates dynamic features that span a wide range of time and space scales. This filter is able to track changing features over long distances without any spatial localization, using a moderate ensemble size of 54. The computational savings provided by the multiscale approach, combined with opportunities for hybrid localization over both space and scale, offer significant practical benefits for large data assimilation applications.

1. Introduction

Environmental data assimilation problems tend to be very large, both in terms of the number of discrete unknowns to be estimated and the number of observations to be processed. The large number of unknowns results from a desire to capture the wide range of time and space scales encountered in many environmental problems. So, for example, efforts to resolve smaller features in ocean circulation models typically lead to finer computational grids with more cells and more unknowns. Until recently, the number of discrete states associated with spatially distributed environmental models tended to be larger than the number of measurements available for data assimilation. The situation has changed dramatically with the widespread availability of networked observing systems and high-resolution remote sensing observations. Now the number of observations to be processed can be as large as or even larger than the number of discrete states.

The combination of many unknowns and many measurements imposes a large computational burden on the probabilistic estimation methods commonly used to solve environmental data assimilation problems. Such methods typically require repeated solutions of the discretized governing equations (i.e., the forward model) and, in some cases, manipulation of very large matrices. Computational effort is an especially important issue for ensemble Kalman filters, which can perform poorly if the number of random replicates in the ensemble is too small but can quickly become computationally infeasible as the number of replicates is increased.

In the ensemble Kalman filter a dynamic forecasting model is used to propagate random replicates of the state vector between measurement times (Evensen 2003). When measurements become available, all of the replicates are updated to reflect the new information gained.

Corresponding author address: Dennis McLaughlin, Bldg. 48-329, 15 Vassar Street, Cambridge, MA 02139.
E-mail: [email protected]


DOI: 10.1175/2007MWR2064.1

© 2008 American Meteorological Society



The filter updates are derived from sample covariances computed from the propagated ensemble. If the ensemble is too small, sampling errors can lead to incorrect updates. In fact, when sampling errors are significant, measurement updating can actually be counterproductive, since the updated estimation error variance can be greater than the prior variance (Hamill et al. 2001; Lorenc 2003).

The computational and sampling-related difficulties encountered in ensemble filtering have prompted development of a number of methods for simplifying or approximating large problems. Most of these methods divide the original problem into many smaller and more manageable subproblems, using variations on the concept of localization. Spatial localization techniques solve subproblems that focus on nearby states and measurements, relying on the assumption that correlations between variables at different locations should be small beyond a certain characteristic separation distance. Examples of this approach include methods based on covariance filtering with Schur products (Hamill et al. 2001; Houtekamer and Mitchell 2001; Reichle and Koster 2005), methods that perform updates in small blocks of grid cells (Reichle et al. 2002; Margulis et al. 2002; Ott et al. 2004), and hybrid methods that combine covariance filtering with blocking (Keppenne and Rienecker 2002). These localization methods improve computational efficiency while also suppressing the adverse effects of sampling error.

The sample covariances that are explicitly or implicitly modified by localization can be viewed as approximate descriptions of the physical relationships embedded in the forecasting model. If the covariances are more or less arbitrarily changed it is possible that they will no longer properly portray these relationships. A number of researchers have observed and commented on imbalances introduced by localization in meteorological applications (Mitchell et al. 2002; Lorenc 2003). The solution to imbalance problems is to increase the characteristic length scale of the localization procedure. While this improves balance it also reduces the filtering and efficiency gains that make localization attractive.

In effect, the localization scale is a tuning parameter that trades off computational effort, sampling error, and imbalance error. It is likely that the optimum localization scale is application-dependent and, as a result, difficult to determine in advance. This is particularly true for problems where the dominant scales of variability change over time and space, requiring associated changes in the localization scale.

This paper presents a new multiscale approach to ensemble data assimilation that is designed primarily to improve computational efficiency but also provides some new options for introducing localization. Multiscale estimation is based on the concept of replacing the forecast covariance with a multiscale tree that implicitly describes spatial correlations. The primary advantages of the multiscale approach are its computational efficiency and its ability to adaptively localize in both space and scale. As in traditional ensemble Kalman filtering, the multiscale approach provides approximate estimates that may work better in some applications than in others.

In the next section of this paper we provide a brief review of ensemble filtering, primarily to introduce notation. We then introduce relevant multiscale estimation concepts. Next we show how ensemble methods and multiscale estimation can be combined. We illustrate the performance of the resulting ensemble multiscale filter with an example and conclude with a review of the advantages and limitations of the multiscale approach.

2. Ensemble filtering

The uncertainty in both models and measurements is the primary justification for taking a probabilistic approach to environmental estimation problems. Bayesian estimation theory provides a convenient framework for such an approach. Suppose that the system of interest is characterized at time $t$ by the spatially discretized state vector $x_t$. Each element of this vector corresponds to a distributed variable (e.g., pressure, temperature, velocity) evaluated at a particular cell on a fixed computational grid. The Bayesian approach treats $x_t$ as a random vector that may be characterized by an unconditional (or prior) joint probability density $p(x_t)$.

Now suppose that measurements of the state or other related variables available at time $\tau$ are assembled in the vector $y_\tau$. Then the conditional density $p(x_t\,|\,y_{0:T})$ characterizes everything we know about $x_t$ from prior information and from measurements obtained in the interval $\tau \in [0, T]$ (Jazwinski 1970). It is generally neither feasible nor desirable to derive this multivariate density for large problems. In practice, we focus on particular distributional properties, such as the mean, the mode, and two-point covariances. Nevertheless, it is useful to work with $p(x_t\,|\,y_{0:T})$ during problem formulation.

There are a number of ways to estimate properties of $p(x_t\,|\,y_{0:T})$, depending on the problem at hand. Here we focus on filtering problems, where estimates are desired at the end of a growing measurement interval (so $T$ is always equal to the current time $t$). We suppose that the state and measurement vectors are described as follows:

$$x_t = f_t(x_{t-1}, u_t) \quad\text{and} \qquad (1)$$

$$y_t = H_t x_t + e_t, \qquad (2)$$

where $u_t$ is a vector of random model inputs, which are not necessarily white or additive; $e_t$ is a measurement noise vector, which we assume to be zero mean and white in time; and $x_t$ has a random initial condition $x_0$. The function $f_t(\cdot)$ is derived from a spatially discretized time-dependent model of the system dynamics. To simplify the discussion we make the optional but convenient assumption that the measurements are related linearly to the states, through the measurement matrix $H_t$. The random initial state, input vector, and measurement noise are all characterized by probability densities, which we assume to be given. We also assume that these random vectors are independent of one another. Errors in model structure are assumed to be represented by auxiliary model inputs included in (1).

The Markovian structure of the state equation in (1) enables us to solve the Bayesian estimation problem recursively. In this case, the process of deriving the probability densities of $x_t$ at $t$ divides into two steps: a forecast from time $t-1$ to time $t$ and an update at time $t$. At any given time $t > 0$ the forecast step derives the forecast density $p(x_t\,|\,y_{0:t-1})$ from the previously updated density $p(x_{t-1}\,|\,y_{0:t-1})$. The update step at $t$ derives the new updated density $p(x_t\,|\,y_{0:t})$ from $p(x_t\,|\,y_{0:t-1})$ and the likelihood $p(y_t\,|\,x_t)$.

In practice, the forecast and updated densities are frequently assumed to be Gaussian so they can be completely characterized by their means and covariances (Gelb 1974). This approach, which is taken in the classical Kalman filter, can fail to capture important system features when the state and/or measurement equations are nonlinear, or when $u_t$, $e_t$, or $x_0$ are non-Gaussian (McLaughlin 2007). The ensemble Kalman filter deals with nonlinearity during the forecast step by working with ensembles of randomly generated replicates represented by $x^j_{t|t-1}$ (for the forecast ensemble at $t$) and $x^j_{t|t}$ (for the updated ensemble at $t$; Arulampalam et al. 2002; Gordon et al. 1993). In this case the nonlinear forecast step may be written as

$$x^j_{t|t-1} = f(x^j_{t-1|t-1}, u^j_t), \qquad (3)$$

where $x^j_{t-1|t-1}$ is obtained from the update at time $t-1$, and $x^j_{0|0}$ and $u^j_t$ are synthetically generated replicates drawn from the initial state and input probability densities.

In the ensemble Kalman filter the updated replicates are derived by adjusting the forecast replicates as follows (Burgers et al. 1998; Evensen 1994):

$$x^j_{t|t} = x^j_{t|t-1} + K_t\,(y_t + e^j_t - y^j_{t|t-1}), \qquad (4)$$

where $y^j_{t|t-1}$ is a measurement prediction replicate defined by

$$y^j_{t|t-1} = H_t\,x^j_{t|t-1}, \qquad (5)$$

and $K_t$ is the Kalman gain defined by

$$K_t = \widehat{\mathrm{cov}}(x_{t|t-1}, H_t x_{t|t-1})\,[\widehat{\mathrm{cov}}(H_t x_{t|t-1}) + R_t]^{-1}. \qquad (6)$$

The vector $e^j_t$ is a synthetically generated zero-mean random measurement perturbation, drawn from the specified probability density of the measurement error $e_t$. The matrix $R_t$ is the covariance of $e_t$.

In this paper the expression $\widehat{\mathrm{cov}}(v, w)$ indicates a sample estimate of $\mathrm{cov}(v, w)$ computed from $N$ replicates of $v$ and $w$. This sample covariance can be written as a matrix product of the following form:

$$\widehat{\mathrm{cov}}(v, w) = \frac{1}{N-1}\,V W^T, \qquad (7)$$

where $V$ is a matrix with column $j$ the mean-removed replicate $v^j$ and $W$ is a matrix with column $j$ the mean-removed replicate $w^j$. The expressions $\mathrm{cov}(v)$ and $\widehat{\mathrm{cov}}(v)$ are shorthand for $\mathrm{cov}(v, v)$ and $\widehat{\mathrm{cov}}(v, v)$, respectively.

In most ensemble estimation problems $N$ is less than the dimensions of the state and measurement vectors, and the sample covariances used to derive the Kalman gain are rank deficient. However, the matrix inverted in (6) is full rank if $R_t$ is full rank. Also, note that the Kalman gain in the ensemble version of the Kalman filter is a nonlinear function of past measurements since it depends, through the sample covariances, on replicates derived from these measurements.
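To make the update step concrete, the following minimal sketch implements the perturbed-observation update of (4)-(7) in NumPy. It is an illustrative transcription, not the code used in this study; the function and variable names are our own, and a linear measurement operator is assumed.

```python
import numpy as np

def enkf_update(X, H, y, R, rng):
    """Perturbed-observation EnKF update, following Eqs. (4)-(7).

    X : (n, N) forecast ensemble, one replicate per column
    H : (m, n) linear measurement operator
    y : (m,)   observation vector
    R : (m, m) measurement error covariance
    """
    n, N = X.shape
    Y = H @ X                                   # measurement predictions, Eq. (5)
    Xp = X - X.mean(axis=1, keepdims=True)      # mean-removed state replicates
    Yp = Y - Y.mean(axis=1, keepdims=True)      # mean-removed predicted measurements
    Cxy = Xp @ Yp.T / (N - 1)                   # sample cov(x, Hx), Eq. (7)
    Cyy = Yp @ Yp.T / (N - 1)                   # sample cov(Hx)
    K = Cxy @ np.linalg.inv(Cyy + R)            # Kalman gain, Eq. (6)
    E = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T  # perturbations e_t^j
    return X + K @ (y[:, None] + E - Y)         # replicate update, Eq. (4)

# toy usage: 100 states, 20 replicates, 10 directly observed states
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
H = np.eye(10, 100)
R = 0.1 * np.eye(10)
y = rng.standard_normal(10)
Xa = enkf_update(X, H, y, R, rng)
print(Xa.shape)  # (100, 20)
```

Note that the matrix inverted here is only $m \times m$; for the large $m$ discussed in the introduction this inversion is exactly the step that becomes burdensome.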

The linear update of (4) yields an ensemble that converges to the exact $p(x_t\,|\,y_{0:t})$ if the state and measurement at $t$ have a Gaussian joint conditional density $p(x_t, y_t\,|\,y_{0:t-1})$. There are many other versions of the ensemble Kalman filter, including square root forms that do not require the addition of random measurement perturbations in (4). Good reviews of some of the alternatives are provided by Tippett et al. (2003).

For nonlinear problems the ensemble Kalman filter is a compromise that makes no restrictive assumptions during the forecast step but makes implicit Gaussian assumptions during the update step. Despite this compromise, experience shows that the ensemble Kalman filter provides an acceptable approximation in many applications. This issue is discussed in more detail in citations provided in Evensen (2003) as well as in Zhou et al. (2006). There is, however, no guarantee that the assumptions made in the ensemble Kalman filter will always be acceptable, and caution must be used when applying this technique to highly nonlinear problems.

As mentioned in the introduction, the ensemble Kalman filter update has some features that can complicate its application to large environmental problems. First, it requires computation and manipulation of large matrices (covariances, their square roots, or matrices of covariance eigenvectors, depending on the particular computational implementation used). Second, it is limited by sampling error, which can be significant when the ensemble is small. The multiscale estimation approach deals with the computational issue by introducing a fast scale-recursive update. This approach may also be able to reduce the adverse effects of sampling error without introducing spatial localization.

3. Multiscale models

a. Multiscale trees

Multiscale trees provide an efficient and convenient way to represent correlations between the elements of very large state vectors, such as those obtained from spatially discretized stochastic models. In a tree representation global correlations are built up from local correlations between nearby tree nodes. To see how this is done we need to introduce a number of definitions.

A multiscale tree model consists of a set of abstract nodes that may be visualized as shown in Fig. 1. Groups of nodes are organized into scales distinguished as separate rows in the tree diagram. The scale with the most nodes (at the bottom) is called the finest scale, while the scale with only one node (the root node at the top) is the coarsest scale. Each node $s$ on the tree is associated with a relatively small nodal state vector $\chi(s)$ of dimension $n(s)$. A detailed discussion of multiscale modeling is provided in Willsky (2002).

In a tree model the nodes at any given scale are related indirectly through their connections with common nodes located higher up the tree. As shown in Fig. 1, each internal tree node is connected to a parent and to several children. The parent–child connections are indicated graphically by lines on the tree diagram. For present purposes we suppose that every node (except a finest-scale node) has $q$ children. We represent the children of $s$ by $s\alpha_1, s\alpha_2, \ldots, s\alpha_q$. Also, every node (except the root node) has a single parent $s\bar\gamma$. The index $m(s)$ indicates the scale of node $s$ (i.e., the row on the tree diagram containing $s$). This index increases from 0 at the top of the tree (coarsest scale) to $M$ at the bottom of the tree (finest scale). The subset of finest-scale nodes that descend from node $s$ is indicated by $T(s)$ and the corresponding vector of finest-scale states is $\chi_M(s)$. The set of all nodes at the finest scale is $T_M$ and the corresponding vector of all finest-scale states is $\chi_M$. The subtree rooted at node $s$ (i.e., the set composed of $s$ and all its descendants) is indicated by $S(s)$.

In the applications considered here a multiscale tree is used to approximate conditional correlations between elements of the global state vector $x_t$ at a particular update time $t$. This can be done if the tree structure is appropriately defined. We assume that the tree topology (e.g., the geometric arrangement of nodes on the tree) is given but the definitions of the states at the tree nodes are to be determined. The process of identifying the nodal states above the finest scale is discussed in detail in the next section. For now it is sufficient to consider the definition of the finest-scale state.

Computational efficiency is achieved by dividing the very large global state vectors $x_{t|t-1}$ and $x_{t|t}$ into many small local vectors, each assigned to a particular finest-scale tree node. A convenient way to do this is to divide the computational grid (usually two- or three-dimensional) of the original system model into relatively small blocks of nearby grid cells. Each block of cells, indicated by the set $B(s)$, is assigned to a particular finest-scale tree node $s$. The resulting correspondence between global and tree states is given by

$$\chi_M = P\,[x_{t|t-1} - E(x_{t|t-1})], \qquad (8)$$

where $E(\cdot)$ indicates mathematical expectation and $P$ is a specified invertible matrix of ones and zeros that maps each element of $x_{t|t-1}$ associated with $B(s)$ to a corresponding element in $\chi_M$. When applied to individual forecast replicates the node mapping can be written as

$$\chi^j_M = P\,(x^j_{t|t-1} - \bar{x}_{t|t-1}), \qquad (9)$$

where $\bar{x}$ is the sample mean.

FIG. 1. A multiscale tree with scaling factor $q$.
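The mapping in (8) and (9) amounts to stacking mean-removed grid-block values into nodal vectors. The sketch below illustrates this for a two-dimensional grid partitioned into equal blocks; the block layout and names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def finest_scale_replicates(Xf, grid_shape, block):
    """Map mean-removed forecast replicates onto finest-scale tree nodes, Eq. (9).

    Xf         : (n, N) forecast ensemble, states ordered grid-row-major
    grid_shape : (rows, cols) of the computational grid
    block      : (br, bc) cells per finest-scale node B(s)
    Returns a dict {node_index: (br*bc, N) replicate array}.
    """
    rows, cols = grid_shape
    br, bc = block
    Xp = Xf - Xf.mean(axis=1, keepdims=True)     # subtract the sample mean
    cell = np.arange(rows * cols).reshape(rows, cols)
    nodes, k = {}, 0
    for i in range(0, rows, br):
        for j in range(0, cols, bc):
            idx = cell[i:i+br, j:j+bc].ravel()   # cells in block B(s)
            nodes[k] = Xp[idx, :]                # entries of chi_M^j for node s
            k += 1
    return nodes

# toy usage: 8x8 grid, 2x2 blocks -> 16 finest-scale nodes of dimension 4
rng = np.random.default_rng(1)
Xf = rng.standard_normal((64, 10))
nodes = finest_scale_replicates(Xf, (8, 8), (2, 2))
print(len(nodes), nodes[0].shape)  # 16 (4, 10)
```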

b. Multiresolution autoregressive models and internality

In a multiscale estimation framework the states at different finest-scale nodes are related indirectly through their common ancestors (Frakt and Willsky 2001). There are many ways to do this. For estimation applications it is especially convenient to construct the tree so that it satisfies a multiscale extension of the well-known Markov property of time series analysis. As we shall see, this property makes it possible for measurement updates to be computed recursively in scale, with a two-stage process that moves up and then back down the tree.

For now it is sufficient to note that, if the multiscale Markov property holds, the state at a given node $s$ can be related to the state at its parent $s\bar\gamma$ by the following downward recursion:

$$\chi(s) = A(s)\,\chi(s\bar\gamma) + w(s), \qquad (10)$$

where $A(s)$ is a downward transition matrix and $w(s)$ is a zero-mean random scale perturbation with covariance $Q(s)$. The root node state $\chi(0)$ that initializes the recursion is a zero-mean random variable with covariance $\mathrm{cov}[\chi(0)]$. The multiscale Markov property implies that the $w(s)$ values at different nodes are uncorrelated with one another and with $\chi(s\bar\gamma)$. Note that the scale of the parent node $s\bar\gamma$ is $m(s\bar\gamma) = m(s) - 1$.

An equivalent upward recursion can be written as

$$\chi(s\bar\gamma) = F(s)\,\chi(s) + \bar{w}(s), \qquad (11)$$

where $F(s)$ is an upward transition matrix and $\bar{w}(s)$ is a zero-mean random scale perturbation with covariance $\bar{Q}(s)$. Here again, the multiscale Markov property implies that the $\bar{w}(s)$ at different nodes are uncorrelated with one another and with $\chi(s)$. The upward recursion is initialized with one of the random finest-scale zero-mean states in $\chi_M$.

The upward and downward recursions given above define a multiscale autoregressive (MAR) model of the tree states $\chi(s)$. This MAR model is analogous to the autoregressive models frequently encountered in time series analysis. If (10) is applied repeatedly it can be used to construct the covariances of $\chi_M$ and $x_{t|t-1}$ from the $A(s)$ and $Q(s)$ matrices and the root node covariance $\mathrm{cov}[\chi(0)]$. So the downward scale recursion, which relies only on local relationships between parents and children, provides an alternative (approximate) description of the global forecast sample covariance. This description is approximate because it is constrained by the tree topology and the state definitions and recursion parameters used in (10). The basic idea of multiscale estimation is to replace the information contained in the global forecast covariance with (10) and (11). There is no need to actually evaluate $\mathrm{cov}(\chi_M)$ or $\mathrm{cov}(x_{t|t-1})$.
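A small simulation helps show how (10) generates global variability from purely local parent-child relationships. The sketch below draws one realization of a MAR tree from the root down, under the simplifying (and purely illustrative) assumption that all nodes share the same $A(s)$ and $Q(s)$:

```python
import numpy as np

def simulate_mar(A, Q_chol, root_cov_chol, q, M, rng):
    """Simulate one realization of the downward MAR recursion, Eq. (10):
    chi(s) = A(s) chi(parent) + w(s), with w(s) ~ N(0, Q(s)).
    For simplicity every node shares the same A and Q (an illustrative
    assumption, not a requirement of the framework)."""
    root = root_cov_chol @ rng.standard_normal(root_cov_chol.shape[0])
    scales = [[root]]
    for m in range(M):                      # sweep from scale 0 down to scale M
        children = []
        for parent in scales[-1]:
            for _ in range(q):              # each node has q children
                w = Q_chol @ rng.standard_normal(Q_chol.shape[0])
                children.append(A @ parent + w)
        scales.append(children)
    return np.concatenate(scales[-1])       # stacked finest-scale states chi_M

# toy usage: nodal dimension d=2, q=2 children, M=3 scales below the root
rng = np.random.default_rng(2)
d, q, M = 2, 2, 3
A = 0.9 * np.eye(d)
Q_chol = 0.3 * np.eye(d)
chi_M = simulate_mar(A, Q_chol, np.eye(d), q, M, rng)
print(chi_M.shape)  # (16,) -> q**M = 8 finest-scale nodes, each of dimension 2
```

Averaging many such realizations reproduces the finest-scale covariance implied by the tree without that covariance ever being formed explicitly, which is the point of the construction.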

The MAR model of (10) and (11) enables us to develop very efficient multiscale updating algorithms. To take advantage of this capability we need to know how to design a tree that satisfies the multiscale Markov property. The design process is greatly facilitated if we focus on trees that have certain so-called internal properties. To define these properties suppose that $\chi(s)$ is the state vector at node $s$ at scale $m(s) < M$ and $\chi_{m(s)+1}(s)$ is the vector of all states at the children of $s$, which are all at scale $m(s) + 1$. The tree is said to be locally internal if $\chi(s)$ is a linear combination of the states at its children, for all nodes on the tree. This requirement can be expressed concisely as follows (Frakt and Willsky 2001):

$$\chi(s) = V(s)\,\chi_{m(s)+1}(s) = V(s)\begin{bmatrix}\chi(s\alpha_1)\\ \vdots \\ \chi(s\alpha_q)\end{bmatrix}, \qquad \forall\, m(s) < M, \qquad (12)$$

where $V(s)$ is an $n(s) \times n_{m(s)+1}$ dimensional matrix associated with node $s$ and $n_{m(s)+1}$ is the sum of the dimensions of the state vectors $\chi(s\alpha_i)$ for $i = 1, \ldots, q$. The set of $V(s)$ matrices defines, through (12), all the coarser-scale states on the tree.

The multiscale Markov property has a convenient special form (called the scale-recursive Markov property) that holds when the tree is locally internal. The scale-recursive Markov property relies on the fact that any given node $s$ at scale $m(s)$ partitions the nodes at the next finer scale $m(s) + 1$ into $q + 1$ sets. The first $q$ sets consist of the $q$ children of $s$. The final $(q+1)$th set consists of the complementary group of all nodes that are at scale $m(s) + 1$ but are not children of $s$. The scale-recursive Markov property holds if and only if the vector of all states in any one of these $q + 1$ sets is conditionally uncorrelated with the vector of all states in each of the remaining $q$ sets, given $\chi(s)$ (Frakt and Willsky 2001). When this property is satisfied we can derive the MAR model matrices $A(s)$, $Q(s)$, $F(s)$, and $\bar{Q}(s)$.

c. Identification of multiscale tree models

The discussion presented above indicates that we can obtain the MAR model needed for an efficient multiscale measurement update if we can select the $V(s)$ to insure that the scale-recursive Markov property is satisfied. In particular, at each internal tree node $s$, $V(s)$ should conditionally decorrelate all the states in the $q + 1$ sets of nodes partitioned by node $s$. To obtain perfect decorrelation we generally need to use high-dimensional coarser-scale nodal state vectors. This defeats the purpose of the tree, which is to provide a concise and efficient alternative to traditional estimation methods. For practical applications we need to constrain state dimensionality at each coarser-scale node or, equivalently, we need to limit the number of rows in the corresponding $V(s)$ matrix to be less than or equal to some specified value $d(s)$. Then the identification problem at node $s$ reduces to a search for the $V(s)$ with $n(s) \le d(s)$ that minimizes the conditional covariance among the $q + 1$ sets partitioned by $s$.

The constrained tree-identification problem is easier to solve if we simplify the decorrelation process by only requiring that the state $z_i(s) \equiv \chi(s\alpha_i)$ associated with child $i$ of node $s$ must be conditionally uncorrelated with all other states at scale $m(s) + 1$. These other states are collected in the complementary vector $z_{ic}(s)$ (Frakt and Willsky 2001). This simplification makes it possible to focus on pairwise conditional correlations between $z_i(s)$ and the individual elements of $z_{ic}(s)$.

An additional simplification is achieved if we limit the set of $V(s)$ candidates to block diagonal matrices having the following form:

$$V(s) = \mathrm{diag}[V_1(s), \ldots, V_q(s)]. \qquad (13)$$

The submatrix $V_i(s)$ corresponds to child $s\alpha_i$ and has dimension $d_i(s) \times n(s\alpha_i)$, where $d_i(s)$ is specified. The block diagonal structure of $V(s)$ implies that each row of $\chi(s)$ is a linear combination of states at a particular child of $s$. The block diagonal restriction enables us to divide the $V(s)$ identification problem into $q$ smaller problems that each focus on the influence of a particular $V_i(s)$ on the conditional covariance between $z_i(s)$ and $z_{ic}(s)$ (Frakt and Willsky 2001).

With these simplifications in mind, consider the conditional covariance, given $\chi(s)$, between $z_i(s)$ and $z_{ic}(s)$:

$$\mathrm{cov}[z_i(s), z_{ic}(s)\,|\,\chi(s)] = E\{[z_i - E(z_i|\chi(s))]\,[z_{ic} - E(z_{ic}|\chi(s))]^T\}$$
$$= E\{[\chi(s\alpha_i) - E(\chi(s\alpha_i)|\chi(s))]\,[z_{ic} - E(z_{ic}|\chi(s))]^T\}. \qquad (14)$$

This covariance needs to be zero, for each $i = 1, \ldots, q$, in order for the scale-recursive Markov property to hold exactly. In practice, we cannot obtain zero correlation if $d_i(s)$ is constrained. Instead, we seek the $V_i(s)$ that gives the smallest possible $\mathrm{cov}[z_i(s), z_{ic}(s)\,|\,\chi(s)]$.

Rather than attempt to minimize the conditional covariance matrix directly we use an indirect method called predictive efficiency, which achieves the same objective by working with the following scalar mean-squared error $J_i(s)$:

$$J_i(s) = E\{[z_{ic}(s) - \hat{z}_{ic}(s)]^T\,[z_{ic}(s) - \hat{z}_{ic}(s)]\}, \qquad (15)$$

where $\hat{z}_{ic}(s)$ is an estimate of $z_{ic}$ derived from $\chi(s)$. This expression is minimized, for a given $\chi(s)$, by the conditional expectation (Jazwinski 1970):

$$\hat{z}_{ic}(s) = E[z_{ic}(s)\,|\,\chi(s)]. \qquad (16)$$

If $\chi(s)$ is expressed as $\chi(s) = V_i(s)\,z_i(s)$, the minimum of $J_i(s)$ depends on $V_i(s)$. If $V_i(s)$ is an identity with $d_i(s) = n(s\alpha_i)$, the conditional covariance $\mathrm{cov}[z_i(s), z_{ic}(s)\,|\,\chi(s)]$ is zero [this is easily checked by substituting $\chi(s) = z_i(s)$ into (14)] and $J_i(s)$ has some value $J_{i0}(s)$. For any other $V_i(s)$ with $d_i(s) < n(s\alpha_i)$ the $J_i(s)$ will be at least as large as $J_{i0}(s)$ and the conditional covariance will not be zero. In this case, the departure of the conditional covariance from its desired value of zero can be measured in terms of the difference between $J_i(s)$ and $J_{i0}(s)$ (Frakt and Willsky 2001):

$$\varepsilon[\hat{z}_{ic}(s)\,|\,\chi(s)] = J_i(s) - J_{i0}(s)$$
$$= \mathrm{trace}[\mathrm{cov}(z_{ic}, z_i)\,\mathrm{cov}^{-1}(z_i)\,\mathrm{cov}(z_i, z_{ic})] - \mathrm{trace}\{\mathrm{cov}(z_{ic}, z_i)\,V_i^T\,[V_i\,\mathrm{cov}(z_i)\,V_i^T]^{-1}\,V_i\,\mathrm{cov}(z_i, z_{ic})\}. \qquad (17)$$

The expression for $\varepsilon[\hat{z}_{ic}(s)\,|\,\chi(s)]$ given after the second equality applies when the tree states are jointly Gaussian, which is the implicit assumption used in the ensemble Kalman filter update. In the predictive efficiency approach the best choice of $V_i(s)$ is taken to be the one that minimizes $\varepsilon[\hat{z}_{ic}(s)\,|\,\chi(s)]$:

$$V_i(s) = \arg\min_{V_i(s)} \varepsilon[\hat{z}_{ic}(s)\,|\,V_i(s)\,z_i(s)]. \qquad (18)$$

This choice also minimizes the conditional covariance $\mathrm{cov}[z_i(s), z_{ic}(s)\,|\,\chi(s)]$, as desired.

Frakt and Willsky (2001) show that the $V_i(s)$ that minimizes $\varepsilon[\hat{z}_{ic}(s)\,|\,\chi(s)]$ is given by the first $d_i(s)$ rows of the following matrix $\widetilde{V}_i(s)$:

$$\widetilde{V}_i(s) = U_i^T(s)\,\mathrm{cov}^{-1/2}[z_i(s)]. \qquad (19)$$

The columns of the matrix $U_i(s)$ are the eigenvectors of the following $n(s\alpha_i)$ dimensional square matrix:

$$\mathrm{cov}^{-1/2}[z_i(s)]\,\mathrm{cov}[z_i(s), z_{ic}(s)]\,\mathrm{cov}^T[z_i(s), z_{ic}(s)]\,\mathrm{cov}^{-T/2}[z_i(s)]. \qquad (20)$$

These eigenvectors are assumed to be arranged according to the magnitudes of the corresponding eigenvalues, from largest to smallest.

The predictive efficiency method outlined above uses (19) and (20) to compute a local internal matrix $V_i(s)$ for each child of $s$. The $q$ $V_i(s)$ matrices obtained for all the children can be assembled to form $V(s)$, as specified in (13). The total number of rows in the resulting $V(s)$ matrix may exceed the total number of rows $d(s)$ originally specified for $V(s)$ in the nodal state vector size constraint. Following Frakt and Willsky (2001), we deal with this issue by retaining only the $d(s)$ rows in $V(s)$ that correspond to the $d(s)$ largest predictive efficiency eigenvalues. Note that this will generally result in some of the reduced $V_i(s)$ matrices having more rows than others. That is, some children will contribute more elements to $\chi(s)$ than others.
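A minimal sketch of the computation in (19) and (20) follows, using sample covariances formed from replicates. It assumes the ensemble is large enough that $\mathrm{cov}(z_i)$ is invertible ($N$ greater than the child state dimension); the function and variable names are illustrative, not the paper's code.

```python
import numpy as np

def predictive_efficiency_V(Zi, Zic, d_i):
    """Compute the first d_i rows of V~_i(s) from Eqs. (19)-(20).

    Zi  : (n_i, N) mean-removed replicates of z_i(s)  (states of child i)
    Zic : (n_c, N) mean-removed replicates of z_ic(s) (screened complement)
    """
    N = Zi.shape[1]
    C_ii = Zi @ Zi.T / (N - 1)               # sample cov(z_i)
    C_ic = Zi @ Zic.T / (N - 1)              # sample cov(z_i, z_ic)
    # inverse square root of cov(z_i) via its eigendecomposition
    lam, E = np.linalg.eigh(C_ii)
    C_inv_half = E @ np.diag(1.0 / np.sqrt(lam)) @ E.T
    # matrix of Eq. (20); its leading eigenvectors form U_i(s)
    Mmat = C_inv_half @ C_ic @ C_ic.T @ C_inv_half.T
    mu, U = np.linalg.eigh(Mmat)
    order = np.argsort(mu)[::-1]             # eigenvalues from largest to smallest
    U = U[:, order]
    V_tilde = U.T @ C_inv_half               # Eq. (19)
    return V_tilde[:d_i, :], mu[order]

# toy usage: child state of dimension 6, complement of dimension 12, 40 replicates
rng = np.random.default_rng(3)
Zi = rng.standard_normal((6, 40))
Zic = 0.5 * Zi[:3, :].repeat(4, axis=0) + rng.standard_normal((12, 40))
Vi, eig = predictive_efficiency_V(Zi, Zic, d_i=2)
print(Vi.shape)  # (2, 6)
```

The returned eigenvalues can be pooled across the $q$ children to perform the row-retention step described above.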

Note that the $V(s)$ matrices identified in the predictive efficiency procedure depend on low-dimensional prior covariances of the states in the vectors $z_i(s)$ and $z_{ic}(s)$ at scale $m(s) + 1$. The scale $m(s) + 1$ covariances depend, in turn, on the $V(s)$ matrices and prior covariances at scale $m(s) + 2$, and so on. This interscale dependence suggests that the $V(s)$ matrices should be computed recursively, starting with covariances between elements of $\chi_M$ at the finest scale and gradually moving up the tree. Equation (12) is used to derive the required $\chi(s)$ covariances at scale $m(s)$ from those at scale $m(s) + 1$.

In an ensemble implementation the exact finest-scale prior covariances are replaced by sample covariances computed from the finest-scale prior replicates $\chi^j_M$, which are derived from (9). Then $\widetilde{V}_i(s)$ and $V_i(s)$ at scale $m(s) = M - 1$ are obtained by substituting the sample covariances into (19) and (20). These internal matrices are used to compute the prior replicates at scale $m(s) = M - 1$, according to the ensemble version of (12):

$$\chi^j(s) = V(s)\,\chi^j_{m(s)+1}. \qquad (21)$$

This provides the set of replicates needed to compute sample covariances and $V(s)$ matrices at scale $m(s) = M - 2$ and so on, up the tree to the root node.

It is possible to derive the MAR scale transition and covariance matrices that appear in (10) and (11) directly from the $V(s)$ obtained from the predictive efficiency procedure. In an ensemble implementation these tree parameters can be written as functions of sample prior covariances as follows (Frakt and Willsky 2001):

$$A(s) = \widehat{\mathrm{cov}}[\chi(s), \chi(s\bar\gamma)]\,\widehat{\mathrm{cov}}^{-1}[\chi(s\bar\gamma)], \qquad (22)$$

$$F(s) = \widehat{\mathrm{cov}}[\chi(s\bar\gamma)]\,A^T(s)\,\widehat{\mathrm{cov}}^{-1}[\chi(s)], \qquad (23)$$

$$Q(s) = \widehat{\mathrm{cov}}[\chi(s)] - A(s)\,\widehat{\mathrm{cov}}[\chi(s), \chi(s\bar\gamma)]^T, \quad\text{and} \qquad (24)$$

$$\bar{Q}(s) = \widehat{\mathrm{cov}}[\chi(s\bar\gamma)] - F(s)\,A(s)\,\widehat{\mathrm{cov}}[\chi(s\bar\gamma)]. \qquad (25)$$

These sample covariances can all be derived recursively, from the replicates computed in (21).

If the size of the vector $z_{ic}(s)$ is large, derivation of the sample cross covariance between $z_i(s)$ and $z_{ic}(s)$ required in (20) can be computationally demanding. The computational effort can be dramatically reduced if the sample cross covariance only considers correlations between $z_i(s)$ and the elements of $z_{ic}(s)$ at nodes in a limited set (or neighborhood) $N_h(s\alpha_i)$ of $h$ nodes at scale $m(s) + 1$. This set can be selected to focus on nodes that are most likely to be strongly correlated with $z_i(s)$. One option is to relate $N_h(s\alpha_i)$ to the spatial support of $s\alpha_i$. Each node $s\alpha_i$ in the tree has a spatial support defined by the set of grid cells assigned to nodes in $T(s\alpha_i)$, the set of finest-scale descendants of $s\alpha_i$. Nodes included in $N_h(s\alpha_i)$ also have spatial supports defined by their finest-scale descendants. If we wish to use spatial proximity as a criterion for selecting $N_h(s\alpha_i)$, we can, for example, assign to $N_h(s\alpha_i)$ those nodes at scale $m(s) + 1$ whose supports are adjacent to the support of $s\alpha_i$. This is the approach taken in our example.

Frakt and Willsky (2001) show that the computational complexity of the predictive efficiency procedure can be reduced from $O[n(x)^2]$ to $O[n(x)]$ if the sample covariances for $z_i(s)$ only include elements of $z_{ic}(s)$ associated with nodes that lie in $N_h(s\alpha_i)$. The result is a substantial savings in computational effort for large problems.

Note that the use of spatial proximity screening in the predictive efficiency procedure differs from the spatial localization techniques discussed in the introduction. First, predictive efficiency is used only to identify tree parameters, so the screening procedure does not limit the scope of updates. Even with screening, each node is related to all other nodes and to all measurements through common ancestors on the tree. Second, the spatial support associated with a neighborhood of a given size $h$ increases in size at coarser scales, making it easier for important long-distance correlations to have an impact on the overall structure of the tree. Predictive efficiency screening, like spatial localization, tends to filter out spurious fluctuations due to sampling errors. The example discussed later in this paper illustrates this filtering action. Of course, if the screening is too severe important information will be lost and there will be a decline in performance. So some judgment is involved in selecting the appropriate neighborhood size $h$ for a given application.

It is useful to summarize the approximations and simplifications adopted in the tree-identification procedure outlined above:

• The tree covariances and related tree parameters are sample estimates derived from a finite ensemble of forecast states.

• There is an upper limit on the dimension $d(s)$ of coarser-scale states.

• The predictive efficiency method used at node $s$ considers only correlations between $z_i(s)$ and $z_{ic}(s)$, rather than between all $q + 1$ sets of scale $m(s) + 1$ nodes partitioned by node $s$.

• The predictive efficiency method assumes that the $V(s)$ matrix is block diagonal.

• At any given node $s\alpha_i$ the predictive efficiency method only considers correlations with nodes in the nodal neighborhood $N_h(s\alpha_i)$ at scale $m(s) + 1$.

These approximations may compromise the tree's ability to reproduce the true forecast covariance. However, in an ensemble application the last four approximations may also help suppress spurious long-distance correlations due to the first approximation. When this occurs, the approximate tree covariance may actually be more accurate than a traditional sample covariance derived from the same ensemble. This possibility is discussed further in the conclusions section.

4. An ensemble multiscale update procedure

Like the standard ensemble Kalman filter, the ensemble multiscale filter is designed to update the forecast replicates $x^j_{t|t-1}$ with a set of current measurements $y_t$. The result is a new ensemble of updated replicates $x^j_{t|t}$. The multiscale update is carried out on a tree characterized by $V(s)$ matrices derived from the forecast ensemble, using the predictive efficiency method described in the previous section. In the ensemble version all exact covariances $\mathrm{cov}(\cdot)$ are replaced by sample covariances $\widehat{\mathrm{cov}}(\cdot)$ computed over the replicates $\chi^j(s)$. If the tree provides a perfect representation of the sample forecast covariance the multiscale update will give the same result as the traditional ensemble Kalman filter.

In general, the two updates will differ because the tree only approximates the sample covariance.

To use a multiscale framework we need to assign the global measurements in the vector $y_t$ to particular tree nodes, much as the global states at grid cells are assigned to the finest-scale tree nodes. In many geophysical applications measurements are taken over a range of spatial supports, varying from subgrid cell scales to regions as large as the entire computational domain. The most flexible way to accommodate a diverse set of measurements is to relate them all to the states at the finest scale. To make this explicit it is helpful to introduce some new definitions. In particular, suppose that the support of a particular scalar measurement $y_{ti}$ in the global measurement vector $y_t$ consists of cells assigned to tree nodes in a subset $T_{Mi}$ of the set of all finest-scale nodes $T_M$. This measurement can be located at any tree node $s$ that lies above all nodes in $T_{Mi}$, as implied by the following expression:

$$y(s) = y_{ti}; \qquad T_{Mi} \subseteq T(s). \qquad (26)$$

Recall that $T(s)$ is the set of all finest-scale nodes descended from $s$.

In practice, a number of issues need to be considered in locating measurements on the tree, including the size of the nodal measurement vector (which affects the cost of various matrix computations) and the number of descendant nodes without measurements (which affects the cost of the update procedure). In general, the multiscale approach outlined here offers considerable flexibility since all nodal measurements are ultimately related to physically identifiable grid cells, regardless of where they are placed on the tree.

Once a measurement has been assigned to a particular tree node $s$ it may be related to finest-scale states through the following tree-based measurement equation:

$$y(s) = h(s)\,\chi_M(s) + e(s). \qquad (27)$$

Here $h(s)$ is a matrix that relates the measurement to the finest-scale tree states $\chi_M(s)$ descended from $s$ and $e(s)$ is a zero-mean measurement error vector with specified covariance $r(s)$. All of the measurements in the vector $y_t$ may be assigned to tree nodes in this way. The result is a group of measurement equations having the form of (27) for the set $H$ of all nodes with measurements.

Willsky (2002) and a number of references he cites describe a static two-sweep multiscale estimation algorithm that derives the Gaussian conditional mean and covariance for states and measurements distributed on a multiscale tree. Here we present an adaptation of this algorithm suitable for ensemble applications. In our version the focus is on random replicate updates, rather than moment updates. Appendixes A and B of Zhou (2006) show that the sample moments of the updated replicates converge to the exact conditional moments as the number of replicates increases.

The multiscale updating procedure is a two-sweep recursion that moves up the tree and then back down. In an ensemble implementation the upward sweep generates replicates $\chi^j(s|s)$ at node $s$ that are conditioned on all measurements at or below the scale of $s$. The downward sweep generates replicates $\chi^j(s|S)$ at node $s$ that are conditioned on all measurements on the tree, above as well as at or below $s$. Each updated global replicate $x^j_{t|t}$ is constructed from the elements of the finest-scale replicate obtained at the end of the downward sweep. The details are described in sections 4a and 4b.

a. Upward sweep

The upward sweep of the ensemble multiscale update algorithm derives at each node $s$ a set of updated replicates $\chi^j(s|s)$ that depend on measurements located at $s$ and its descendants. The algorithm is a recursion that performs an update at each node at a given scale and then proceeds to the next higher scale. The nature of the update at a given node $s$ depends on whether measurements are located at or below $s$. The general form is similar to the classical Kalman filter update and can be written in two parts. First is the case in which there is no measurement at or below the specified node $s$. This condition can be stated as $H \cap S(s) = \emptyset$, where $S(s)$ is the set of nodes at or descended from $s$ and $\emptyset$ is the null set:

$$\chi^j(s|s) = \chi^j(s); \qquad H \cap S(s) = \emptyset. \qquad (28)$$

Here the updated replicate is the same as the prior obtained from the forecast step. Next is the case in which there is at least one measurement at or below $s$:

$$\chi^j(s|s) = \chi^j(s) + K(s)\,[Y^j(s) - \widehat{Y}^j(s)]; \qquad H \cap S(s) \neq \emptyset, \qquad (29)$$

where $K(s)$ is a multiscale version of the Kalman gain, defined below; $Y^j$ is an augmented perturbed measurement vector; and $\widehat{Y}^j$ is an augmented measurement prediction vector.

At scales $m(s) < M$ above the finest scale the perturbed and predicted measurement vectors associated with a measured node $s$ that lies above other measured nodes are

$$Y^j(s) = \begin{bmatrix} K(s\alpha_1)\,Y^j(s\alpha_1) \\ \vdots \\ K(s\alpha_k)\,Y^j(s\alpha_k) \\ y(s) + e^j(s) \end{bmatrix}; \qquad s \in H \text{ and } m(s) < M, \qquad (30)$$

and

$$\widehat{Y}^j(s) = \begin{bmatrix} K(s\alpha_1)\,\widehat{Y}^j(s\alpha_1) \\ \vdots \\ K(s\alpha_k)\,\widehat{Y}^j(s\alpha_k) \\ h(s)\,\chi^j_M(s) \end{bmatrix}; \qquad s \in H \text{ and } m(s) < M. \qquad (31)$$

Here the node $s\alpha_i$ is a child of $s$ that lies at or above a measured node and $0 \le k \le q$. In terms of the notation defined above:

$$H \cap S(s\alpha_i) \neq \emptyset; \qquad i = 1, \ldots, k. \qquad (32)$$

The zero-mean random measurement perturbation $e^j(s)$ has the same covariance $r(s)$ as $e(s)$ in (27) and is included to insure that the update algorithm yields the correct conditional covariance. If $k = 0$ or if $s$ is at the finest scale there are no measured nodes below $s$ and the two augmented measurement vectors at $s$ are

$$Y^j(s) = y(s) + e^j(s); \qquad s \in H, \qquad (33)$$

$$\widehat{Y}^j(s) = h(s)\,\chi^j_M(s); \qquad s \in H. \qquad (34)$$

At scales $m(s) < M$ above the finest scale the perturbed and predicted measurement vectors associated with an unmeasured node $s$ that lies above measured nodes are

$$Y^j(s) = \begin{bmatrix} K(s\alpha_1)\,Y^j(s\alpha_1) \\ \vdots \\ K(s\alpha_k)\,Y^j(s\alpha_k) \end{bmatrix}; \qquad s \notin H \text{ and } m(s) < M, \qquad (35)$$

and

$$\widehat{Y}^j(s) = \begin{bmatrix} K(s\alpha_1)\,\widehat{Y}^j(s\alpha_1) \\ \vdots \\ K(s\alpha_k)\,\widehat{Y}^j(s\alpha_k) \end{bmatrix}; \qquad s \notin H \text{ and } m(s) < M. \qquad (36)$$

In this case the final entries appearing in (30) and (31) are omitted. Taken together, the augmented measurement equations form a recursion that propagates the information in measurements placed at tree nodes up the tree so that any node that is measured or has a measured descendant will be updated.

The multiscale Kalman gains used to compute the update and to construct the augmented measurement vectors depend on an augmented measurement error covariance matrix that reflects the definitions of the augmented measurement vectors. At scales above the finest scale the augmented measurement error covariance matrix $R(s)$ depends on gains and measurement error covariances at the children $s\alpha_1, \ldots, s\alpha_k$. The expression for $R(s)$ at a measured node $s$ that lies above other measured nodes is

$$R(s) = \mathrm{diag}[K(s\alpha_1)R(s\alpha_1)K^T(s\alpha_1), \ldots, K(s\alpha_k)R(s\alpha_k)K^T(s\alpha_k), r(s)]; \qquad s \in H \text{ and } m(s) < M. \qquad (37)$$

Here $\mathrm{diag}[\cdot]$ represents a square matrix with $(k+1) \times (k+1)$ square blocks. The diagonal blocks for $i = 1, \ldots, k$ have dimension $n(s\alpha_i)$, and diagonal block $k + 1$ has dimension $n[y(s)]$, where $n[\cdot]$ indicates the dimension of the vector argument. All off-diagonal blocks are zero. If $k = 0$ or if $s$ is at the finest scale there are no measured nodes below $s$ and the augmented measurement error covariance matrix is

$$R(s) = r(s); \qquad s \in H. \qquad (38)$$

The covariance expression for an unmeasured node $s$ that lies above measured nodes is

$$R(s) = \mathrm{diag}[K(s\alpha_1)R(s\alpha_1)K^T(s\alpha_1), \ldots, K(s\alpha_k)R(s\alpha_k)K^T(s\alpha_k)]; \qquad s \notin H \text{ and } m(s) < M. \qquad (39)$$

In all cases the Kalman gain is given by

$$K(s) = \widehat{\mathrm{cov}}[\chi(s), \widehat{Y}(s)]\,\{\widehat{\mathrm{cov}}[\widehat{Y}(s)] + R(s)\}^{-1}; \qquad m(s) \le M. \qquad (40)$$

This multiscale gain matrix depends only weakly on the replicate values [through its dependence on $R(s)$ and the sample covariance matrices] and will generally be full rank. In this case the matrix to be inverted in (40) is nonsingular, even for small ensemble sizes.

The gain computation starts at the finest scale $m(s) = M$, where the measurement error covariance and Kalman gain are computed at measured nodes. The replicates at all finest-scale nodes are then updated with (28) and (29). After this, the recursion moves up to scale $m(s) = M - 1$, where the new augmented measurement error covariance is computed from the finest-scale measurement error covariances and Kalman gains at all measured finest-scale nodes. Then the gains at scale $m(s) = M - 1$ are computed, the replicates are updated, and the recursion proceeds up to the next scale. This continues until upward sweep updates have been performed at all scales. Note that the upward sweep algorithm given here does not make explicit use of the multiscale transition equations of (10) and (11) but it does use $V(s)$ matrices and (12) to compute the sample covariances in the Kalman gain expression. As we have seen, these $V(s)$ matrices convey the same information as the scale transition equations.
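At a measured finest-scale node the upward-sweep relations (29), (33), (34), (38), and (40) collapse to a local, low-dimensional analog of the ensemble update in section 2. The sketch below performs that single-node step; it is illustrative and omits the recursive stacking of child gains required by (30) and (31) at coarser scales.

```python
import numpy as np

def finest_node_update(Chi, h, y, r, rng):
    """Upward-sweep update at one measured finest-scale node, Eqs. (29),
    (33), (34), (38), and (40).

    Chi : (n_s, N) prior replicates chi^j(s) at the node
    h   : (m, n_s) local measurement matrix h(s)
    y   : (m,) nodal measurement y(s)
    r   : (m, m) nodal measurement error covariance r(s)
    """
    n_s, N = Chi.shape
    Yhat = h @ Chi                                   # predictions, Eq. (34)
    Cp = Chi - Chi.mean(axis=1, keepdims=True)
    Yp = Yhat - Yhat.mean(axis=1, keepdims=True)
    C_xy = Cp @ Yp.T / (N - 1)                       # cov[chi(s), Yhat(s)]
    C_yy = Yp @ Yp.T / (N - 1)                       # cov[Yhat(s)]
    K = C_xy @ np.linalg.inv(C_yy + r)               # gain, Eq. (40), R(s) = r(s)
    E = rng.multivariate_normal(np.zeros(len(y)), r, size=N).T
    Y = y[:, None] + E                               # perturbed measurements, Eq. (33)
    return Chi + K @ (Y - Yhat)                      # Eq. (29)

# toy usage: nodal state of dimension 4, one scalar measurement, 30 replicates
rng = np.random.default_rng(5)
Chi = rng.standard_normal((4, 30))
upd = finest_node_update(Chi, np.array([[1.0, 0, 0, 0]]), np.array([0.5]),
                         np.array([[0.04]]), rng)
print(upd.shape)  # (4, 30)
```

Because each such inversion involves only the small nodal measurement vector, and nodes at a given scale are independent of one another, the sweep parallelizes naturally across subtrees.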

b. Downward sweep

The downward sweep of the ensemble multiscale update algorithm derives at each node $s$ a set of smoothed replicates $\chi^j(s|S)$ that depend on all measurements on the tree. At the end of the upward sweep, the updated root node replicates $\chi^j(s|s) = \chi^j(s|S)$ at scale $m(s) = 0$ incorporate all tree measurements and so already constitute a smoothed ensemble. At any node $s$ below the root smoothed replicates $\chi^j(s|S)$ are obtained by adjusting the corresponding updated replicates $\chi^j(s|s)$ from the upward sweep.

The adjustment at scale $m(s)$ is carried out with a linear update similar to the one used for smoothing problems in time series analysis:

$$\chi^j(s|S) = \chi^j(s|s); \qquad m(s) = 0, \quad\text{and} \qquad (41)$$

$$\chi^j(s|S) = \chi^j(s|s) + J(s)\,[\chi^j(s\bar\gamma|S) - \chi^j(s\bar\gamma|s)]; \qquad m(s) > 0. \qquad (42)$$

This update requires computation of a set of projected replicates $\chi^j(s\bar\gamma|s)$ at $s\bar\gamma$ for $m(s) > 0$. The projected replicates characterize the state at the parent of $s$, given measurements at $s$ and its descendants. They are obtained from the following version of the MAR upward recursion in (11):

$$\chi^j(s\bar\gamma|s) = F(s)\,\chi^j(s|s) + \bar{w}^j(s); \qquad m(s) > 0. \qquad (43)$$

The random perturbation $\bar{w}^j(s)$ is added to insure that the sample statistics are consistent with those of (11). Note that a different set of projected replicates is obtained at $s\bar\gamma$ from each of the $q$ children of $s\bar\gamma$.

The smoothing gain $J(s)$ in (42) is given by

$$J(s) = \widehat{\mathrm{cov}}[\chi(s|s)]\,F^T(s)\,\widehat{\mathrm{cov}}^{-1}[\chi(s\bar\gamma|s)]. \qquad (44)$$

The replicate adjustment made in (42) is proportional to the difference between the smoothed replicate $\chi^j(s\bar\gamma|S)$ and the projected replicate $\chi^j(s\bar\gamma|s)$ at $s\bar\gamma$. This difference reflects the new information obtained from measurements that are not accounted for in $\chi^j(s\bar\gamma|s)$ [i.e., are not at nodes in $S(s)$ but are accounted for in $\chi^j(s\bar\gamma|S)$; these nodes are in the complement of $S(s)$].



Note that $\chi^j(s\bar\gamma|S)$ is available from the downward sweep computations at scale $m(s) - 1$, the scale above $m(s)$.

The downward sweep ends at the finest scale $m(s) = M$ with a set of smoothed replicates $\chi^j(s|S)$. The desired ensemble of updated global replicates is obtained from the inverse of (9):

$$x^j_{t|t} = \bar{x}_{t|t-1} + P^{-1}\chi^j_{M|S}, \qquad (45)$$

where $\chi^j_{M|S}$ is a replicate of all smoothed finest-scale states. The smoothed replicates account for all measurements taken at time $t$. The measurement update is carried out entirely on the tree and does not require computation of global covariances.
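The per-node downward step is equally local. The following sketch applies (42)-(44) at one node, assuming the upward-sweep replicates, the smoothed parent replicates, and the MAR matrices $F(s)$ and $\bar{Q}(s)$ are already available; names and dimensions are illustrative.

```python
import numpy as np

def smooth_node(Chi_up, Chi_par_S, F, Qbar_chol, rng):
    """Downward-sweep smoothing at node s, Eqs. (42)-(44).

    Chi_up    : (n_s, N) upward-sweep replicates chi^j(s|s)
    Chi_par_S : (n_p, N) smoothed parent replicates chi^j(parent|S)
    F         : (n_p, n_s) upward transition matrix F(s)
    Qbar_chol : (n_p, n_p) Cholesky factor of Qbar(s)
    """
    n_s, N = Chi_up.shape
    W = Qbar_chol @ rng.standard_normal((F.shape[0], N))
    Chi_par_s = F @ Chi_up + W                      # projected replicates, Eq. (43)
    Up = Chi_up - Chi_up.mean(axis=1, keepdims=True)
    Pp = Chi_par_s - Chi_par_s.mean(axis=1, keepdims=True)
    C_sp = Up @ Pp.T / (N - 1)      # sample cov[chi(s|s), chi(par|s)], which
                                    # estimates cov[chi(s|s)] F^T of Eq. (44)
    C_pp = Pp @ Pp.T / (N - 1)      # sample cov[chi(parent|s)]
    J = C_sp @ np.linalg.inv(C_pp)                  # smoothing gain, Eq. (44)
    return Chi_up + J @ (Chi_par_S - Chi_par_s)     # Eq. (42)

# toy usage: child dimension 3, parent dimension 2, 40 replicates
rng = np.random.default_rng(6)
Chi_up = rng.standard_normal((3, 40))
F = rng.standard_normal((2, 3))
Chi_par_S = F @ Chi_up + 0.1 * rng.standard_normal((2, 40))
sm = smooth_node(Chi_up, Chi_par_S, F, 0.2 * np.eye(2), rng)
print(sm.shape)  # (3, 40)
```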

c. Algorithm summary

The multiscale ensemble estimation algorithm summarized in sections 3 and 4 consists of several distinct steps:

Forecast
• The simulation model (state equation) of (3) is used to derive random forecast replicates of the discretized state vector $x^j_{t|t-1}$ at time $t$ from random updated replicates $x^j_{t-1|t-1}$ at time $t-1$. The replicates at both times are conditioned on measurements taken through $t-1$.

Tree identification and computation of prior tree states
• Equation (9) is used to derive the finest-scale tree replicates in $\chi^j_M$ from the forecast replicates $x^j_{t|t-1}$ for a specified tree topology (e.g., the number of scales $M + 1$, the number of children per parent $q$, and the number of states per finest-scale node $d$).
• Local sample state covariances are constructed at each node $s$ at the finest scale $m(s) = M$. These covariances and (19) and (20) are used to compute the internal weighting matrices $V(s)$.
• The prior state replicates at the next scale $m(s) = M - 1$ are derived from the $V(s)$ matrices using (21).
• The upward and downward tree parameters for scale $m(s)$ are computed from local sample cross covariances of the state replicates, as specified in (22) through (25).
• The identification process continues recursively up the tree until it reaches the root node at $m(s) = 0$.

Measurement allocation
• Individual elements $y_{ti}$ of the global measurement vector $y_t$ are assigned to tree nodes subject to the requirement of (26). The measurements at node $s$ are related to $\chi(s)$ by (27).

Update: Upward sweep
• The augmented perturbed and predicted measurement vectors, the measurement error covariance, and the Kalman gain are constructed at the finest scale from the prior state replicates in $\chi^j_M$ and (33), (34), (38), and (40).
• The updated states $\chi^j(s|s)$ at the finest scale $m(s) = M$ are derived from (28) and (29).
• The augmented perturbed and predicted measurement vectors, the measurement error covariance, and the Kalman gain at nodes at the next scale $m(s) = M - 1$ are derived from the prior state replicates $\chi^j(s)$ and (30), (31), (35), (36), (37), (38), (39), and (40).
• The updated states $\chi^j(s|s)$ at scale $m(s) = M - 1$ are derived from (28) and (29).
• The updating process continues recursively up the tree until it reaches the root node at $m(s) = 0$, yielding $\chi^j(s|s)$.

Update: Downward sweep
• The smoothed replicates $\chi^j(s|S)$ at the root node are constructed from (41).
• Equations (43) and (44) are used to compute the projected replicates $\chi^j(0|s)$ and smoothing gain $J(s)$ for each node $s$ at scale $m(s) = 1$.
• Equation (42) is used to derive the smoothed replicates $\chi^j(s|S)$ at each node $s$ at scale $m(s) = 1$.
• The smoothing process continues recursively down the tree until it computes the smoothed replicates $\chi^j(s|S)$ at the finest scale $m(s) = M$. The global updated replicates $x^j_{t|t}$ are obtained from (45).

d. Computational complexity

The complexity of the multiscale update depends on both the tree identification (which must generally be performed at every update time) and the measurement update. To obtain some order of magnitude estimates of computational complexity we suppose that 1) the state dimension at every node is $d$, 2) every global (finest scale) state is measured, 3) each parent has $q$ children, 4) the number of scales is $M + 1$, and 5) the predictive efficiency covariance calculation uses a screen consisting of $h$ nodes.

The complexities of the major computational operations for the corresponding multiscale filter are summarized in Table 1. It is apparent that the computational effort required by this filter depends strongly on the size of the nodal state dimension d. That is why it is important to keep this tree parameter small. Also, computational effort may be decreased from the


estimates in Table 1 if the measurements are sparse and located high on the tree, so that nodes lower on the tree do not need to be updated during the upward sweep. The memory requirement for the update is as much as 2q^M times the storage requirement for one finest-scale node.

We can compare the complexities for the multiscale and traditional Kalman filters if we make the reasonable assumptions that the number of replicates N ≈ d and the neighborhood size h ≈ q^2. Then the traditional Kalman filter has complexity O(q^{3M} d^3), which is much more expensive than the multiscale update, whose total complexity is O(q^M d^3).

If spatial localization is used the complexity of the traditional filter can decrease significantly, depending on the approach. For example, when the region of interest is divided into q^M completely independent blocks, each containing d states, the traditional filter complexity decreases to O(q^M d^3). This complexity, which is the same as that of the multiscale filter, is obtained at the cost of ignoring all correlations between states, and between states and measurements, that extend beyond the spatial localization block. The complexities of other localization methods are likely to fall between those of the traditional and multiscale filters. The multiscale approach is most computationally advantageous, compared to localized Kalman filters, when the problem at hand cannot be readily divided into subproblems of fixed size. This can occur in situations in which dominant features can vary greatly in scale over both time and space.

TABLE 1. Computational complexity for the ensemble multiscale filter.

  Tree identification
    Local covariance calculation at each parent node     O[qh(d^2 N)]
    Singular value decomposition at each parent node     O(qh d^3)
    Total identification complexity                      O[q^M h(d^2 N + d^3)]
  Update
    Upward sweep                                         O[q^(M+2)(d^2 N + d^3)]
    Downward sweep                                       O[q^M (d^2 N + d^3)]
    Total update complexity                              O[q^M (q^2 + 1)(d^2 N + d^3)]
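As a rough illustration of these estimates, the short Python sketch below evaluates the order-of-magnitude operation counts implied by Table 1 and the text. Constants are dropped and the function names are our own; the numbers are meant only to show the scaling gap.

def multiscale_update_ops(q: int, M: int, d: int, N: int) -> float:
    # Total update complexity from Table 1: O[q^M (q^2 + 1)(d^2 N + d^3)]
    return q**M * (q**2 + 1) * (d**2 * N + d**3)

def traditional_enkf_ops(q: int, M: int, d: int) -> float:
    # With N ~ d, the global nonlocalized filter scales as O(q^{3M} d^3) = O(n^3)
    return q**(3 * M) * d**3

q, M, d = 4, 8, 16              # fine-grid topology from Table 3
N = 52                          # ensemble size used in the example
print(f"multiscale:  ~{multiscale_update_ops(q, M, d, N):.1e} operations")
print(f"traditional: ~{traditional_enkf_ops(q, M, d):.1e} operations")

For these inputs the multiscale estimate is roughly eight orders of magnitude smaller than the nonlocalized traditional estimate.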

5. Example

a. Experimental setup

As suggested above, the concept of multiscale data assimilation is best illustrated with problems that exhibit a range of evolving scales. The two-dimensional incompressible flow example considered here has such variable scales. The flow field is characterized by random vortices that grow and move over time. These vortices are generated by the impact of a jet hitting a barrier placed near one of the domain boundaries, as shown in Fig. 2. The simulation model used for the forecasting step of the data assimilation is perfect except for uncertainty in the position of the jet along the domain boundary. The uncertain boundary condition leads to uncertain velocities with correlation scales that continually change as the vortices pass by.

The objective in our example is to characterize the complete velocity field with conditional statistics derived from synthetically generated measurements of the longitudinal (u) velocity component. The measurements are generated from a "true" velocity field, which is obtained by running the forecasting model with a single time-dependent random realization for the jet position. The problem is sufficiently large (over 10^6 states) to be computationally challenging.

FIG. 2. Spatial domain for the example.



The governing equations are the two nondimensional momentum equations and the continuity equation:

\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} + v\,\frac{\partial u}{\partial y} = -\frac{\partial p}{\partial x} + \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right), \qquad (46)

\frac{\partial v}{\partial t} + u\,\frac{\partial v}{\partial x} + v\,\frac{\partial v}{\partial y} = -\frac{\partial p}{\partial y} + \nu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right), \qquad (47)

and

0 = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}, \qquad (48)

where u and v are the longitudinal and transverse velocity components, p is pressure, and \nu is the eddy viscosity. The eddy viscosity is set equal to zero in (46) and (47), but numerical viscosity is generated at the corners of the barrier when the velocity equations are discretized.

We solve (46)-(48) with Gerris (Popinet 2003), which is an open source free software library for fluid flow applications. Gerris uses an adaptive mesh refinement method that adds detail in regions with small-scale flow features. The solution algorithm combines finite volume spatial discretization with a quad-tree nested refinement scheme and a multilevel Poisson solver for pressure computations. Advection terms are discretized using a robust second-order upwind scheme.

The boundary conditions for the example are listed in Table 2. The random jet center position c(t) is described by the following temporally discretized autoregressive equation:

c(t) = 0.9\,c(t - 1) + w(t), \qquad (49)

where w(t) is zero-mean white normally distributed noise with standard deviation 0.04. This expression causes the jet entrance to move up and down along the left boundary in a random fashion, staying mostly near the center of this boundary. The initial conditions are zero velocity everywhere.

TABLE 2. Boundary conditions for the example.

  Left (jet):   Specified velocity, u(-0.5, y, t) = 5, v(-0.5, y, t) = 0, for
                y ∈ [c(t) - 0.12, c(t) + 0.12]; -0.5 ≤ y ≤ 0.5; c(t) defined in text.
  Left (rest):  Specified velocity, u(-0.5, y, t) = v(-0.5, y, t) = 0, for
                y ∉ [c(t) - 0.12, c(t) + 0.12]; -0.5 ≤ y ≤ 0.5.
  Top:          Slip, ∂u(x, 0.5, t)/∂y = v(x, 0.5, t) = 0; -0.5 ≤ x ≤ 1.5.
  Right:        Open, ∂u(1.5, y, t)/∂x = ∂v(1.5, y, t)/∂x = 0; -0.5 ≤ y ≤ 0.5.
  Bottom:       Slip, ∂u(x, -0.5, t)/∂y = v(x, -0.5, t) = 0; -0.5 ≤ x ≤ 1.5.
  Barrier:      Impermeable barrier, slip on all four sides; -0.22 ≤ x ≤ -0.08, -0.115 ≤ y ≤ 0.115.
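A few lines of Python reproduce the qualitative behavior of this boundary forcing; this is a sketch of (49) only, with an assumed NumPy generator and seed.

import numpy as np

rng = np.random.default_rng(0)
n_steps = 200
c = np.zeros(n_steps)
for t in range(1, n_steps):
    c[t] = 0.9 * c[t - 1] + rng.normal(0.0, 0.04)   # Eq. (49)

# The stationary standard deviation of this AR(1) process is
# 0.04 / sqrt(1 - 0.9^2) ~ 0.09, so the jet entrance wanders but stays
# mostly near the center of the left boundary (y = 0).
print(c.std(), c.min(), c.max())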

For the example considered here it is convenient to specify a regular grid for input and output purposes. Gerris refines this grid internally, without intervention from the user. The computational effort of the refined simulation is generally proportional to the size of the user-specified grid. We consider two resolutions for the regular input-output grid: a coarse 128 × 64 grid that is used to assess the tree's ability to represent the forecast covariance, and a fine 1024 × 512 grid that is used to demonstrate the estimation capabilities of the ensemble multiscale filter. Both grids cover the simulation domain of Fig. 2.

The states to be estimated in this problem are the discretized longitudinal and transverse velocities defined on the flow model's regular computational grid (coarse or fine). The uncertain jet position is not estimated, but the state updates attempt to correct for jet position errors (since the effects of position uncertainty are reflected in the random state replicates). The discretized pressures are diagnostic variables that are derived from the velocities at the previous time step. The tree properties used for the coarse and fine multiscale models are summarized in Table 3.

b. Assessment of the tree approximation

The performance of the ensemble multiscale filter depends on the tree's ability to represent the true forecast covariance matrix. To obtain a quantitative assessment of this ability it is helpful to distinguish 1) cov*(x_{t|t-1}), the forecast covariance obtained from the downward recursion (10), using tree parameters derived from the forecast replicates, the predictive efficiency procedure, and a given tree topology; 2) cov(x_{t|t-1}), the sample forecast covariance derived directly from the ensemble x^j_{t|t-1}; and 3) the true forecast covariance. For practical purposes, we suppose that the true forecast covariance is equal to the sample covariance obtained from a very large ensemble.




To assess the tree approximation we use an ensemble of 6240 replicates to estimate the true covariance over the coarse model grid described in Table 3. Tests with smaller ensembles indicate that the covariance computations nearly converge to asymptotic values for N = 6240. We compare this large-sample "true" forecast covariance with the small-sample forecast covariance cov(x_{t|t-1}) and the small-sample tree-based covariance cov*(x_{t|t-1}), each derived from 52 replicates. The results of this three-way covariance comparison are shown in Figs. 3 and 4.

FIG. 3. Forecast velocity (u and v) correlation coefficients between point (8, 8) and all the grid cells over a 64 × 128 domain at t = 0.84. (top) Sample correlation from an ensemble with 52 replicates. (middle) Correlation derived from the tree model using the same ensemble as in the first row. (bottom) True correlation from an ensemble with 6240 replicates.

TABLE 3. Tree inputs for the example.

  Tree property                             Symbol               Coarse grid      Fine grid
  Cells in x direction                      —                    128              1024
  Cells in y direction                      —                    64               512
  Total cells                               —                    8192             524 288
  Scales                                    M + 1                6                9
  Children at coarsest scale                q0                   2                2
  Children at all other scales              q                    4                4
  Finest-scale nodes                        q0 q^(M-1)           32 × 16 = 512    256 × 128 = 32 768
  Cells at each finest-scale node           —                    16               16
  Tree states at each finest-scale node     dM                   32               32
  Tree states at each coarser-scale node    d                    11               16
  Total finest-scale states                 n = dM q0 q^(M-1)    16 384           1 048 576
  Neighborhood size (cells)                 h                    81               81
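The topology entries in Table 3 can be verified with a few lines of arithmetic; the sketch below reproduces the fine-grid values.

# Fine grid: M + 1 = 9 scales, q0 = 2 children at the coarsest scale,
# q = 4 children elsewhere, and dM = 32 states per finest-scale node.
M, q0, q, d_M = 8, 2, 4, 32
finest_nodes = q0 * q ** (M - 1)    # 2 * 4^7 = 32 768 finest-scale nodes
total_states = d_M * finest_nodes   # 32 * 32 768 = 1 048 576 tree states
print(finest_nodes, total_states)   # matches the 1024 x 512 grid with u and v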


Figure 3 plots contours of the correlations between velocities at cell (8, 8) (in the upper-left corner of the domain) and all other cells at time t = 0.84. Correlations for the u component are shown in the left column, while correlations for the v component are shown in the right column. The three rows of each figure show the small-sample forecast, small-sample tree, and large-sample forecast covariances, respectively. Figure 4 shows similar plots for velocity correlations between cell (88, 56) (in the lower-right corner of the domain) and all other cells.

These figures indicate that the small-sample tree covariances resemble the small-sample forecast covariances but are somewhat smoother and less affected by sampling anomalies, reflecting the benefits of the tree model's localization properties. Both of the small-sample estimates differ significantly from the large-sample "truth" for the (8, 8) case, which is more affected by local conditions near the jet inlet and barrier. The small-sample approximations are better for the (88, 56) case.

The correlation plots clearly show the superposition of small and large features in this problem. Velocity fluctuations at (8, 8) are generally small scale and have relatively little correlation with velocities in the rest of the domain. On the other hand, velocity fluctuations at (88, 56) are correlated over longer distances. Note that the u correlations at this point extend primarily in the longitudinal direction, while the v correlations are primarily transverse. The structure of these correlations changes over time, with the regions of longer correlation generally moving from left to right. Evidence of the blocking pattern used to define the tree's finest-scale states is apparent in the u correlation plot for (88, 56), where the contour lines change abruptly. Such artifacts are relatively easy to avoid if the grid blocks are allowed to overlap (Irving et al. 1997).
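For readers who wish to reproduce this kind of diagnostic, the following Python sketch computes the sample correlation between a reference cell and all other cells from an ensemble of velocity fields. The function and the synthetic stand-in ensemble are our own; the fields in Figs. 3 and 4 come from the Gerris simulations.

import numpy as np

def correlation_map(ensemble: np.ndarray, ref_cell: tuple) -> np.ndarray:
    """ensemble: (N, ny, nx) array holding one velocity component for N
    replicates; returns the (ny, nx) field of correlation coefficients
    with the reference cell."""
    anomalies = ensemble - ensemble.mean(axis=0)            # remove ensemble mean
    ref = anomalies[:, ref_cell[0], ref_cell[1]]            # (N,) reference series
    cov = np.einsum("n,nij->ij", ref, anomalies) / (len(ref) - 1)
    std = anomalies.std(axis=0, ddof=1)
    return cov / (std * ref.std(ddof=1) + 1e-12)            # guard zero variance

u_ens = np.random.default_rng(1).normal(size=(52, 64, 128))  # stand-in ensemble
rho = correlation_map(u_ens, ref_cell=(8, 8))
print(rho.shape, rho[8, 8])    # the correlation of a cell with itself is 1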

c. Data assimilation

To test the data assimilation capabilities of the multiscale ensemble filter we derive updated velocity replicates at all the nodes on the fine 1024 × 512 Gerris input-output grid. The ensemble size is 52, which is typical for operational problems of this size. The replicates are updated with synthetic longitudinal velocity measurements taken at 2048 equally spaced fine-grid locations. The synthetic measurements are generated from the following measurement equation:

y(t) = H\,x(t) + e(t); \quad t = 0.21, 0.42, 0.63, \text{ and } 0.84, \qquad (50)

FIG. 4. Forecast velocity (u and v) correlation coefficients between cell (88, 56) and all other cells. Arrangement is the same as in Fig. 3.


where x(t) is a single random replicate of the 1 048 576-dimensional state vector (defined as the "true" state for evaluations of estimation error), e(t) is a 2048-dimensional vector of independent zero-mean normally distributed random measurement errors with standard deviation 1.5, and H is a 2048 × 1 048 576 measurement matrix that selects the longitudinal velocities at the measurement locations.
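A minimal Python sketch of (50) follows. Here H is implemented implicitly as index selection rather than as an explicit 2048 × 1 048 576 matrix; this is our implementation choice, not a detail specified in the text.

import numpy as np

rng = np.random.default_rng(2)
n_states, n_obs, sigma = 1_048_576, 2048, 1.5

x_true = rng.normal(size=n_states)                          # stand-in "true" state
obs_idx = np.linspace(0, n_states - 1, n_obs).astype(int)   # equally spaced u sites
y = x_true[obs_idx] + rng.normal(0.0, sigma, size=n_obs)    # y = Hx + e, Eq. (50)
print(y.shape)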

The temporal behavior of the ensemble multiscale filter is illustrated in Fig. 5, which shows longitudinal velocity results at locations (276, 472) and (440, 296), respectively. The true state (red), representative individual replicates (light blue), and the ensemble mean (dark blue) are plotted over the simulation period t ∈ [0, 0.84]. Since (276, 472) is at a measurement location, the ensemble mean provides a reasonably good estimate of the true state. The ensemble mean estimate at (440, 296), which is not at a measurement location, is nearly as good.

Figures 6 and 7 show longitudinal and transverse velocity spatial distributions. The top halves of these figures compare the velocity ensemble means before and after the four update times (first and second columns) to the corresponding true values (third column). The complexity and multiscale character of the true velocity are apparent. Note that the forecast means tend to be more symmetric, especially at early times, reflecting the fact that the forecast replicates are generated by randomly located jets distributed symmetrically around the center of the left boundary. The true state is asymmetric because it is generated by a single random jet that happens to be located more often below the center of this boundary. The observations used to derive the updated velocity replicates are able to capture this asymmetry, and the ensemble mean gives a reasonably good portrayal of the true velocity field, especially just after the update.

The lower halves of Figs. 6 and 7 show the ensemble standard deviations before (first column) and after (second column) the update times. Note the evidence of the measurement grid and the substantially reduced levels of uncertainty in the updated standard deviations. The updated longitudinal velocity standard deviations generally improve more than the transverse velocity standard deviations. This is reasonable, considering that the transverse velocity estimates are inferred indirectly from noisy measurements of the longitudinal velocity.

Uncertain flow features generated at the jet are able to propagate throughout much of the domain between measurements. Consequently, the benefits of each update are mostly lost by the time of the next update. This suggests that it would be helpful to take measurements more often. Some tree artifacts are visible in the form of sharp vertical or horizontal contours at grid-block locations.

Figure 8 shows the time history of the root-mean-squared error between the ensemble mean and the known true velocities, taken over all grid cells. The effects of the four measurement updates are apparent not only in the longitudinal velocity plot but also in the transverse plot, although only longitudinal velocities are measured. Between updates the longitudinal velocity error nearly rises back to its previous high before falling again at each new update time.
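The diagnostic itself is simple; a minimal sketch (our own, using a small synthetic stand-in ensemble) is given below.

import numpy as np

def domain_rmse(ensemble: np.ndarray, truth: np.ndarray) -> float:
    """ensemble: (N, ny, nx) velocity replicates; truth: (ny, nx) field.
    Returns the RMSE of the ensemble mean over all grid cells."""
    err = ensemble.mean(axis=0) - truth
    return float(np.sqrt(np.mean(err**2)))

rng = np.random.default_rng(3)
u_ens = rng.normal(size=(52, 64, 128))      # stand-in replicates
u_true = rng.normal(size=(64, 128))         # stand-in true field
print(domain_rmse(u_ens, u_true))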

The abrupt state updates observed in Kalman filtering generally have the effect of changing the values of conserved quantities such as mass, momentum, and energy. These changes are to be expected if input information is uncertain and forecasts are incompatible with observations. In our application the only input uncertainty is the position of the inlet jet. As a result, the spatially integrated mass over the domain is conserved (within a few percent), even through the updates. The spatially integrated momentum and kinetic energy values (not shown here) generally increase after updates. This appears to reflect the impacts of 1) inlet position uncertainty, which influences the location and movement of simulated vortices; and 2) dissipation due to the lack of subgrid resolution in the forward simulation. New measurements at the update time generally reveal more velocity variability than was forecast, resulting in an increase in spatially integrated momentum and energy.

FIG. 5. Filter replicates, ensemble mean, and true velocity u time series at (left) cell (276, 472) and (right) cell (440, 296).


FIG. 6. (top half) Ensemble mean of u before and after update at measurement times, and the corresponding true values. (lower half) Ensemble standard deviation of u before and after update.


FIG. 7. (top half) Ensemble mean of v before and after update at measurement times, and the corresponding true values. (lower half) Ensemble standard deviation of v before and after update.



Although abrupt filter updates may correct for model errors, as they have in our example, they can have adverse impacts on forecasts, especially in meteorological applications (Cohn et al. 1998; Mitchell et al. 2002; Lorenc 2003). For this reason, considerable effort has been devoted to the problem of preserving dynamic balance during filter updates. The multiscale approach provides an opportunity to reassess the balance problem from the perspective of scale (rather than spatial) localization. It may be possible to adjust the tree structure to preserve various balance measures, much as model error perturbations and covariances have been adjusted for this purpose in other applications (Mitchell et al. 2002).

Overall, the ensemble multiscale filter gives results similar to a comparable ensemble Kalman filter, but with less computational effort and without the need for spatial localization. More extensive tests are needed to obtain a definitive assessment of the multiscale filter's capabilities. It is worth noting that there are very few ensemble methods that can be run in a reasonable amount of time for problems of this size without spatial localization. In fact, we were unable to get a traditional ensemble filter to run in a reasonable time on our multiprocessor cluster for the 1 048 576 state example described here. This makes it difficult to compare multiscale and traditional filter performance for the very large problems for which the multiscale approach is most attractive. Fruitful comparisons could be made with spatially localized ensemble filters that are computationally competitive for such problems.

6. Discussion and conclusions

The ensemble multiscale Kalman filter relies on the same basic structure as the classical ensemble Kalman filter. It uses a nonlinear model to propagate replicates of the system states between measurement times and uses implicit Gaussian assumptions to update these replicates when measurements become available. The difference between the two approaches is in the way the measurement update is implemented. The classical update relies on Kalman gains obtained from large low-rank global sample covariance matrices. These matrices are estimated from the forecast ensemble propagated from the previous update time. The multiscale approach effectively replaces the sample covariances with a tree that is also estimated from the forecast ensemble. However, the multiscale tree relates system variables at different locations through a set of local parent-child relationships rather than spatial correlations. This local tree-based description of spatial structure makes it possible to carry out the Kalman update in much less time than is required by the traditional covariance-based approach.

The tree-based measurement update requires the forecast replicates to be assigned to particular finest-scale tree nodes and the measurements to be assigned to tree nodes at various scales, depending on the measurement support. Once these assignments are carried out, the tree is identified directly from the finest-scale ensemble, using a predictive efficiency approach. The resulting tree generally only approximates the sample forecast covariances used in the classical ensemble filter. This is because 1) the states of the coarser tree nodes are truncated, 2) pairwise correlations are used to simplify the predictive efficiency computations, 3) the V(s) matrices used in the predictive efficiency procedure are assumed to be block diagonal, and 4) predictive efficiency correlations are derived only over a scale-dependent neighborhood around each node. All of these simplifications are introduced to improve computational efficiency.

The truncation and neighborhood screening approximations provide many different options for filtering out sampling error. Since these approximations affect relationships between nodes at different scales, they provide a type of localization in scale (rather than in space). Scale-based localization affects the tree's approximation of spatial correlations indirectly (rather than directly) through its influence on the common ancestors of the finest-scale nodes.

The example presented in this paper illustrates some of the capabilities of the multiscale approach. In particular, it shows that it is possible to obtain reasonable state estimates for a large nonlinear data assimilation problem with changing time and space scales, without any spatial localization.

FIG. 8. RMSE of the difference between the ensemble mean u and v velocities and the corresponding true velocities (averaged over the entire domain).


In our example, scale localization provides some modest filtering of high-frequency sampling errors, but it does not have much effect on larger-scale errors. Overall, the spatial covariance reconstructed from the tree (for diagnostic purposes) looks much like the sample forecast covariance estimated from the ensemble. Both exhibit the same large-scale deviations from the true forecast covariance.

Since the study described here focused primarily on implementation of the ensemble multiscale Kalman filter for a large problem, no effort was made to optimize the scale-localization procedure. Preliminary experiments suggest that it may be difficult to obtain significant suppression of large-scale sampling errors within the confines of the particular tree-identification framework presented here. However, it may be possible to do better if the coarser-scale node dimension d and the neighborhood size h are allowed to vary over time, rather than required to stay fixed at specified values. Also, it would be relatively easy to combine spatial localization with scale localization to deal with sampling errors over a wider range of conditions than either approach can handle individually.

The propagation step of the ensemble Kalman filter is well known to be amenable to parallelization over the ensemble. However, the sample covariance estimation portion of the traditional ensemble Kalman update requires merging of information from all replicates and is not inherently parallel. By contrast, the multiscale tree-identification procedure provides a number of options for parallel computing. For example, the predictive efficiency calculations can be carried out in parallel across all nodes on a given scale. Measurement updates in different tree branches can also be parallelized. These efficiencies are not included in our computational complexity discussion but could be quite beneficial for large problems.
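As a simple illustration of the first option, the sketch below farms a per-node identification routine out to a process pool; identify_node is a hypothetical stand-in for the local covariance, singular value decomposition, and parameter computations performed at each node on a given scale.

from concurrent.futures import ProcessPoolExecutor

def identify_node(node_id: int) -> tuple:
    # ... local sample covariance, SVD, and tree-parameter estimation
    # for one node; independent of all other nodes on the same scale ...
    return (node_id, "parameters")

if __name__ == "__main__":
    nodes_at_scale = range(512)         # e.g., the coarse-grid finest-scale nodes
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(identify_node, nodes_at_scale))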

Overall, the ensemble multiscale Kalman filter appears to offer considerable advantages for large nonlinear data assimilation applications. Here the filter's performance is demonstrated for one particular problem. Other problems may respond somewhat differently to the approximations introduced in the tree-identification procedure. However, our computational complexity analysis can be expected to apply in general. This analysis clearly shows the superiority of the multiscale approach over the classical nonlocalized ensemble Kalman filter. It remains to be seen how the multiscale approach compares to spatial localization procedures that also provide computational benefits over the classical approach. A complete assessment will require an examination of both accuracy and computational effort for a range of problems.

REFERENCES

Arulampalam, M. S., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Processing, 50, 174-188.

Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719-1724.

Cohn, S. E., A. da Silva, J. Guo, M. Sienkiewicz, and D. Lamich, 1998: Assessing the effects of data selection with the DAO physical-space statistical analysis system. Mon. Wea. Rev., 126, 2913-2926.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143-10 162.

——, 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343-367.

Frakt, A. B., and A. S. Willsky, 2001: Computationally efficient stochastic realization for internal multiscale autoregressive models. Multidimens. Syst. Signal Processing, 12, 109-142.

Gelb, A., Ed., 1974: Applied Optimal Estimation. MIT Press, 374 pp.

Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F Radar Signal Process., 140, 107-113.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776-2790.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123-137.

Irving, W. W., P. W. Fieguth, and A. S. Willsky, 1997: An overlapping tree approach to multiscale stochastic modeling and estimation. IEEE Trans. Image Processing, 6, 1517-1529.

Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.

Keppenne, C. L., and M. M. Rienecker, 2002: Initial testing of a massively parallel ensemble Kalman filter with the Poseidon isopycnal ocean general circulation model. Mon. Wea. Rev., 130, 2951-2965.

Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP: A comparison with 4D-Var. Quart. J. Roy. Meteor. Soc., 129, 3183-3203.

Margulis, S. A., D. McLaughlin, D. Entekhabi, and S. Dunne, 2002: Land data assimilation and estimation of soil moisture using measurements from the Southern Great Plains 1997 Field Experiment. Water Resour. Res., 38, 1299, doi:10.1029/2001WR001114.

McLaughlin, D., 2007: A probabilistic perspective on nonlinear model inversion and data assimilation. Subsurface Hydrology: Data Integration for Properties and Processes, Geophys. Monogr., Vol. 171, Amer. Geophys. Union, 243-253.

Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791-2808.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415-428.

Popinet, S., 2003: Gerris: A tree-based adaptive solver for the incompressible Euler equations in complex geometries. J. Comput. Phys., 190, 572-600.

Reichle, R. H., and R. D. Koster, 2005: Global assimilation of satellite surface soil moisture retrievals into the NASA Catchment land surface model. Geophys. Res. Lett., 32, L02404, doi:10.1029/2004GL021700.

——, D. B. McLaughlin, and D. Entekhabi, 2002: Hydrologic data assimilation with the ensemble Kalman filter. Mon. Wea. Rev., 130, 103-114.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485-1490.

Willsky, A. S., 2002: Multiresolution Markov models for signal and image processing. IEEE Proc., 90, 1396-1458.

Zhou, Y., 2006: Multi-sensor large scale land surface data assimilation using ensemble approaches. Ph.D. thesis, Massachusetts Institute of Technology, 234 pp.

——, D. McLaughlin, D. Entekhabi, and V. Chatdarong, 2006: Assessing the performance of the ensemble Kalman filter for land surface data assimilation. Mon. Wea. Rev., 134, 2128-2142.
