
Advances in Water Resources 70 (2014) 36–50


Ensemble-based characterization of uncertain environmental features

http://dx.doi.org/10.1016/j.advwatres.2014.04.005
0309-1708/© 2014 Elsevier Ltd. All rights reserved.

⇑ Corresponding author at: Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

E-mail addresses: [email protected] (R. Wójcik), [email protected] (D. McLaughlin), [email protected] (S.H. Alemohammad), [email protected] (D. Entekhabi).

Rafał Wójcik a,b,⇑, Dennis McLaughlin a, Seyed Hamed Alemohammad a, Dara Entekhabi a

a Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
b AIR Worldwide, Financial and Uncertainty Modeling, 3 Copley Place, Boston, MA 02116, USA

Article info

Article history:
Received 31 October 2013
Received in revised form 27 February 2014
Accepted 1 April 2014
Available online 19 April 2014

Keywords: Ensemble estimation; Image fusion; Importance sampling; Dimensionality reduction; Data assimilation; Precipitation

Abstract

This paper considers the characterization of uncertain spatial features that cannot be observed directly but must be inferred from noisy measurements. Examples of interest in environmental applications include rainfall patterns, solute plumes, and geological features. We formulate the characterization process as a Bayesian sampling problem and solve it with a non-parametric version of importance sampling. All images are concisely described with a small number of image attributes. These are derived from a multidimensional scaling procedure that maps high dimensional vectors of image pixel values to much lower dimensional vectors of attribute values. The importance sampling procedure is carried out entirely in terms of attribute values. Posterior attribute probabilities are derived from non-parametric estimates of the attribute likelihood and proposal density. The likelihood is inferred from an archive of noisy operational images that are paired with more accurate ground truth images. Proposal samples are generated from a non-stationary multi-point statistical algorithm that uses training images to convey distinctive feature characteristics. To illustrate concepts we carry out a virtual experiment that identifies rainy areas on the Earth's surface from either one or two remote sensing measurements. The two sensor case illustrates the method's ability to merge measurements with different error properties. In both cases, the importance sampling procedure is able to identify the proposals that most closely resemble a specified true image.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

In many fields there is a need to characterize uncertain spatial features that cannot be observed directly. Examples of particular interest in environmental applications include characterization of surface rainfall in ungaged areas [59], tracking of subsurface solute plumes, and identification of geological features such as groundwater aquifers, mineral deposits, and oil reservoirs [6,14,26]. Remote sensing measurements such as passive and active satellite microwave, geodetic, or seismic observations can often provide useful but imperfect information about uncertain features. The statistical attributes of these noisy measurements can be estimated by comparing them to more accurate but less readily available "ground truth" measurements. For example, ground-based weather radar and rain gage data can serve as ground truth for an evaluation of errors in satellite microwave measurements that have greater coverage but may be less accurate. In subsurface flow, borehole measurements can serve as local ground truth for seismic or other remotely sensed data.

The diverse measurements collected in a typical environmental assessment can be conveniently combined in a Bayesian framework that considers all available sources of information. In a feature-based application images are characterized by vectors of appropriate variables (e.g. pixel values or feature attributes). The Bayesian approach conditions a prior distribution of the uncertain true image/vector on a set of noisy measured images/vectors, yielding a posterior distribution that characterizes the uncertainty remaining after the measurements are taken into account.

Many researchers have investigated methods for using noisy measurements to characterize complex environmental features. One option is to identify a point estimate (a single image) that optimizes an appropriate deterministic performance objective (e.g. a least-squares measure of misfit between the estimate and a measured image). These methods can be viewed as Bayesian maximum a posteriori estimators (i.e. estimators of the posterior mode) if Gaussian assumptions are adopted. Feature-based optimization methods often use level-set techniques to characterize irregular and/or disconnected feature boundaries [3,32]. Level set optimization methods are flexible and popular in the image processing community but they do not generate posterior distributions of uncertain feature variables. Posterior distributions are needed for probabilistic applications such as ensemble forecasting, risk assessment, extreme value analysis, and stochastic control.

Ensemble methods such as Markov Chain Monte Carlo (MCMC) and importance sampling provide a probabilistic characterization that yields samples from the Bayesian posterior [1,7,17,44]. However, these methods often rely on parametric prior and measurement error distributions (e.g. the multivariate Gaussian distribution) and are generally not applied to feature characterization problems. This reflects the fact that very large sample sizes are needed to properly characterize the non-parametric multivariate probability densities of high-dimensional image vectors [12,65]. Spatial features such as geological facies, solute plumes, and rain storms are usually difficult to describe with parametric probability models. The multipoint statistical approach used in many geological applications [53,57,58] provides an alternative that generates realistic non-parametric prior or proposal samples from training images. However, multipoint methods are generally unable to sample from the Bayesian posterior distribution.

An ideal approach for characterizing uncertain features would combine the level set method's ability to handle complex geometries, the general Bayesian probabilistic framework provided by MCMC and importance sampling, and the non-parametric samples generated by multi-point statistical techniques. This paper describes an approximate method for integrating these different capabilities in a single characterization procedure. It adopts a non-parametric importance sampling approach with proposal samples generated from training images and measurement error samples generated from archived data. The computational limitations of ensemble sampling are partially circumvented by representing each image with a small vector of feature attributes that provides a more concise and efficient description than a classical pixel-based description. Since it is difficult to specify in advance a set of universal attributes that adequately characterize complex features, we use a multidimensional scaling technique to derive the attributes directly from image pixel values.

In the following sections we first formulate an attribute-based approach to the feature characterization problem, using importance sampling to generate approximate probability-weighted samples from the Bayesian posterior. We then consider how multidimensional scaling can identify image attributes from pixel-based image vectors and how the priors and likelihoods used in importance sampling can be generated from archived attribute data. The general concepts are illustrated with a virtual experiment based on real rainfall measurements. The paper concludes with a discussion of conceptual and computational issues that identify directions for future research.

2. Image based importance sampling

Importance sampling is a procedure that generates samples from a posterior probability distribution that is related through Bayes theorem to a prior distribution and a likelihood function. Following the approach outlined above, we formulate the feature-based importance sampling problem in terms of a small number of distinctive image attributes. In particular, we define a random vector $\hat{x}$ composed of true image attributes. The true image is observed by one or more sensors that produce noisy images, which we call current operational measurements to distinguish them from the archived operational measurements discussed below. The current measurement from sensor $r$ is described by an attribute vector $\hat{z}_r$. Attributes for all of the measurements are assembled in the global measurement vector $\hat{z}$. Section 3.4 describes how the image attributes are derived.

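The objective of importance sampling is to generate a set of samples $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{N_x}$ of $\hat{x}$ from the posterior probability density $p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z})$. This density can be expressed, through Bayes theorem, as:

$$
p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z}) = c\, p_{\hat{Z}|\hat{X}}(\hat{z}|\hat{x})\, p_{\hat{X}}(\hat{x}) \tag{1}
$$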

where $p_{\hat{Z}|\hat{X}}(\hat{z}|\hat{x})$ is the likelihood function, $p_{\hat{X}}(\hat{x})$ is a prior probability density that does not depend on the measurements, and $c$ is a proportionality constant selected to ensure that the posterior density integrates to unity. The prior density quantifies our prior uncertainty about the true image while the likelihood function describes the effects of measurement error. Since it is difficult to sample directly from the posterior density we work instead with a more convenient set of equally likely random samples from a proposal density $q(\hat{x})$. Unlike the prior, the proposal density can depend on the current measurements. We use the multipoint statistical techniques described below to generate proposal samples. An approximate posterior probability distribution can be derived by appropriate weighting of these proposal samples. It is convenient to illustrate the process by considering the following equivalent expressions for the conditional expectation of $\hat{x}$ over $p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z})$ (for a given $\hat{z}$):

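$$
E_p[\hat{x}|\hat{z}] = \int \hat{x}\, p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z})\, d\hat{x}
= \int q(\hat{x}) \left[ \hat{x}\, \frac{p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z})}{q(\hat{x})} \right] d\hat{x}
= E_q\!\left[ \hat{x}\, \frac{p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z})}{q(\hat{x})} \right] \tag{2}
$$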

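This mean can be estimated from the $N_x$ proposal samples:

$$
E_q\!\left[ \hat{x}\, \frac{p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z})}{q(\hat{x})} \right]
\approx \frac{c}{N_x} \sum_{i=1}^{N_x} \hat{x}_i\, \frac{p_{\hat{Z}|\hat{X}}(\hat{z}|\hat{x}_i)\, p_{\hat{X}}(\hat{x}_i)}{q(\hat{x}_i)}
= \int \hat{x} \left[ \frac{c}{N_x} \sum_{i=1}^{N_x} \frac{p_{\hat{Z}|\hat{X}}(\hat{z}|\hat{x}_i)\, p_{\hat{X}}(\hat{x}_i)}{q(\hat{x}_i)}\, \delta(\hat{x} - \hat{x}_i) \right] d\hat{x} \tag{3}
$$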

where $\hat{x}_i$ is sampled from $q(\hat{x})$. The final bracketed expression in (3) is a discrete approximation to the posterior density $p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z})$ that appears in the second term in (2). This expression can be re-written as:

$$
p_{\hat{X}|\hat{Z}}(\hat{x}|\hat{z}) \approx \sum_{i=1}^{N_x} w_i\, \delta(\hat{x} - \hat{x}_i) \tag{4}
$$

where:

$$
w_i = \frac{c}{N_x}\, \frac{p_{\hat{Z}|\hat{X}}(\hat{z}|\hat{x}_i)\, p_{\hat{X}}(\hat{x}_i)}{q(\hat{x}_i)} \tag{5}
$$

and $c$ is selected such that $\sum_i w_i = 1$. Eq. (4) tells us that the proposal image attribute vectors $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{N_x}$ can be interpreted as samples from the posterior density if they are assigned the weights $w_1, w_2, \ldots, w_{N_x}$ computed in (5) rather than the equal weights that apply when they are treated as samples from the proposal density. So the proposal and posterior samples have the same values but different probabilities. Note that, in the special case where the proposal density is the same as the prior, the $p_{\hat{X}}(\hat{x}_i)$ and $q(\hat{x}_i)$ terms in (5) cancel and the weights depend only on the likelihood. However, it is usually best to draw proposal samples from a distribution that depends on the current measurement rather than from the prior distribution, which does not.
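As an illustration only, the weighting step of Eqs. (4) and (5) can be sketched in a few lines of Python. The one-dimensional Gaussian likelihood, prior, and proposal density, the function names, and all numerical values below are assumptions made for this sketch; they are not taken from the rainfall example.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here only to build a toy problem."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def importance_weights(proposals, likelihood, prior, proposal_density):
    """Weights of Eq. (5): proportional to likelihood * prior / proposal density.

    Dividing by the sum plays the role of the constant c / N_x, so the
    returned weights add up to one.
    """
    raw = [likelihood(x) * prior(x) / proposal_density(x) for x in proposals]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all proposals have zero posterior weight")
    return [r / total for r in raw]

# Toy setup (every number here is an illustrative assumption).
z = 1.0                                   # current measurement attribute
proposals = [-1.0, 0.0, 0.5, 1.0, 2.0]    # stand-ins for samples drawn from q

def likelihood(x):
    return normal_pdf(z, x, 0.5)          # additive Gaussian measurement error

def prior(x):
    return normal_pdf(x, 0.0, 2.0)

def proposal_density(x):
    return normal_pdf(x, 0.5, 1.5)

weights = importance_weights(proposals, likelihood, prior, proposal_density)

# Weighted sample statistics approximate posterior quantities, as in Eq. (3).
posterior_mean = sum(w * x for w, x in zip(weights, proposals))
most_probable = proposals[weights.index(max(weights))]
```

Passing the prior as the proposal density makes the prior and proposal terms cancel, so the weights then depend only on the likelihood, which is the special case noted above.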

3. Generating the information needed for importance sampling

The importance sampling approach outlined in Section 2 needs a set of proposals $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{N_x}$, the likelihood function $p_{\hat{Z}|\hat{X}}(\hat{z}|\hat{x})$, and the prior $p_{\hat{X}}(\hat{x})$ and proposal $q(\hat{x})$ probability densities. The prior density can be specified directly in terms of attributes since it does not depend on measurements. The process for generating proposals and for obtaining the likelihood and proposal density functions involves several steps that start with high-dimensional pixel-based image vectors and conclude with the low-dimensional attribute-based importance weight calculations of (5). The overall flow of information is summarized in Fig. 1 and the steps are described in more detail in the following subsections.

3.1. Data sources

It is essential to rely on low-dimensional attribute vectors in an image-based importance sampling procedure. But it is more convenient to work with high-dimensional vectors of pixel values when generating proposals and compiling the information needed to construct attribute-based probability density functions. To clarify the process it is convenient to define the various images and associated pixel and attribute vectors used in the importance sampling procedure:

1. True image: One image described by a vector $x_T$ with $N_p$ pixel values. The true image is unknown but assumed to be adequately approximated by one or more of the proposals.

2. Current operational measurements: $N_z$ noisy images derived from $N_z$ different operational sensors that all observe the same true image. These sensors produce measurements that are lower quality (lower resolution, less accurate, etc.) than the ground truth measurements (see below) but are more readily available, more convenient, or with better coverage. The current operational measurements are described by the vectors $z_1, z_2, \ldots, z_{N_z}$, each with $N_p$ pixel values, or by smaller vectors $\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_{N_z}$, each with $N_a$ attribute values. The pixel-based current measurement vectors for all of the $r = 1, \ldots, N_z$ sensors are assembled in the $N_p N_z$-dimensional composite measurement vector $z$ while the corresponding attribute-based vectors are assembled in the $N_a N_z$-dimensional composite measurement vector $\hat{z}$.

3. Proposals: $N_x$ images described by the vectors $x_1, x_2, \ldots, x_{N_x}$, each with $N_p$ pixel values, or by $N_x$ smaller vectors $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{N_x}$, each with $N_a$ attribute values. These proposals are candidate images that are ranked by the importance sampling procedure according to the probability that they correspond to the true image.

4. Archived operational measurements: $N_z$ sets of noisy images. Each set contains $N_u$ images derived from a particular sensor. The images for sensor $r$ (where $r = 1, \ldots, N_z$) are described by vectors $u_{1,r}, u_{2,r}, \ldots, u_{N_u,r}$, each with $N_p$ pixel values, or by smaller vectors $\hat{u}_{1,r}, \hat{u}_{2,r}, \ldots, \hat{u}_{N_u,r}$, each with $N_a$ attribute values. All of the $u_{j,r}$ vectors are assembled in the $N_p N_u N_z$-dimensional composite measurement vector $u$ while all of the $\hat{u}_{j,r}$ vectors are assembled in the $N_a N_u N_z$-dimensional composite measurement vector $\hat{u}$.

5. Archived ground truth measurements: $N_u$ images derived from a reference sensor with smaller errors than the operational sensors. These images are described by the vectors $v_1, v_2, \ldots, v_{N_u}$, each with $N_p$ pixel values, or by $N_u$ smaller vectors $\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_{N_u}$, each with $N_a$ attribute values. All of the $v_j$ vectors are assembled in the $N_p N_u$-dimensional composite measurement vector $v$ while all of the $\hat{v}_j$ vectors are assembled in the $N_a N_u$-dimensional composite measurement vector $\hat{v}$.

Fig. 1. Information flow for feature characterization with importance sampling.

The archived images from a given operational sensor are paired with corresponding ground truth images from the reference sensor (each pair has a particular subscript $j$). These paired measurements may be used to estimate measurement error probability densities, as discussed in Section 3.3. This process of estimating error properties from archived measurements is not essential to the importance sampling approach but, if feasible, provides a convenient way to develop non-parametric probabilistic measurement error models.

3.2. Proposal generation

Importance sampling works much better in feature characterization applications if the proposals it uses are realistic images that share important features with the unknown true image. At the same time, they must be sufficiently different from one another to properly represent uncertainty. It is possible to balance these requirements by generating conditional realizations from an appropriate training image (or images).

There are a number of ways to generate random but structured images from training images. One such procedure is the multipoint statistical generator [28] provided in the SNESIM package [53,57,58] used in our example. When archived ground truth measurements are available it is convenient to use them as proposal training images since they reveal a range of features observed in the past. In other situations where such archival measurements are not available it may be sufficient to generate a training image manually, for example by sampling and transforming features observed in the current measurement image. Fig. 2(c) shows some typical unconditional realizations generated from SNESIM using the archived ground truth measurement of Fig. 2(a) as a training image. This figure relies on some of the images from our rainfall example, which adopts a categorical (binary) description of intensity (i.e. red indicates rain and blue indicates no rain). Similar categorical characterizations are convenient in subsurface applications, where different categories represent different geological facies. The multi-point approach is sufficiently general to handle training images with continuously varying intensities, although realization generation in such cases is usually more computationally demanding. In the example, note that the red areas in the realizations are approximately the size and shape of the red areas in the training image but they can occur anywhere in the image domain.

We can modify unconditional realizations to obtain non-stationary behavior if all realizations are constrained to reproduce intensity values specified in particular pixels. These could be values from a current measurement image, a training image, or some other source. The resulting non-stationary image has different ensemble probabilities in different locations. This is illustrated in Fig. 2(d), where the realizations are still derived from the training image of Fig. 2(a) but also constrained by intensity values at specified pixels (black dots) in the current operational measurement of Fig. 2(b). The constraining pixels are distributed randomly over the domain, constituting 1% of the total number of pixels. The red and blue areas in the resulting realizations tend to occur in the general vicinity of the corresponding areas in the current measurement image. There is still variability among realizations but not as much as in the unconditional case. The SNESIM multipoint image generation procedure provides a convenient way to construct proposals for importance sampling. We can generate a range of proposals, some that look very much like the measurement and some that look much different, by varying the fraction of pixels used to constrain the unconditional realizations. This is illustrated in Fig. 3, again using images from the rainfall example introduced in Section 3.1. Here the fraction of constraining points varied from 1% to 25%. The resulting ensemble in Fig. 3(c) provides a diverse set of proposals, some close to the current measurement and others that are quite different. The true image is shown in Fig. 3(a) for comparison.

Fig. 2. Unconditional and conditional realizations generation procedure: (a) training image; (b) current measurement and 1% random constraining pixels (black dots); (c) unconditional realizations simulated using the training image in (a); (d) conditional realizations simulated using the training image in (a) and random constraining pixels in (b).
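The selection of random constraining pixels can be sketched as follows. This is only a minimal illustration of how a conditioning set might be drawn from a binary measurement image; it does not reproduce SNESIM itself. The function name, the synthetic 16 × 16 image, the intermediate 5% level, and the fixed seed are all assumptions.

```python
import random

def pick_constraining_pixels(image, fraction, seed=None):
    """Return {(row, col): value} for a random subset of pixels.

    `image` is a 2-D list of pixel values and `fraction` is the share of
    pixels to keep as conditioning data (at least one pixel is kept).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(image), len(image[0])
    coords = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    n_pick = max(1, round(fraction * len(coords)))
    chosen = rng.sample(coords, n_pick)        # sampling without replacement
    return {(r, c): image[r][c] for r, c in chosen}

# Hypothetical binary measurement: rain (1) in the lower half, no rain (0) above.
measurement = [[1 if r >= 8 else 0 for _ in range(16)] for r in range(16)]

# One conditioning set per level; the text uses fractions between 1% and 25%.
constraint_sets = {
    fraction: pick_constraining_pixels(measurement, fraction, seed=0)
    for fraction in (0.01, 0.05, 0.25)
}
```

Each conditioning set would then be handed to the multipoint generator as hard data, so that larger fractions yield proposals that stay closer to the measurement.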

There are many variants on the proposal generation procedure outlined above. The SNESIM algorithm and the conditioning approach described above are adequate for the applications considered here. In other applications with rather different features or continuous intensities other methods may be appropriate. The important point is that we can generate "realistic" proposal images in a manner compatible with our application, without ever specifying an explicit probability model for the high-dimensional pixel vectors used to characterize these images. This approach provides considerable flexibility and makes it more likely that the importance sampling proposals will be similar to the unknown true image.

Once the proposals are generated as high-dimensional vectors of pixel values they can be transformed through the multi-dimensional scaling procedure into low-dimensional vectors of image attributes suitable for importance sampling. Then, kernel density methods [50,56,61] or mixture models [22,35,55] can be used to fit a continuous $N_a$-dimensional multivariate probability density $q(\hat{x})$ to the proposal attribute vectors $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{N_x}$. The posterior importance sampling weights in (5) depend on the values of this density obtained at the proposal attribute vectors. Note that a proposal density is needed only in the low-dimensional attribute space, not in the pixel space.
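A kernel density fit of this kind can be sketched with a plain Gaussian kernel. The estimator below is a generic textbook form with a single fixed bandwidth, chosen here for brevity; the specific kernel methods cited above may differ. The attribute values, the bandwidth, and the function name are assumptions.

```python
import math

def fit_gaussian_kde(samples, bandwidth):
    """Return a density function built from d-dimensional samples.

    Each sample contributes an isotropic Gaussian kernel with standard
    deviation `bandwidth`; the kernels are averaged to give the estimate.
    """
    n, d = len(samples), len(samples[0])
    norm = n * (bandwidth * math.sqrt(2.0 * math.pi)) ** d

    def density(point):
        total = 0.0
        for sample in samples:
            sq_dist = sum((p - s) ** 2 for p, s in zip(point, sample))
            total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
        return total / norm

    return density

# Hypothetical 2-D attribute vectors for five proposals.
proposal_attributes = [(-0.9, 0.2), (-0.8, 0.1), (-0.7, 0.3), (0.6, -0.4), (0.7, -0.5)]
q = fit_gaussian_kde(proposal_attributes, bandwidth=0.3)   # assumed bandwidth

# Only the density values at the proposals enter the weights of Eq. (5).
q_at_proposals = [q(x) for x in proposal_attributes]
```

Because the fit is done in the attribute space, the cost of each evaluation grows with the number of proposals and the small number of attributes, not with the number of pixels.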

3.3. Measurement errors

In Bayesian sampling applications likelihood functions are typically derived from models of the measurement error process. The most common model assumes that measurements are the sum of a known function of the true (hidden) variable and a random noise term. When expressed in terms of image attributes this model may be written:

Fig. 3. (a) True image. (b) Current operational measurement. (c) Examples of conditional proposal realizations $x_i$ (12 realizations for each fraction of constraining pixels). The fractions of constraining pixels are displayed in bold in the left panel. Each image in (a)–(c) is 16 × 16 pixels.

$$
\hat{z} = H\hat{x} + \hat{\epsilon} \tag{6}
$$

where $H$ is an $N_z N_a \times N_a$ matrix that relates the measurement and true image attributes and $\hat{\epsilon}$ is an $N_z N_a$-dimensional random error vector described by the multivariate probability density $p_{\hat{E}}(\hat{\epsilon})$. In the feature characterization problems considered here the measurement operator is linear but the general approach also accommodates nonlinear operators.

Eq. (6) implies that the attribute-based likelihood for any given proposed true value $\hat{x} = \hat{x}_i$ may be written as:

$$
p_{\hat{Z}|\hat{X}}(\hat{z}|\hat{x}_i) = p_{\hat{E}}(\hat{z} - H\hat{x}_i) \tag{7}
$$

Although an additive error can always be defined as the difference between a noisy measurement and an accurate ground truth image, an additive measurement model is useful only if the errors do not depend on the true image. In this case, the error statistics can be specified without knowing the true feature. We use the commonly adopted additive error assumption here because measurement attribute errors in our rainfall example do, indeed, appear to be independent of the true value, based on comparisons of the archival operational and ground truth attribute values. In other applications measurement attribute errors may not be additive and different measurement error models should be considered. In any case, these error models will rely on the attribute vectors obtained from archived operational and ground truth measurements. If available, the archived operational and ground truth measurements mentioned in Section 3.1 provide a good basis for constructing a realistic non-parametric measurement error density. Fig. 4 shows some typical pairs of archived ground truth measurements (Fig. 4(a)) and corresponding archived operational measurements (Fig. 4(b)) taken from our rainfall example. The difference images obtained by subtracting each ground truth measurement from its operational counterpart are shown in Fig. 4(c). Corresponding images are located in the same positions in the figure.

The operational measurements can have different numbers of distinct features (identified by the red areas) than the ground truth measurements, with different sizes and locations. Nevertheless, they clearly resemble their ground truth counterparts. In pixel space the measurement differences depend on the true image, contrary to the independent additive error assumption typically made in Bayesian statistics. For example, consider the top left image, where the red areas in both measurements lie in the bottom half of the frame. The differences for this image are also confined to the bottom half, indicating a correlation with the true image. As mentioned above, such correlations sometimes decrease or even disappear when the images are described in terms of attribute rather than pixel values.

Fig. 4. (a) Examples of NEXRAD-IV rainfall signatures used as ground-truth images in this study. Red indicates rain and blue no rain. The dimension of each image is 16 × 16 pixels. (b) Examples of synthetic historical measurements generated using the conditional simulation described in Section 3.2. Each synthetic measurement corresponds to the ground-truth image in the same position in (a); for example, the image in the upper left panel of (b) corresponds to the image in the upper left panel of (a). (c) The difference images obtained by subtracting each ground truth image in (a) from its operational counterpart in (b). Blue represents areas with no error, green represents false alarms (measurement is rainy but not the ground truth) and yellow represents misses (ground truth is rainy but not the measurement). The black line shows the boundary of the ground-truth rain. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


We can estimate the attribute measurement error probability density from paired vectors of archived operational and ground truth image attributes (e.g. from attribute vectors derived from the images shown in Fig. 4(a) and (b)). When the attribute measurement errors are additive the differences $\hat{u}_{j,r} - \hat{v}_j$ between the archived attribute vectors for sensor $r$ can be treated as random samples from the population of actual measurement attribute errors described by $p_{\hat{E}}(\hat{\epsilon})$. Kernel density or mixture models [22,49,50,56] can be used to fit a continuous $N_a$-dimensional probability density $p_{\hat{E}}(\hat{\epsilon})$ to the $\hat{u}_{j,r} - \hat{v}_j$ samples. We can then combine (5) and (7) to obtain an importance weight expression that depends directly on $p_{\hat{E}}$:

$$
w_i = \frac{c}{N_x}\, \frac{p_{\hat{E}}(\hat{z} - H\hat{x}_i)\, p_{\hat{X}}(\hat{x}_i)}{q(\hat{x}_i)} \tag{8}
$$

When there is good reason to believe that the measurement errors from different sensors are independent, $p_{\hat{E}}$ factors into the product of $N_z$ densities, one corresponding to each sensor:

$$
p_{\hat{E}}(\hat{\epsilon}) = p_{\hat{E}_1}(\hat{\epsilon}_1)\, p_{\hat{E}_2}(\hat{\epsilon}_2) \cdots p_{\hat{E}_{N_z}}(\hat{\epsilon}_{N_z}) \tag{9}
$$

Each of these individual error densities can be estimated separately from the operational measurement attribute vectors for the corresponding sensor. An estimate of $p_{\hat{E}}(\hat{\epsilon})$ can then be constructed directly from (9), giving the following special case of (8) for independent sensor errors:

$$
w_i = \frac{c}{N_x}\, \frac{p_{\hat{E}_1}(\hat{z}_1 - H_1\hat{x}_i)\, p_{\hat{E}_2}(\hat{z}_2 - H_2\hat{x}_i) \cdots p_{\hat{E}_{N_z}}(\hat{z}_{N_z} - H_{N_z}\hat{x}_i)\, p_{\hat{X}}(\hat{x}_i)}{q(\hat{x}_i)} \tag{10}
$$

where $p_{\hat{E}_r}(\hat{z}_r - H_r\hat{x}_i)$ is the likelihood associated with sensor $r$.
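The two-sensor weighting of Eqs. (9) and (10) can be sketched as below, under several simplifying assumptions that are not part of the text: attributes are one-dimensional, each $H_r$ is the identity, the prior and proposal densities are both flat, and the per-sensor error densities are fitted with a fixed-bandwidth Gaussian kernel. All numbers and names are hypothetical.

```python
import math

def fit_kde_1d(samples, bandwidth):
    """Fixed-bandwidth Gaussian kernel density estimate for scalar samples."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return lambda e: sum(math.exp(-0.5 * ((e - s) / bandwidth) ** 2) for s in samples) / norm

# Hypothetical archive: ground truth attributes and paired operational attributes.
archived_truth = [0.1, 0.4, -0.3, 0.8, -0.6]
archived_operational = {
    "sensor_1": [0.2, 0.5, -0.1, 0.9, -0.6],   # small errors
    "sensor_2": [0.6, -0.1, 0.2, 0.3, -1.1],   # larger errors
}

# One error density per sensor, fitted to the operational minus truth differences.
error_pdfs = {}
for name, operational in archived_operational.items():
    errors = [u - v for u, v in zip(operational, archived_truth)]
    error_pdfs[name] = fit_kde_1d(errors, bandwidth=0.2)

def independent_sensor_weights(proposals, current, error_pdfs, prior, q):
    """Eq. (10) with H_r = identity: multiply the per-sensor likelihoods."""
    raw = []
    for x in proposals:
        likelihood = 1.0
        for name, z_r in current.items():
            likelihood *= error_pdfs[name](z_r - x)
        raw.append(likelihood * prior(x) / q(x))
    total = sum(raw)
    return [r / total for r in raw]

current = {"sensor_1": 0.45, "sensor_2": 0.70}   # current measurement attributes
proposals = [-0.5, 0.0, 0.4, 1.0]                # proposal attributes

def flat(x):
    return 1.0                                   # assumed flat prior and proposal density

weights = independent_sensor_weights(proposals, current, error_pdfs, flat, flat)
best_proposal = proposals[weights.index(max(weights))]
```

In this toy setting the sensor with the tighter error density dominates the product, so the proposal that sits closest to its measurement after allowing for the typical error receives most of the weight.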

3.4. Data-driven high-dimensional scaling

Sections 3.2 and 3.3 indicate that we can identify the proposal and measurement error terms in the posterior weight expression of (8) if we can derive the attribute vectors $\hat{x}_i$, $\hat{z}_r$, $\hat{u}_{j,r}$ and $\hat{v}_j$ for the proposal samples, current measurements, and archived measurements, respectively. All of these images are available in a pixel-based form, described by the $x_i$, $z_r$, $u_{j,r}$ and $v_j$ vectors. In order to obtain the desired attribute vectors we need a transformation that maps any given image's pixel vector into a corresponding attribute vector.

Each image can be viewed as a point in an $N_p$-dimensional pixel space, with coordinates equal to the pixel values, or as a corresponding point in a much smaller $N_a$-dimensional attribute space, with coordinates equal to the attribute values. The posterior weight derived for a given image/point in the attribute space also applies to the corresponding image/point in the much larger $N_p$-dimensional pixel space ($N_p \gg N_a$). Since there are many pixels but only a few attributes, the transformation from pixel to attribute space will generally yield only an approximate representation of the original image. But if the attributes are carefully chosen and the pixel-based description has significant redundancy, the approximation may be adequate for importance sampling purposes.

We formulate the transformation problem by defining composite vectors $y$ and $\hat{y}$ that include all of the known pixel and attribute-based vectors defined in Section 3.1:

$$
y = [\,x_1, \ldots, x_{N_x},\; z_1, \ldots, z_{N_z},\; u_{1,1}, \ldots, u_{N_u,1},\; u_{1,2}, \ldots, u_{N_u,2},\; \ldots,\; u_{1,N_z}, \ldots, u_{N_u,N_z},\; v_1, \ldots, v_{N_u}\,]
$$

$$
\hat{y} = [\,\hat{x}_1, \ldots, \hat{x}_{N_x},\; \hat{z}_1, \ldots, \hat{z}_{N_z},\; \hat{u}_{1,1}, \ldots, \hat{u}_{N_u,1},\; \hat{u}_{1,2}, \ldots, \hat{u}_{N_u,2},\; \ldots,\; \hat{u}_{1,N_z}, \ldots, \hat{u}_{N_u,N_z},\; \hat{v}_1, \ldots, \hat{v}_{N_u}\,]
$$

The pixel-based vector $y$ has dimension $N_y = N_I N_p$ while the attribute-based vector $\hat{y}$ has dimension $N_{\hat{y}} = N_I N_a$, where $N_I = N_x + N_z + N_z N_u + N_u$ is the total number of images to be transformed. We represent the simultaneous transformation of the entire set of $N_I$ images from pixel to attribute space by $T(\cdot)$:

$$
\hat{y} = T(y) \tag{11}
$$

Our task is to find a transformation that retains as much relevant information as possible when mapping from pixel to attribute values. One approach is suggested by (8), which indicates that the weight $w_i$ given to the proposal $\hat{x}_i$ depends on the difference between the points $\hat{z}$ and $H\hat{x}_i$ in the attribute space. If the measurement error probability density indicates that large errors are unlikely, the proposal will be given less weight when $\hat{z}$ and $H\hat{x}_i$ are further apart. It is, therefore, reasonable to expect the transformation $T(\cdot)$ to preserve, as much as possible, the distances between any two images/points in the pixel space when they are mapped to the attribute space. That is, points which are close (far apart) in the attribute space should also be close (far apart) in the pixel space.

There are various ways to derive distance-preserving transformations from high to low-dimensional vector spaces. All of these rely on quantitative definitions of the distance between pairs of points in a given space. In general, it is both possible and desirable to use different distance (or similarity) measures in the two spaces. We have found that a Euclidean measure is generally satisfactory in the low-dimensional attribute space. Classical distance measures, such as the Minkowski, Mahalanobis or Chebyshev [16], do not seem to work as well in the high-dimensional pixel space. In particular, such measures have trouble detecting differences between categorical images with scattered multiple features. Other distance measures (or, more generally, similarity indices) that focus on image overlap seem better suited for binary, pixel-based images. One example is the Simple Matching Coefficient (SMC) [18,21]. The SMC distance between two pixel-based binary images is:

$$
d_{SMC} = 1 - \frac{n_{11} + n_{00}}{n_{01} + n_{10} + n_{11} + n_{00}} \tag{12}
$$

where n11 is the number of pixels that have a value of 1 in bothimages, n01 is the number of pixels that have value 0 in the firstand 1 in the second image, n10 is the number of pixels that havevalue 1 in the first and 0 in the second image and n00 is the numberof pixels that have value 0 in both images. The value of this distanceis 0 if the images coincide and 1 (the largest possible value) if theydo not overlap anywhere (i.e. they have no pixels in common). TheSMC distance emphasizes differences in the location and size ratherthan the shape of features found in two images. Two other exam-ples, that are applicable for both binary and non-negative real-val-ued vectors, are the generalized Jaccard and Tanimoto distances[34,45,51]. The generalized Jaccard distance between Np elementvectors X and Y of non-negative pixel values for two images is:

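$$d_{Jac} = \begin{cases} 1 - \dfrac{\sum_{i=1}^{N_p} \min(X_i, Y_i)}{\sum_{i=1}^{N_p} \max(X_i, Y_i)} & \text{if } \sum_{i=1}^{N_p} \max(X_i, Y_i) > 0 \\[2ex] 0 & \text{if } \sum_{i=1}^{N_p} \max(X_i, Y_i) = 0 \end{cases} \qquad (13)$$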

The corresponding Tanimoto distance (which is not a metric because it does not satisfy the triangle inequality) is:

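$$d_{Tan} = \begin{cases} 1 - \dfrac{X \cdot Y}{X \cdot X + Y \cdot Y - X \cdot Y} & \text{if } X \cdot X + Y \cdot Y - X \cdot Y > 0 \\[2ex] 0 & \text{if } X \cdot X + Y \cdot Y - X \cdot Y = 0 \end{cases} \qquad (14)$$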

where $X \cdot Y$ represents the dot product between vectors $X$ and $Y$. These two distances have the same value when the images are binary, as they are in the example of Section 4. Although no distance measure is suitable for every application, we obtain good results in our rainfall example by using the SMC distance in the pixel space and a Euclidean distance in the attribute space.
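As a minimal sketch, the three pixel-space distances of Eqs. (12)-(14) can be computed as follows. The function names and the two small test images are hypothetical and are introduced here only for illustration; they are not part of the experiment in Section 4.

```python
import numpy as np


def smc_distance(a, b):
    """SMC distance of Eq. (12) between two binary images of equal shape."""
    a = np.asarray(a).astype(bool).ravel()
    b = np.asarray(b).astype(bool).ravel()
    if a.shape != b.shape:
        raise ValueError("images must have the same number of pixels")
    n_match = np.count_nonzero(a == b)  # n11 + n00
    return 1.0 - n_match / a.size       # a.size = n01 + n10 + n11 + n00


def jaccard_distance(x, y):
    """Generalized Jaccard distance of Eq. (13) for non-negative vectors."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    denom = np.maximum(x, y).sum()
    if denom == 0:
        return 0.0
    return 1.0 - np.minimum(x, y).sum() / denom


def tanimoto_distance(x, y):
    """Tanimoto distance of Eq. (14) for non-negative vectors."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xy = float(x @ y)
    denom = float(x @ x) + float(y @ y) - xy
    if denom == 0:
        return 0.0
    return 1.0 - xy / denom


# Hypothetical 4 x 4 binary "rain" images: a 2 x 2 rainy block and the same
# block shifted one pixel to the right.
img_a = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
img_b = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]])

print(smc_distance(img_a, img_b))       # 4 mismatched pixels out of 16
print(jaccard_distance(img_a, img_b))   # 1 - 2/6
print(tanimoto_distance(img_a, img_b))  # same as Jaccard for binary images
```

For these binary images the Jaccard and Tanimoto values coincide, as stated above, while the SMC distance is smaller because it also counts the pixels that are dry in both images as matches.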

Non-metric multi-dimensional scaling methods (see e.g. [37,43]) can be used to derive approximate distance-preserving transformations between pairs of points $(s, t)$ in high and low-dimensional spaces. These methods are usually designed to find the complete set of image attribute vectors $y$ that minimize a "stress function" of the following general form:

Fig. 5. (a) 2-D attribute space obtained by DD-HDS mapping of the 256-D rain support data set; the legend distinguishes archived ground truth, archived operational, current operational, and proposal sample points. (b) Estimate of the proposal pdf (black contour lines); red dots represent the proposal sample mapped into the attribute space; the prior is assumed to be uniform. (c) Estimate of the likelihood (black contour lines), archived operational measurement errors (blue dots) and current operational measurement errors (black dots). In all panels the axes are Attribute 1 and Attribute 2, each spanning −1.5 to 1.5. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


$$f = \sum_{s<t} F\left[\,\left|\tilde{d}(\tilde{y}_s, \tilde{y}_t) - d(y_s, y_t)\right|\,\right], \qquad s, t = 1, \ldots, N_I \qquad (15)$$

where, for our application, $\tilde{d}(\tilde{y}_s, \tilde{y}_t)$ is the SMC distance between points $\tilde{y}_s$ and $\tilde{y}_t$ in the pixel space and $d(y_s, y_t)$ is the Euclidean distance between the corresponding points $y_s$ and $y_t$ in the attribute space. The function $F(\cdot)$ is a specified stress measure. Some classical examples are Sammon mapping [41,42,54] and recent variants [4,5,13,52]. Here we use a stress function proposed in [43] that is designed to work well when the dimension of the original space is very large.

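This function, referred to as data-driven high-dimensional scaling (DD-HDS), is:

$$f_{DD\text{-}HDS} = \sum_{s<t} \left|\tilde{d}(\tilde{y}_s, \tilde{y}_t) - d(y_s, y_t)\right| \left[1 - \int_{-\infty}^{\min\left(\tilde{d}(\tilde{y}_s, \tilde{y}_t),\, d(y_s, y_t)\right)} f(u, \mu, \sigma)\, du\right] \qquad (16)$$

where $\tilde{d}(\tilde{y}_s, \tilde{y}_t)$ is the distance in the original, high-dimensional space, $d(y_s, y_t)$ is the distance in the attribute space, and $f(u, \mu, \sigma)$ is the probability density function of a Gaussian random variable $u$ with mean $\mu$ and standard deviation $\sigma$. It is suggested in [43] that these parameters be chosen so that they adapt to the effective distribution of distances $\tilde{d}(\tilde{y}_s, \tilde{y}_t)\ \forall s, t$ in the original, high-dimensional space. One option is to use:

$$\mu = \operatorname*{mean}_{1 \le s < t \le N}\left(\tilde{d}(\tilde{y}_s, \tilde{y}_t)\right) - 2(1 - \lambda) \operatorname*{std}_{1 \le s < t \le N}\left(\tilde{d}(\tilde{y}_s, \tilde{y}_t)\right) \qquad (17)$$

$$\sigma = 2\lambda \operatorname*{std}_{1 \le s < t \le N}\left(\tilde{d}(\tilde{y}_s, \tilde{y}_t)\right) \qquad (18)$$

where the mean and standard deviation (std) are taken over the distribution of distances between all pairs of data in the original space and $\lambda$ is a positive user-defined parameter (usually taken between 0.1 and 0.9) that controls the relative emphasis given to large vs. small distances in the high-dimensional space.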

In general, the stress must be minimized with a numerical optimization algorithm that gives the optimum set of image points $y$ in the attribute space. Here we use the original implementation of DD-HDS given in [43], which is based on an algorithm described in [24], where it is called the "Force Directed Placement" paradigm [48].
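The following sketch evaluates the DD-HDS stress of Eq. (16), with the parameters of Eqs. (17) and (18), for one candidate layout of points in the attribute space. It only evaluates the objective; it does not reproduce the force-directed optimization of [43]. The function names, the use of the population standard deviation, and the toy data are assumptions made for illustration.

```python
import math

import numpy as np


def pairwise_smc(images):
    """SMC distance matrix for binary images stored as rows of `images`."""
    imgs = np.asarray(images).astype(bool)
    n = imgs.shape[0]
    d = np.zeros((n, n))
    for s in range(n):
        # Fraction of pixels on which image s disagrees with every image.
        d[s] = np.mean(imgs[s] != imgs, axis=1)
    return d


def pairwise_euclidean(points):
    """Euclidean distance matrix for points stored as rows of `points`."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian random variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def dd_hds_stress(d_pixel, d_attr, lam=0.5):
    """Stress of Eq. (16) for pixel-space and attribute-space distances."""
    iu = np.triu_indices_from(d_pixel, k=1)  # all pairs with s < t
    dp, da = d_pixel[iu], d_attr[iu]
    spread = dp.std()  # population std; an assumption of this sketch
    if spread == 0:
        raise ValueError("all pixel-space distances are identical")
    mu = dp.mean() - 2.0 * (1.0 - lam) * spread  # Eq. (17)
    sigma = 2.0 * lam * spread                   # Eq. (18)
    upper = np.minimum(dp, da)
    weight = np.array([1.0 - gaussian_cdf(u, mu, sigma) for u in upper])
    return float(np.sum(np.abs(dp - da) * weight))


# Toy data: four 4-pixel binary images and a candidate 2-D layout.
images = np.array([[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]])
layout = np.array([[0.0, 0.0], [0.25, 0.0], [0.5, 0.0], [1.0, 0.0]])

d_pix = pairwise_smc(images)
d_att = pairwise_euclidean(layout)
print(dd_hds_stress(d_pix, d_att, lam=0.5))  # zero: layout matches every SMC distance
```

The weighting term in brackets is close to 1 for small distances and close to 0 for large ones, so errors on pairs that are close in either space contribute most to the stress.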

Fig. 6. Posterior weights sorted in descending order. Horizontal axis: sorted replicate number [−]; vertical axis: posterior weight [−].

4. Rainfall example

4.1. Experimental design

We illustrate the key aspects of our feature-based importance sampling procedure with an example that uses noisy satellite measurements to characterize rainfall patterns. For this example we consider binary images that have only two intensity levels (rain or no-rain), primarily to focus on spatial intermittency. Irregular intermittent features are commonly encountered in environmental applications but are difficult to describe with traditional continuous random field models. When intensities are categorized into a finite number of levels (e.g. two) all of the probabilities used in the importance sampling procedure are discrete and there is a finite (but very large) number of possible true images. This simplifies both the computation and visualization of results. However, it should be noted that the methods described in the preceding sections are sufficiently general to deal with continuously variable intensities or with intermittent features that have variable intensities within disconnected areas. We intend to examine these more general cases in the future.

The hydrologic and meteorological communities have devoted considerable attention to the characterization and quantitative forecasting of uncertain rainfall events. Recent advances in remote sensing of rainfall have provided an opportunity to merge data from multiple sensors with different error properties. Examples of multi-sensor rainfall characterization algorithms include [9,19,23,29,39,60,63]. Algorithms that have been run operationally to produce global rainfall estimates include [30,31,36,38,40,64]. Most of these efforts provide deterministic products and do not attempt to quantify estimation errors or to rank different possible outcomes. Exploratory studies that explicitly account for uncertainty in satellite-based rainfall estimates include [2,10,11,27]. These studies typically rely on parametric measurement error models that may not adequately describe the complex errors that can affect the geometry as well as the intensity of noisy spatial measurements. The approach taken in our example and summarized in the preceding sections provides probabilistic products based on non-parametric measurement error models.

It is possible to check the performance of a feature characterization procedure either by processing actual measurements or by conducting a virtual experiment where a known "true feature" is used to derive synthetic measurements. In the latter case posterior realizations can be compared directly to the known truth. When evaluating new algorithms it is often advantageous to first conduct a controlled virtual experiment to test the concept and then to assess practical feasibility with real measurements. In this paper we describe the virtual component of a longer-term research program concerned with rainfall characterization.

In order to make the virtual experiment realistic we derive true features, proposals, and synthetic measurements from images in the NEXt generation RADar (NEXRAD-IV) dataset. NEXRAD measurements, which are available for most of the continental US, are collected by the National Weather Service ground-based WSR-88D radar network [25] and processed by the National Centers for Environmental Prediction (NCEP). NEXRAD rainfall intensity is derived from radar backscatter with the procedure described by [8]. The NEXRAD data used here are arranged on a latitude-longitude grid of 0.25° by 0.25° pixels covering 31.625° to 47.625°N and 108.625° to 80.625°W at 12 h intervals over the period Jul. 2003 through Dec. 2010. The set of archived ground truth images (pixel vectors $v_1, v_2, \ldots, v_{N_u}$) consists of 2773 images from different 16 by 16 pixel (4° by 4°) sections of the large NEXRAD grid. We designate one of the ground truth images to be the true image (pixel vector $x$). This true image is set aside and no longer included in the archived ground truth set.

Although NEXRAD is generally believed to provide high quality measurements of rainfall it is not available globally. Global rainfall intensities must be inferred from low-orbit satellite sensors such as AMSU [62] and SSMI [20], which typically detect rainfall through observations of microwave brightness or cloud top temperature. Low-orbit microwave observations are generally less accurate than NEXRAD and are available only at intermittent times. The synthetic operational measurements used in our virtual experiment are intended to imitate the general characteristics of these noisy observations.

The current operational measurement for each of the $N_z$ sensors is a single 16 by 16 pixel random SNESIM realization with the common true image used as a training image. The realization's deviation from the true image represents measurement error. We consider two options with either $N_z = 1$ current measurement from Sensor 1 or $N_z = 2$ current measurements from Sensors 1 and 2. The Sensor 1 measurement is constrained by a randomly selected set of 25% of the training image pixel intensities while the somewhat more accurate Sensor 2 measurement is constrained by 20% of the training image pixel values.
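The random selection of constraining pixels can be sketched as below. This covers only the choice of conditioning data; the SNESIM simulation that turns these constraints into a realization is not reproduced here. The function name, the rounding rule for non-integer pixel counts, and the random test image are assumptions made for illustration.

```python
import numpy as np


def select_conditioning_pixels(image, fraction, rng):
    """Pick a random subset of pixels whose values act as constraints.

    Returns the row indices, column indices, and values of the chosen pixels.
    """
    img = np.asarray(image)
    n_cond = int(round(fraction * img.size))  # rounding rule is an assumption
    flat_idx = rng.choice(img.size, size=n_cond, replace=False)
    rows, cols = np.unravel_index(flat_idx, img.shape)
    return rows, cols, img[rows, cols]


rng = np.random.default_rng(seed=0)
# Hypothetical 16 x 16 binary image standing in for the true image.
true_image = (rng.random((16, 16)) < 0.3).astype(int)

rows_1, cols_1, vals_1 = select_conditioning_pixels(true_image, 0.25, rng)
rows_2, cols_2, vals_2 = select_conditioning_pixels(true_image, 0.20, rng)
print(len(vals_1), len(vals_2))  # number of constraining pixels per sensor
```

With 256 pixels per image, a 25% constraint corresponds to 64 pixels; 20% is not a whole number of pixels (51.2), so some rounding convention is needed in practice.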

We use the techniques of Section 3.2 to generate $N_x = 240$ pixel-based proposals composed of three sets of 80 proposals constrained by, respectively, 1%, 5%, and 25% of the image pixels (see Fig. 3). When $N_z = 1$ all 240 proposals are constrained by values from the current Sensor 1 measurement. When $N_z = 2$ the first 120 proposals are constrained by the Sensor 1 measurement and the remaining 120 proposals are constrained by the Sensor 2 measurement.

In our example we obtain a set of $N_u = 300$ paired archived operational and ground truth synthetic measurements using a procedure similar to the one used to generate the current operational measurement. Each archived operational measurement is a SNESIM realization that uses a particular ground truth image as a training image. The realization is constrained by intensity values at 25% or 20% of the training image pixels for measurements from Sensors 1 and 2, respectively. When we use only Sensor 1 all of the archived operational measurements are assumed to be from this sensor. When we use both Sensors 1 and 2 half of the archived measurements are from each sensor.

Fig. 7. (a) True image. (b) Current operational measurement. (c) 20 proposal realizations with the highest posterior weights. (d) 20 proposal realizations with the smallest posterior weights.

Fig. 8. dy–dx diagram for the DD-HDS mapping in Fig. 5(a). The horizontal axis is the distance in the 256-D pixel space [−] and the vertical axis is the distance in the 2-D attribute space [−]. The heat color scale represents the number of image pairs corresponding to the SMC and Euclidean distances indicated on the x and y axes, respectively. Images that are close in the pixel space generally map to points that are close in the attribute space. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The archived operational measurement and the ground truth measurement used as its training image form a distinct pair. The differences between the mapped measurement attribute values define an additive attribute error for the pair, as described in Section 3.3 and shown in Fig. 4. The probability $p_E(\cdot)$ is estimated by fitting a Gaussian mixture to errors derived from archival operational measurements that are similar to the current operational measurement, as measured by the Jaccard distance in the pixel space. When mapping image pixel values to attribute values we specify only $N_a = 2$ attributes so we can plot non-parametric density fits to the proposal density and likelihood functions as contours in two dimensions. These design choices yield a set of 841 images to be mapped from the 256-dimensional pixel space to the 2-dimensional attribute space. The mapping is performed with the DD-HDS technique described in Section 3.4.
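The total of 841 images is consistent with the following accounting for the single-sensor case. The breakdown is an assumption rather than something stated explicitly in the text: it supposes that the mapped set contains the proposals, both members of each archived pair, and the one current operational measurement.

```latex
N_I = N_x + 2 N_u + N_z = 240 + 2 \times 300 + 1 = 841,
\qquad
\frac{N_I (N_I - 1)}{2} = \frac{841 \times 840}{2} = 353{,}220 .
```

The second quantity is the number of image pairs $(s, t)$ with $s < t$, that is, the number of pairwise distances that must be evaluated in each space.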

Fig. 9. (a) Estimate of the likelihood for measurement 1 in Fig. 10(b) (black contour lines). (b) Estimate of the likelihood for measurement 2 in Fig. 10(c) (black contour lines). Archived operational measurement errors are represented by blue dots and current operational measurement errors are represented as black dots. The axes of both panels are Attribute 1 and Attribute 2. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. (a) True image. (b) Current operational measurement 1. (c) Current operational measurement 2. (d) 20 proposal realizations with the highest posterior weights. (e) 20 proposal realizations with the smallest posterior weights.

4.2. Results

The results of the image mapping for the single current measurement case are shown in Fig. 5(a), which distinguishes archived ground truth measurements (blue), archived operational measurements (green), a single current operational measurement (black), and the proposals (red). The attributes that form the axes for this plot are data-dependent and do not have any particular geometric interpretation. Their primary role is to provide a concise image description that keeps images that are close (far apart) in the pixel space close (far apart) in the attribute space. It is apparent that the proposals tend to lie in the upper half of the diagram, with a subset relatively close to the current operational measurement (these are the proposals that are most constrained by the current measurement). The archived ground truth and operational measurements tend to be centered around the middle of the diagram, suggesting that they correspond to a diverse set of historical true images.

Fig. 5(b) shows contours of a Gaussian mixture approximation for the proposal probability density $q(x)$, fit to the red proposal samples. The highest probability region is the dense cluster near the current measurement. Fig. 5(c) shows contours of a Gaussian mixture approximation for the measurement error probability density $p_E(\cdot)$ fit to the blue differences $u_j - v_j$ between archived operational and ground truth measurements. The Gaussian mixture estimation algorithm is described in [22]. The likelihood value for a particular proposal $x_i$ is obtained by evaluating (7) at the black dots, which represent the differences $z - x_i$ between the single current operational measurement and the proposal. The posterior weight $w_i$ for each proposal is obtained from (8), with the relatively uninformative prior density $p_X(x_i)$ assumed to be uniform throughout the attribute plane, covering the range −1.5 to 1.5 for each attribute.
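The weighting step described above can be sketched as follows. Equations (7) and (8) are not reproduced in this section, so the sketch assumes the standard self-normalized importance sampling form, in which each weight is proportional to likelihood times prior divided by proposal density. The function names, the single-component error density, and the flat proposal density are hypothetical choices made to keep the example short.

```python
import numpy as np


def gaussian_mixture_pdf(points, weights, means, covs):
    """Evaluate a Gaussian mixture density at each row of `points`."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    dim = points.shape[1]
    total = np.zeros(points.shape[0])
    for w, m, c in zip(weights, means, covs):
        c = np.asarray(c, dtype=float)
        diff = points - np.asarray(m, dtype=float)
        inv = np.linalg.inv(c)
        norm = 1.0 / np.sqrt((2.0 * np.pi) ** dim * np.linalg.det(c))
        expo = -0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)
        total += w * norm * np.exp(expo)
    return total


def posterior_weights(z, proposals, error_pdf, proposal_pdf, prior_pdf):
    """Normalized weights: likelihood * prior / proposal density (assumed form)."""
    likelihood = error_pdf(z - proposals)  # error density at z - x_i
    raw = likelihood * prior_pdf(proposals) / proposal_pdf(proposals)
    return raw / raw.sum()


# Hypothetical attribute-space values.
z = np.array([0.2, 0.1])                       # current operational measurement
proposals = np.array([[0.2, 0.1], [1.0, 1.0], [-1.0, 0.5]])


def error_pdf(e):
    # One zero-mean component stands in for the fitted error mixture.
    return gaussian_mixture_pdf(e, [1.0], [[0.0, 0.0]], [0.1 * np.eye(2)])


def flat_pdf(x):
    # Uniform density on the square [-1.5, 1.5] x [-1.5, 1.5], area 9.
    return np.full(len(x), 1.0 / 9.0)


w = posterior_weights(z, proposals, error_pdf, flat_pdf, flat_pdf)
print(w)  # the proposal that coincides with z receives the largest weight
```

In the experiment itself the error and proposal densities are Gaussian mixtures fitted to the mapped samples; the flat proposal density used here is only a placeholder.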

Fig. 6 shows the spectrum of proposal weights, ordered from largest to smallest, for this example. Note that a relatively small number of samples have higher weights but no single sample dominates. Fig. 7(c) and (d) show, respectively, the 15 most probable and 15 least probable proposals in the complete set of 240. The true image and current measurement image are shown for reference in Fig. 7(a) and (b), respectively. It is apparent that the higher weighted proposals in Fig. 7(c) look more like the true image. The variation among these proposals gives a visual indication of the uncertainty in the true image. This uncertainty reflects both the variability built into the proposal population and the measurement error assessment inferred from archived measurements.

The Demartines diagram [15,43] shown in Fig. 8 provides a way to examine the quality of the multi-dimensional scaling transformation from pixel to attribute spaces. This diagram plots the attribute space distance vs. the pixel space distance for all possible pairs of mapped images, indexed by $(s, t)$. If the DD-HDS transformation preserved distances between pairs perfectly, the points in Fig. 8 would fall on a narrow curve, with larger pixel space distances always giving larger attribute space distances. In fact, the points in the figure form a broad cloud that trends generally from the lower left (where both distances are small) to the upper right (where both distances are large). Since there are many pairs the cloud is shaded to indicate the density of points in a given area (darker colored areas have more points). The plot indicates that there are pairs where the distances in the two spaces are not compatible, although most pairs fall along the center of the cloud. A better transformation would be characterized by a narrower cloud with a dark color. It is possible to obtain better results by using fewer images or more attributes since it becomes more difficult to preserve distances when there are more pairs and the attribute space is small.
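A diagram of this kind can be assembled from the two distance matrices by binning all pairs in a two-dimensional histogram. The sketch below uses NumPy only; the function name, the bin count, and the toy distances are assumptions, and plotting is left out.

```python
import numpy as np


def dy_dx_histogram(d_pixel, d_attr, bins=8):
    """Count image pairs in bins of (pixel distance, attribute distance).

    `d_pixel` and `d_attr` are square, symmetric distance matrices for the
    same ordered set of images. Each pair (s, t) with s < t is counted once.
    """
    d_pixel = np.asarray(d_pixel, dtype=float)
    d_attr = np.asarray(d_attr, dtype=float)
    iu = np.triu_indices_from(d_pixel, k=1)
    counts, x_edges, y_edges = np.histogram2d(d_pixel[iu], d_attr[iu], bins=bins)
    return counts, x_edges, y_edges


# Toy case: five points on a line, with attribute distances that are exactly
# twice the original distances (a perfectly monotone mapping).
pts = np.arange(5.0)
d_orig = np.abs(pts[:, None] - pts[None, :])
counts, x_edges, y_edges = dy_dx_histogram(d_orig, 2.0 * d_orig, bins=4)
print(counts)  # non-zero entries fall only on the diagonal of the grid
```

For a perfectly monotone mapping all counts lie along a single curve in the grid; a broad spread of non-zero bins around that curve corresponds to the cloud described above.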

In the context of our application, [33,46,47] show that when $N_I$ images in a given pixel space with any specified metric are mapped into a smaller $N_a$-dimensional Euclidean space the multiplicative distortion in the set of mapped points is $O\left(\min\left[N_I^{2/N_a} \log^{3/2} N_I,\ N_I\right]\right)$, where distortion is measured in terms of a stress function or performance criterion such as (16). So the mapping error increases as $N_I$ increases for fixed $N_a$ and as $N_a$ decreases for fixed $N_I$. In addition, the optimization algorithms used to minimize a multi-dimensional scaling stress function such as (16) become impractical, or at least very expensive, for data sets containing more than a few thousand points since the stress computation effort increases quadratically with $N_I$.
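As a worked reading of this bound for the two-attribute case used in our example (ignoring the constant hidden in the $O(\cdot)$ notation and assuming $\log N_I \ge 1$), the first term inside the minimum is never smaller than the second:

```latex
N_a = 2: \qquad
N_I^{2/N_a} \log^{3/2} N_I = N_I \log^{3/2} N_I \;\ge\; N_I
\quad\Longrightarrow\quad
\min\!\left[ N_I \log^{3/2} N_I,\; N_I \right] = N_I .
```

Under these assumptions the bound grows linearly with the number of mapped images when only two attributes are used, and the first term can become the smaller one only for larger $N_a$.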

These factors encourage the use of small proposal ensembles and archived measurement data sets and large numbers of attributes. But the importance sampling process becomes less able to properly assign posterior weights when the proposal size decreases and the non-parametric likelihood derivation becomes less reliable when the number of archived data points decreases. Also, both importance sampling and likelihood derivation need more images to maintain good performance when the number of attributes is increased. So there is a clear tension between the need to obtain a more accurate and efficient multi-dimensional scaling transformation (fewer images, more attributes) and a more accurate importance sampling procedure (more images, fewer attributes).

Fig. 9 shows the likelihoods obtained for two different current measurements, referred to as measurement 1 and measurement 2, generated with different constraint procedures from the same true image (25% constraining pixels for measurement 1 and 20% constraining pixels for measurement 2). Note the distinctly different shapes of the individual likelihoods, reflecting the different ways the image attributes are affected by errors in the two measurements. When the product of these likelihoods is inserted into (9) we obtain the results shown in Fig. 10. Fig. 10(a) shows the single true image (the same as in Fig. 7(a)) while Fig. 10(b) and (c) show the two current measurements used to characterize this image. The most probable and least probable proposal sets look broadly similar to those obtained in Fig. 7 with one measurement but they are not identical. Note that some of the proposals from Fig. 7 have reappeared in the most probable set, with different ranks. The two-measurement result presented here is an example of image fusion, the combination of different measurements and archival information to obtain an improved characterization of hidden features.
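The fusion step can be sketched by multiplying the per-sensor likelihoods before normalizing the weights. Equation (9) is not reproduced in this section, so the weight formula below is an assumed self-normalized importance sampling form; working with logarithms is a numerical convenience of this sketch, and the function name and the likelihood values are hypothetical.

```python
import numpy as np


def fused_posterior_weights(log_likelihoods, log_prior, log_proposal):
    """Posterior weights from several sensors.

    `log_likelihoods` has shape (number of sensors, number of proposals).
    Summing its rows is equivalent to multiplying the sensor likelihoods.
    """
    log_like = np.sum(np.asarray(log_likelihoods, dtype=float), axis=0)
    log_w = log_like + np.asarray(log_prior) - np.asarray(log_proposal)
    log_w = log_w - log_w.max()  # guard against underflow in exp
    w = np.exp(log_w)
    return w / w.sum()


# Hypothetical likelihood values for three proposals and two sensors.
like_sensor_1 = np.array([0.5, 0.2, 0.3])
like_sensor_2 = np.array([0.1, 0.6, 0.3])

w_fused = fused_posterior_weights(
    np.log([like_sensor_1, like_sensor_2]),
    log_prior=0.0,      # flat prior
    log_proposal=0.0,   # flat proposal density, for illustration only
)
print(w_fused)  # proportional to the product of the two likelihood rows
```

In this toy case the first proposal is favored by Sensor 1 and the second by Sensor 2; the fused weights rank the second proposal highest because its product of likelihoods is largest.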

5. Conclusions and future research directions

Importance sampling provides a useful framework for interpreting and merging observations of complex geometrical features. It accommodates general models of image structure and measurement error and is relatively easy to implement. However, importance sampling needs adequate ensemble sizes, accurate measurement error models, and realistic feature proposals to be useful in practical applications. When images are described with high-dimensional vectors of pixel values a very large ensemble of proposed images is needed to provide a representative sample of all possible combinations. In such cases importance sampling is not practical. The image characterization approach proposed here describes images with low-dimensional image attribute vectors that are derived by applying a multi-dimensional scaling transformation to pixel-based images. The transformation is not specified in advance but inferred from a large set of proposals and archived measurements. Importance sampling in a properly constructed attribute space is feasible with a moderate number of proposal samples.

In order to properly represent measurement errors our approach relies on non-parametric probability densities estimated from an image archive rather than hypothesized parametric densities. A non-parametric approach is better able to deal with unconventional measurement errors such as misregistered images or artifacts near feature boundaries. This increases the applicability of the method and ensures that the posterior probabilities properly reflect the effects of measurement uncertainty. Moreover, it enables the image characterization procedure to make best use of information from different sensors with different strengths and weaknesses.

We also use a non-parametric approach to generate proposals. These proposals are constructed with a multipoint statistical algorithm that relies on training images to convey information about distinctive feature properties. The proposals are constrained, to various degrees, by current measurements to ensure that they are able to capture non-stationarities. Since proposals generated in this way are realistic they yield better results than proposals drawn from hypothesized parametric densities.

The primary limitation of an attribute-based approach to importance sampling is the inevitable loss of information that occurs when high-dimensional images are described with much lower-dimensional attribute vectors. In particular, as the number of proposal and archive samples increases it becomes harder to preserve the distances between the mapped images in the attribute space. This reduces the performance benefits that would otherwise be achieved by increasing the sample size. An importance sampling procedure that relies on multi-dimensional scaling must compromise between the need to keep the sample size small for accurate mapping of images and the need to keep the sample size large for accurate representation of attribute probabilities.

Future research in this area should build on the inherent advantages of the importance sampling approach, especially flexibility and generality, while addressing its limitations. It would be useful to examine alternative dimensionality reduction options that perform better than multi-dimensional scaling as sample sizes are increased. These alternatives will likely need to go beyond distance preservation when selecting image attributes, while remaining data-driven. Other topics worth further effort include an assessment of alternative similarity measures in the pixel and attribute spaces and an investigation of the benefits to be obtained by increasing the number of attributes.

The results provided in our rainfall example indicate that attribute-based importance sampling is a promising option for characterizing complex categorical images. Ultimately, one of the most significant aspects of the importance sampling approach is the perspective it provides, by presenting a range of possible image alternatives that gives an accessible and informative picture of uncertainty.

Acknowledgment

Funding for this research was provided in part by the ENI Multi-scale Reservoir Science Project within the ENI-MIT Energy Initiative Founding Member Program. Additional support was provided by the National Science Foundation Grant DMS-0530851.

References

[1] Ades M, van Leeuwen PJ. An exploration of the equivalent-weights particle filter. Q J R Meteorol Soc 2012. http://dx.doi.org/10.1002/qj.1995.

[2] AghaKouchak A, Bardossy A, Habib E. Conditional simulation of remotely sensed rainfall data using a non-Gaussian v-transformed copula. Adv Water Resour 2010;33(6):624–34. http://dx.doi.org/10.1016/j.advwatres.2010.02.010.

[3] Aghasi A, Mendoza-Sanchez I, Miller EL, Ramsburg CA, Abriola LM. A geometric approach to joint inversion with applications to contaminant source zone characterization. Inverse Prob 2013;29(11):115014. http://dx.doi.org/10.1088/0266-5611/29/12/129901.

[4] Agrafiotis DK, Xu H, Zhu F, Bandyopadhyay D, Liu P. Stochastic proximity embedding: methods and applications. Mol Inf 2010;29(11):758–70. http://dx.doi.org/10.1002/minf.201000134.

[5] Agrafiotis DK. Stochastic proximity embedding. J Comput Chem 2003;24(10):1215–21. http://dx.doi.org/10.1002/jcc.10234.

[6] Arts R, Eiken O, Chadwick A, Zweigel P, van der Meer L, Zinszner B. Monitoring of CO2 injected at Sleipner using time-lapse seismic data. Energy 2004;29(9–10):1383–92. http://dx.doi.org/10.1016/j.energy.2004.03.072.

[7] Arulampalam MS, Maskell S, Gordon N, Clapp T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans Signal Process 2002;50:174–88. http://dx.doi.org/10.1109/78.978374.

[8] Baldwin ME, Mitchell KE. Progress on the NCEP hourly multi-sensor US precipitation analysis for operations and GCIP research. In: Proceedings of the second symposium on integrated observing systems. Phoenix, AZ: American Meteorological Society; 1998.

[9] Bellerby T, Hsu K, Sorooshian S. LMODEL: a satellite precipitation methodology using cloud development modeling. Part I: Algorithm construction and calibration. J Hydrometeorol 2009;10(5):1081–95. http://dx.doi.org/10.1175/2009JHM1091.1.

[10] Bellerby TJ. Ensemble representation of uncertainty in Lagrangian satellite rainfall estimates. J Hydrometeorol 2013;14(5):1483–99. http://dx.doi.org/10.1175/JHM-D-12-0121.1.

[11] Bellerby TJ, Jizhong S. Probabilistic and ensemble representations of the uncertainty in an IR/microwave satellite precipitation product. J Hydrometeorol 2005;6(6):1032–44. http://dx.doi.org/10.1175/JHM454.1.

[12] Bellman RE. Adaptive control processes: a guided tour. Princeton University Press; 1961.

[13] Cai H, Mikolajczyk K, Matas J. Learning linear discriminant projections for dimensionality reduction of image descriptors. IEEE Trans Pattern Anal Mach Intell 2011;33(2):338–52. http://dx.doi.org/10.1109/TPAMI.2010.89.

[14] Dehghannejad M, Juhlin Ch, Malehmir A, Skyttä P, Weihed P. Reflection seismic imaging of the upper crust in the Kristineberg mining area, northern Sweden. J Appl Geophys 2010;71(4):125–36. http://dx.doi.org/10.1016/j.jappgeo.2010.06.002.

[15] Demartines P, Hérault J. Curvilinear component analysis: a self-organizing neural network for nonlinear mapping of data sets. IEEE Trans Neural Networks 1997;8:148–54. http://dx.doi.org/10.1109/72.554199.

[16] Deza E, Deza MM. Encyclopedia of distances. Springer; 2009.

[17] Doucet A, de Freitas N, Gordon N. Sequential Monte Carlo methods in practice. Springer; 2001.

[18] Dunn G, Everitt BS. An introduction to mathematical taxonomy. Courier Dover Publications; 2004.

[19] Ebtehaj AM, Foufoula-Georgiou E. Adaptive fusion of multisensor precipitation using Gaussian-scale mixtures in the wavelet domain. J Geophys Res Atmos 2011;116(D22). http://dx.doi.org/10.1029/2011JD016219.

[20] Ferraro RR, Marks GF. The development of SSM/I rain-rate retrieval algorithms using ground-based radar measurements. J Atmos Oceanic Technol 1995;12:755–70. http://dx.doi.org/10.1175/1520-0426(1995)012<0755:TDOSRR>2.0.CO;2.

[21] Fewster RM, Buckland ST. Similarity indices for spatial ecological data. Biometrics 2001;57(2):495–501. http://dx.doi.org/10.1111/j.0006-341x.2001.00495.x.

[22] Figueiredo M, Jain AK. Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 2002;24(3):381–96. http://dx.doi.org/10.1109/34.990138.

[23] Turk FJ, Rohaly GD, Hawkins J, Smith EA, Marzano FS, Mugnai A, Levizzani V. Meteorological applications of precipitation estimation from combined SSM/I, TRMM and infrared geostationary satellite data. In: Pampaloni PP, Paloscia S, editors. Microwave radiometry and remote sensing of the Earth's surface and atmosphere. VSP Int. Sci. Publ.; 2000. p. 353–63.

[24] Fruchterman T, Reingold E. Graph drawing by force-directed placement. Software-Pract Exp 1991;21:1129–64. http://dx.doi.org/10.1002/spe.4380211102.

[25] Fulton RA, Breidenbach JP, Seo DJ, Miller DA. The WSR-88D rainfall algorithm. Weather Forecasting 1998;13:377–95. http://dx.doi.org/10.1175/1520-0434(1998)013<0377:TWRA>2.0.CO;2.

[26] Gluyas J, Swarbrick R. Petroleum geoscience. Blackwell Publishing; 2004.

[27] Grimes DF. An ensemble approach to uncertainty estimation for satellite-based rainfall estimates. In: Sorooshian S, Hsu K, Coppola E, Tomassetti B, Verdecchia M, Visconti G, editors. Hydrological modelling and the water cycle. Water science and technology library, vol. 63. Berlin, Heidelberg: Springer; 2008. p. 145–62.

[28] Guardiano F, Srivastava RM. Multivariate geostatistics: beyond bivariate moments. In: Soares A, editor. Geostatistics Troia, vol. 1. Kluwer Academic Publishers; 1993. p. 133–44.

[29] Gupta R, Venugopal V, Foufoula-Georgiou E. A methodology for merging multisensor precipitation estimates based on expectation-maximization and scale-recursive estimation. J Geophys Res Atmos 2006;111(D2). http://dx.doi.org/10.1029/2004jd005568.

[30] Hsu KL, Sorooshian S. Satellite-based precipitation measurement using PERSIANN system. In: Sorooshian S, Hsu KL, Coppola E, Tomassetti B, Verdecchia M, Visconti G, editors. Hydrological modeling and the water cycle. Springer-Verlag; 2008. p. 27–48.

[31] Huffman GJ, Bolvin DT, Nelkin EJ, Wolff DB, Adler RF, Gu GH, Hong Y, Bowman KP, Stocker EF. The TRMM multisatellite precipitation analysis (TMPA): quasi-global, multiyear, combined-sensor precipitation estimates at fine scales. J Hydrometeorol 2007;8(1):38–55. http://dx.doi.org/10.1175/JHM560.1.

[32] Iglesias MA, McLaughlin D. Level-set techniques for facies identification in reservoir modeling. Inverse Prob 2011;27:035008. http://dx.doi.org/10.1088/0266-5611/27/3/035008.

[33] Indyk P, Matousek J. Low-distortion embeddings of finite metric spaces. In: Handbook of discrete and computational geometry. CRC Press; 2004. p. 177–96.

[34] Jaccard P. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaud Sci Nat 1901;37:547–79. http://dx.doi.org/10.5169/seals-266450.

[35] Jian B, Vemuri BC. Robust point set registration using Gaussian mixture models. IEEE Trans Pattern Anal Mach Intell 2011;33(8). http://dx.doi.org/10.1109/TPAMI.2010.223.

[36] Joyce RJ, Xie P, Yarosh Y, Janowiak JE, Arkin PA. CMORPH: a morphing approach for high resolution precipitation product generation. In: Gebremichael M, Hossain F, editors. Satellite rainfall applications for surface hydrology. Springer; 2010. p. 23–37.

[37] Kruskal JB. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 1964;29(1):1–27. http://dx.doi.org/10.1007/BF02289565.

[38] Kubota T, Shige S, Hashizume H, Aonashi K, Takahashi N, Seto S, et al. Globalprecipitation map using satellite-borne microwave radiometers by the GSMaPproject: production and validation. IEEE Trans Geosci Remote Sens2007;45(7):2259–75. http://dx.doi.org/10.1109/TGRS.2007.895337.

[39] Kuligowski RJ, Barros AP. Blending multiresolution satellite data withapplication to the initialization of an orographic precipitation model. J ApplMeteorol 2001;40(9):1592–606. http://dx.doi.org/10.1175/1520-0450(2001)040<1592:BMSDWA>2.0.CO;2.

[40] Lensky IM, Levizzani V. Estimation of precipitation from space-basedplatforms. In: Michaelides S, editor. Precipitation: advances in measurement,estimation and prediction. Springer-Verlag; 2008. p. 195–217.

[41] Lerner B, Guterman H, Aladjem M, Dinstein I. On the initialisation of Sammon’snonlinear mapping. Pattern Anal Appl 2000;3(1):61–8. http://dx.doi.org/10.1007/s100440050006.

[42] Lerner B, Guterman H, Aladjem M, Dinstein I, Romem Y. On pattern classification with Sammon’s nonlinear mapping: an experimental study. Pattern Recognit 1998;31(4). http://dx.doi.org/10.1016/S0031-3203(97)00064-2.

[43] Lespinats S, Verleysen M, Giron A, Fertil B. DD-HDS: a tool for visualization and exploration of high dimensional data. IEEE Trans Neural Networks 2007;18(5):1265–79. http://dx.doi.org/10.1109/TNN.2007.891682.

[44] Lingala N, Namachchivaya NS, Perkowski N, Yeong HC. Particle filtering in high-dimensional chaotic systems. Chaos 2012;22(4). http://dx.doi.org/10.1063/1.4754875.

[45] Lipkus AH. A proof of the triangle inequality for the Tanimoto distance. J Math Chem 1999;26(1–3):263–5. http://dx.doi.org/10.1023/A:1019154432472.

[46] Matousek J. On the distortion required for embedding finite metric spaces into normed spaces. Isr J Math 1996;93(1):333–44. http://dx.doi.org/10.1007/BF02761110.

[47] Matousek J. On embedding trees into uniformly convex Banach spaces. Isr J Math 1999;114(1):221–37. http://dx.doi.org/10.1007/BF02785579.

[48] Di Battista G, Eades P, Tamassia R, Tollis IG. Graph drawing: algorithms for the visualization of graphs. Englewood Cliffs: Prentice Hall; 1999.

[49] McLachlan G, Peel D. Finite mixture models. Wiley series in probability and statistics. John Wiley and Sons; 2000.

[50] Parzen E. On estimation of a probability density function and mode. Ann Math Stat 1962;33:1065–76. http://dx.doi.org/10.1214/aoms/1177704472.

[51] Qian H, Wu X, Xu Y. Intelligent surveillance systems. Springer; 2011.

[52] Rassokhin DN, Agrafiotis DK. A modified update rule for stochastic proximity embedding. J Mol Graphics Model 2003;22(2):133–40. http://dx.doi.org/10.1016/S1093-3263(03)00155-4.

[53] Remy N, Boucher A, Wu J. Applied geostatistics with SGeMS. Cambridge: Cambridge University Press; 2009.

[54] Sammon JW. A nonlinear mapping for data structure analysis. IEEE Trans Comput 1969;18:401–9. http://dx.doi.org/10.1109/T-C.1969.222678.

[55] Scott DW, Szewczyk WF. From kernels to mixtures. Technometrics 2001;43:323–35. http://dx.doi.org/10.1198/004017001316975916.

[56] Silverman BW. Density estimation for statistics and data analysis. New York: Chapman and Hall; 1986.

[57] Strebelle S. Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 2002;34:1–21. http://dx.doi.org/10.1023/A:1014009426274.

[58] Strebelle S, Remy N. Post-processing of multiple-point geostatistical models. In: Leuangthong O, Deutsch CV, editors. Geostatistics Banff 2004, vol. 2. Dordrecht, The Netherlands: Springer; 2004. p. 979–88.

[59] Surussavadee C, Staelin DH, Chadarong V, McLaughlin D, Entekhabi D. Comparison of NOWRAD, AMSU, AMSR-E, TMI, and SSM/I surface precipitation rate retrievals over the United States Great Plains. In: IEEE international geoscience and remote sensing symposium, 2007, IGARSS 2007; 2007. p. 3923–6.

[60] Todd MC, Kidd C, Kniveton D, Bellerby TJ. A combined satellite infrared and passive microwave technique for estimation of small-scale rainfall. J Atmos Oceanic Technol 2001;18(5):742–55. http://dx.doi.org/10.1175/1520-0469(2001)058<0742:ACSIAP>2.0.CO;2.

[61] Wand MP, Jones MC. Kernel smoothing. London: Chapman & Hall; 1995.

[62] Weng F, Zhao L, Ferraro RR, Poe G, Li X, Grody NC. Advanced microwave sounding unit cloud and precipitation algorithms. Radio Sci 2003;38(4):8068. http://dx.doi.org/10.1029/2002RS002679.

[63] Wójcik R, McLaughlin D, Konings A, Entekhabi D. Conditioning stochastic rainfall replicates on remote sensing data. IEEE Trans Geosci Remote Sens 2009;47:2436–49. http://dx.doi.org/10.1109/TGRS.2009.2016413.

[64] Xie P, Arkin PA, Janowiak JE. CMAP: the CPC merged analysis of precipitation. In: Levizzani V, Bauer P, Turk FJ, editors. Measuring precipitation from space. Springer; 2007. p. 319–28.

[65] Zimek A, Schubert E, Kriegel HP. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min 2012;5(5):363–87. http://dx.doi.org/10.1002/sam.11161.
