Computer-Aided Civil and Infrastructure Engineering 30 (2015) 394–411. DOI: 10.1111/mice.12088

Hierarchical Refinement of Latin Hypercube Samples

Miroslav Vořechovský*

Faculty of Civil Engineering, Institute of Structural Mechanics, Brno University of Technology, Veveří 95, 602 00 Brno, Czech Republic

Abstract: In this article, a novel method for the extension of sample size in Latin Hypercube Sampling (LHS) is suggested. The method can be applied when an initial LH design is employed for the analysis of functions g of a random vector. The article explains how the statistical, sensitivity and reliability analyses of g can be divided into a hierarchical sequence of simulations with subsets of samples of a random vector in such a way that (i) the favorable properties of LHS are retained (the low number of simulations needed for statistically significant estimations of statistical parameters of function g with low estimation variability); (ii) the simulation process can be halted, for example, when the estimations reach a certain prescribed statistical significance. An important aspect of the method is that it efficiently simulates subsets of samples of random vectors while focusing on their correlation structure or any other objective function such as some measure of dependence, spatial distribution uniformity, discrepancy, etc. This is achieved by employing a robust algorithm based on combinatorial optimization of the mutual ordering of samples. The method is primarily intended to serve as a tool for computationally intensive evaluations of g where there is a need for pilot numerical studies, preliminary and subsequently refined estimations of statistical parameters, optimization of the progressive learning of neural networks, or during experimental design.

1 INTRODUCTION

Statistical sampling is of interest not only to statisticians but also to practitioners in a variety of research fields such as engineering, economics, design of experiments, and operational research. The evaluation of the uncertainty associated with analysis outcomes is now widely recognized as an important part of any modeling effort. A number of approaches to such evaluation are in use, including differential analysis (Cacuci, 2003; Tamás, 1990; Rabitz et al., 1983; Frank, 1978; Katafygiotis and Papadimitriou, 1996; Brząkała and Puła, 1996), neural networks (Zhou et al., 2013; Fuggini et al., 2013; Graf et al., 2012; Panakkat and Adeli, 2009; Adeli and Park, 1996), fuzzy set theory and fuzzy variables (Reuter and Möller, 2010; Sadeghi et al., 2010; Anoop et al., 2012; Hsiao et al., 2012), variance decomposition procedures (Li et al., 2001; Rabitz and Ali, 1999), and Monte Carlo (i.e., sampling-based) procedures (Sadeghi et al., 2010; Yuen and Mu, 2011; Hammersley and Handscomb, 1964; Rubenstein, 1981; Kalos and Whitlock, 1986; Sobol', 1994). Another common task is to build a surrogate model to approximate the original, complex model, for example, a response surface (Myers et al., 2004; Myers, 1999; Andres, 1997; Sacks et al., 1989; Mead and Pike, 1975; Myers, 1971), a support vector regression or a neural network (Novák and Lehký, 2006). The surrogate model is based on a set of carefully selected points in the domain of variables; see, for example, Dai et al. (2012). Simulation of random variables from multivariate distributions is also needed in the generation of random fields (Vořechovský, 2008).

*To whom correspondence should be addressed. E-mail: [email protected].

Conceptually, an analysis can be formally represented by a deterministic function, Z = g(X), which can be a computational model or a physical experiment (which is expensive to compute/evaluate) and where Z is the uncertain response variable or a vector of outputs. In this article, Z is considered to be a random variable (vector). The vector X ∈ R^Nvar is considered to be a random vector of Nvar marginals (input random variables describing uncertainties/randomness).

Other than the multivariate normal distribution, few random vector models are tractable and general, though many multivariate distributions are well documented (Johnson, 1987). A review of the available techniques for the simulation of (generally non-Gaussian) correlated vectors is given in a paper by Vořechovský and Novák (2009). The information on the random vector X is therefore usually limited to marginal probability density functions (PDFs), f_i(x) (i = 1, ..., Nvar), and a correlation matrix, T (a symmetric square matrix of order Nvar):


$$\mathbf{T} =
\begin{pmatrix}
1 & T_{1,2} & \cdots & T_{1,N_{\mathrm{var}}} \\
 & 1 & \cdots & \vdots \\
 & & \ddots & \vdots \\
\mathrm{sym.} & & & 1
\end{pmatrix} \quad (1)$$

with rows and columns ordered as X1, X2, ..., XNvar.

Suppose the analytical solution to the transformation g(X) of the input variables is not known. The task is to perform statistical, sensitivity, and possibly reliability analysis of g(X) given the information above. In other words, the problem involves the estimation of statistical parameters of response variables and/or theoretical failure probability.

Statistical and probabilistic analyses can be viewed as estimations of probabilistic integrals. Given the joint PDF of the input random vector fX(x), and the function g(X), the estimate of the statistical parameters of g(·) is, in fact, an approximation to the following integral:

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} S\left[g(\mathbf{X})\right] f_{\mathbf{X}}(\mathbf{X}) \, \mathrm{d}X_1 \, \mathrm{d}X_2 \cdots \mathrm{d}X_{N_{\mathrm{var}}} \quad (2)$$

where the particular form of the function S[g(·)] depends on the statistical parameter of interest. To obtain the mean value, S[g(·)] = g(·); higher statistical moments of the response can be obtained by integrating polynomials of g(·). The probability of failure is obtained in a similar manner: S[·] is replaced by the Heaviside function (or indicator function) H[−g(X)], which equals one for a failure event (g < 0) and zero otherwise. In this way, the domain of integration of the joint PDF above is limited to the failure domain.

Many techniques have been developed in the past to approximate such integrals. The most prevalent technique is Monte Carlo simulation (MCS). MCS is popular for its simplicity and transparency and is also used as a benchmark for other (specialized) methods. In Monte Carlo-type techniques, the above integrals are numerically estimated using the following procedure: (1) draw Nsim realizations of X that share the same probability of 1/Nsim by using its joint distribution fX(x) (the samples are schematically illustrated by the columns in Figure 1); (2) compute the same number of output realizations of S[g(·)]; and (3) estimate the results as statistical averages. The disadvantage of crude MCS is that it requires a large number of simulations (i.e., the repetitive calculation of responses) to deliver statistically significant results. This becomes unfeasible when the analysis requires time-consuming evaluation (in the form of a numerical or physical experiment). In this respect, minimization of the number of simulations is essential. This can be obtained in two ways: (1) sampling should be highly efficient so that the desired accuracy level is attained with a minimal number of simulations and, (2) sampling convergence should be easily quantified so that the analysis can be halted once suitably accurate results have been obtained.
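As a minimal, self-contained illustration of the three steps above (a sketch added here; the two independent uniform inputs and the toy model g(x1, x2) = x1 + x2 are assumptions for the example, not the article's test case), a crude MCS estimator of the mean and standard deviation of g could read:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* toy model g(X); in practice this is an expensive numerical or physical experiment */
    static double g(double x1, double x2) { return x1 + x2; }

    int main(void)
    {
        const int Nsim = 1000;
        double sum = 0.0, sum2 = 0.0;
        srand(1);
        for (int j = 0; j < Nsim; j++) {
            double x1 = (double)rand() / RAND_MAX;  /* step (1): draw a realization of X   */
            double x2 = (double)rand() / RAND_MAX;  /* (two independent uniform variables) */
            double z  = g(x1, x2);                  /* step (2): evaluate the output       */
            sum  += z;
            sum2 += z * z;
        }
        double mean = sum / Nsim;                   /* step (3): statistical averages      */
        double var  = sum2 / Nsim - mean * mean;
        printf("estimated mean = %.4f, std = %.4f\n", mean, sqrt(var > 0.0 ? var : 0.0));
        return 0;
    }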

Since g(·) is expensive to compute (or otherwise evaluate), it is advantageous to use a more complicated sampling scheme. The selection of sampling (integration) points should be optimized so as to maximize the amount of important information that can be extracted from the output data. Note that when sampling from random vectors, it is important to control correlations or some other dependence patterns between marginals (copulas, etc.) as many models are sensitive to correlations among inputs.

A good choice is one of the "variance reduction techniques," a stratified sampling strategy called Latin Hypercube Sampling (LHS). LHS was first suggested by W. J. Conover, whose work was motivated by the time-consuming nature of simulations connected with the safety of nuclear power plants. Conover's original unpublished report (Conover, 1975) is reproduced as appendix A of Helton and Davis (2002) together with a description of the evolution of LHS as an unpublished text by R. L. Iman (1980). LHS was formally published for the first time in conjunction with Conover's colleagues (McKay et al., 1979; Iman and Conover, 1980).

It has been found that stratification with proportional allocation never increases variance compared to crude MCS, and can reduce it. McKay et al. (1979) showed that such a sample selection reduces the sampling variance of the statistical parameters of g(X) when g(·) is monotone in each of the inputs. Iman and Conover (1980) showed that for an additive model in which all the inputs have uniform density functions, the variance of the estimated mean converges at $O(N_{\mathrm{sim}}^{-3})$ for LHS, which is significantly better than the $O(N_{\mathrm{sim}}^{-1})$ obtained for crude MCS. Later, Stein (1987) showed that LHS reduces the variance compared to simple random sampling (crude MCS) and explained that the amount of variance reduction increases with the degree of additivity in the random quantities on which the function g(X) depends.

The LHS strategy has been used by many authors in different fields of engineering and with both simple and very complicated computational models (Duthie et al., 2011; Marano et al., 2011; Hegenderfer and Atamturktur, 2012; Chudoba et al., 2013). LHS is especially suitable for statistical and sensitivity calculations.


Fig. 1. Left: Extension of sample size with correlation control over the extended part; the old sample has fixed ordering, the added sample has its ordering to be optimized. Right: Estimated correlation matrix with highlighted entries affected by a swap.

However, it is possible to use the method for probabilistic assessment if curve fitting is employed. LHS has also been successfully combined with the importance sampling technique (Olsson et al., 2003) to minimize the variance of estimates of failure probability by sampling importance density around the design point (the most probable point on the limit state surface g(X) = 0). Optimal coverage of a sample space consisting of many variables with a minimum of samples is also an issue in the theory of design of experiments, and LHS and related sampling techniques have their place in that field.

When using LHS, the choice of sample size is a practical difficulty. A small sample size may not give acceptable statistically significant results, while a large sample size may not be feasible for simulations that take hours to compute. Examples of such complicated computations include models in computational fluid dynamics, crashworthiness models based on the finite element method, and models used in the statistical analysis of nonlinear fracture mechanics problems (Bažant et al., 2007; Vořechovský and Sadílek, 2008; among others); see also the numerical example in this article. In many such computer analyses it is impossible to determine a priori the sample size needed to provide adequate data for statistical analysis. Therefore, the ability to extend and refine the design of an experiment may be important. It is thus desirable to start with a small sample and then extend (refine) the design (using more samples during the estimation of integrals) if it is deemed necessary. One needs, however, a sampling technique in which pilot analyses can be followed by an additional sample set (or sets) without the need to discard the results of the previous sample set(s). The problem is depicted in Figure 1, where the extension of the sampling plan is illustrated by empty circles.

Note that in crude MCS the continuous addition of new sample subsets to an existing set can be performed without any violation of the consistency of the whole sample set. However, if some kind of variance reduction technique is used (such as LHS), one has to proceed with special care.

This article describes a simple technique for the sample extension of an LH-sample. The article is based on an earlier publication by the author (Vořechovský, 2006), which saw minor updates in Vořechovský (2009, 2012b). A related extension technique for LHS was developed at the same time by Tong (2006), but it does not consider correlated variables. Another related paper was produced by Sallaberry et al. (2008), who described a method for doubling the sample size (by replicating the samples) in LHS while employing a form of correlation control over the input random variables. Later, a related technique appeared in the literature (Qian, 2009) under the name nested Latin hypercube design. A nested LH-design with two layers is defined as being a special LH-design that contains a smaller LH-design as a subset. The need to extend the sample size in LHS (to "inherit an existing LH-sample") also arose during the building of a sparse polynomial chaos expansion for stochastic finite element analysis (Blatman and Sudret, 2010). Their technique, however, may not yield a true LH-sample with uniformly represented probabilities.

The present article starts with a review of standard LHS in Section 2. The development of the refinement technique for a single random variable is presented in Section 3. Section 4 extends the approach to multivariate cases by employing an advanced correlation control algorithm. The rest of the article presents numerical convergence studies and a comparison of the performance of the refinement technique with that of standard LHS.

2 STANDARD LHS

LHS is a special type of MCS which uses the stratification of the theoretical probability distribution functions of individual marginal random variables in such a way that the region of each random variable Xi is divided into Nsim contiguous intervals of equal probability, indexed by j = 1, ..., Nsim, in consistency with the corresponding distribution function Fi; see Figure 2. This is achieved by dividing the unit probability interval into Nsim probability intervals of identical length (probability), 1/Nsim. Each jth interval is bounded by the lower probability bound π_{j−1} and the upper probability bound π_j, where

$$\pi_j = j / N_{\mathrm{sim}}, \qquad j = 1, \ldots, N_{\mathrm{sim}} \quad (3)$$

The corresponding interval over the values of variable Xi, from which one representative x_j must be selected, is bounded (see Figure 2) so that x_j ∈ ⟨ξ_{i,j−1}, ξ_{i,j}⟩, where

$$\xi_{i,j} = F_i^{-1}\left(\pi_j\right), \qquad i = 1, \ldots, N_{\mathrm{var}}, \quad j = 1, \ldots, N_{\mathrm{sim}} \quad (4)$$

Fig. 2. Sample selection for a single random variable in LHS.

There are several alternative ways of selecting the sample values from these intervals (Vořechovský and Novák, 2009). One such option is to generate a random sampling probability for each interval bounded by ⟨π_{j−1}, π_j⟩ and perform an inverse transformation of the distribution function, such as the one in Equation (4). Another option is to select the mean value from each interval (the centroid of the shaded area in Figure 2) as in (Keramat and Kielbasa, 1997; Huntington and Lyrintzis, 1998):

$$x_{i,j} = \frac{\int_{\xi_{i,j-1}}^{\xi_{i,j}} x\, f_i(x)\, \mathrm{d}x}{\int_{\xi_{i,j-1}}^{\xi_{i,j}} f_i(x)\, \mathrm{d}x} = N_{\mathrm{sim}} \int_{\xi_{i,j-1}}^{\xi_{i,j}} x\, f_i(x)\, \mathrm{d}x \quad (5)$$

Fig. 3. Illustration of a random trial in the correlation control algorithm: a swap of samples j and k of variable X2 in a sampling plan for two random variables and six simulations (X1: x1,1, x1,2, x1,3, x1,4, x1,5, x1,6; X2: x2,2, x2,6, x2,4, x2,1, x2,5, x2,3), with the Spearman rank correlation indicated before and after the swap.

In this article, the most commonly used strategy is preferred: the median of each interval is selected by taking the following set of sampling probabilities:

$$\mathbf{p} = \left\{ p_1, p_2, \ldots, p_j, \ldots, p_{N_{\mathrm{sim}}} \right\}: \quad p_j = \frac{j - 0.5}{N_{\mathrm{sim}}} \quad (6)$$

These sampling probabilities form a regular grid over the unit probability region; see the solid circles in Figure 2. They can be used to form an Nvar × Nsim matrix P of sampling probabilities P_{i,j}, where each row is a permutation of the vector of sampling probabilities p. The samples of an arbitrary continuously distributed variable Xi ∼ P(Xi ≤ x) ≡ Fi(x) are selected using the inverse transformation of the probabilities (entries) in matrix P:

$$x_{i,j} = F_i^{-1}\left(P_{i,j}\right), \qquad i = 1, \ldots, N_{\mathrm{var}}, \quad j = 1, \ldots, N_{\mathrm{sim}} \quad (7)$$

These samples then form a sampling plan: a matrix of dimensions Nvar × Nsim that contains the values selected from X; see the solid circles in Figure 1 for a diagram of the sampling plan for four random variables and four simulations. Another example of a sampling plan, but with two random variables and six simulations, can be found in Figure 3.
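As a minimal sketch of Equations (6) and (7) for a single variable (added here for illustration; the inverse CDF of a standard exponential distribution stands in for Fi^(-1), and no correlation control is performed):

    #include <math.h>
    #include <stdio.h>

    /* stand-in for the inverse CDF F_i^{-1}; here: standard exponential distribution */
    static double inv_cdf(double p) { return -log(1.0 - p); }

    int main(void)
    {
        const int Nsim = 9;
        for (int j = 1; j <= Nsim; j++) {
            double p = (j - 0.5) / Nsim;   /* Equation (6): median of the j-th interval */
            double x = inv_cdf(p);         /* Equation (7): inverse transformation      */
            printf("j=%d  p=%.4f  x=%.4f\n", j, p, x);
        }
        /* a full sampling plan repeats this for each of the Nvar variables and then
           permutes the order within each row (correlation control, Section 4) */
        return 0;
    }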

At the beginning of the process, sampling probabilities p from Equation (6) are sorted in ascending order. They form rows in P. In LHS it is necessary to minimize the difference between the target correlation matrix, T, and the actual correlation matrix, A, which is estimated from the generated sample. The difference between these two matrices can be quantified by a suitable matrix norm (see Vořechovský and Novák, 2009). This matrix norm can be minimized by changing the mutual ordering of the samples in the sampling plan; see the illustration of a swap in Figure 3. Many authors use random permutations to diminish undesired correlation between variables, though this method has been shown (Vořechovský, 2012a) to deliver relatively large errors in correlation with small sample sizes. Therefore, Vořechovský and Novák (2009) proposed a combinatorial optimization algorithm based on simulated annealing to control correlations among sampling plan rows. This rank optimization method seems to be the most efficient technique for exercising correlation control over samples of univariate marginals with fixed values. In the following study it is shown how the algorithm may also be exploited to optimize the ranking of the extended part of the sampling plan; see the empty circles in Figure 1.

Let us assume that the initial LH-sampling plan already exists and that the task is to add new simulations to it so that the aggregated sample will be a true LH-sample. It is proposed that this be achieved in two steps: (1) the extension of the sample for each univariate marginal variable Xi, i = 1, ..., Nvar, and (2) the optimization of the order of the added samples for each variable to control the dependence pattern of the whole sample (i.e., to control the spatial distribution of the points). These two steps are described in the following two subsections.

3 SAMPLE SIZE EXTENSION ALGORITHM: UNIVARIATE SAMPLING

In this section, a particular design for an aggregated sample set denoted as HLHS is described and proposed. The abbreviation stands for hierarchical Latin Hypercube Sampling. In this method, the previously selected sampling probabilities from Equation (6) constitute the parent subset, pold. Its child subset, padd, is constructed in such a manner that each sampling probability of pold will "generate" t offspring sampling probabilities. The additional subsets themselves are not LH-samples. However, if such a subset is combined with the previous subset, one obtains an exact LH-sample set, ptot. The size of the preceding LH-sample can be arbitrary.

The extension of the sample for each random variable can be performed for an arbitrary distribution Fi because this distribution function transforms the desired density into a uniform distribution over the interval ⟨0, 1⟩. Since the property of the one-dimensional uniformity of sampling probabilities is going to be preserved in HLHS, the problem is reduced to finding a regular grid over the unit probability interval ⟨0, 1⟩. If the aggregated sample is truly an LH-sample with uniformly distributed sampling probabilities, the extended sample size must be obtained either by (1) replicating the same sampling probabilities or by (2) adding new sampling probabilities in such a way that the aggregated vector of sampling probabilities will form a regular grid. The second option has been selected for HLHS because it is argued that designs should be "noncollapsing." When one of the input variables has (almost) no influence on the black-box function g(·), two design points that differ only in this parameter will "collapse," that is, they can be considered as being the same sampling point that is evaluated twice. This is not a desirable situation. Moreover, as shown by Hansen et al. (2012), a replicated sample (a sample obtained with identical sampling points that have been mutually reordered) may not bring any additional information. Therefore, two design points should not share any coordinate values when it is not known a priori which parameters are important. The proposed HLHS method fulfills this requirement.

Let us use the following notation:

$$\mathbf{p}_{\mathrm{tot}} = \mathbf{p}_{\mathrm{old}} \cup \mathbf{p}_{\mathrm{add}} \quad (8)$$

where pold is the existing LH-sample with Nold simulations, padd is the extension of the LH-sample with Nadd simulations, and ptot is the aggregated vector of sampling probabilities with Nsim simulations:

$$N_{\mathrm{sim}} = N_{\mathrm{old}} + N_{\mathrm{add}} \quad (9)$$

Let t denote the refinement factor, where t is a positive even integer. The added sample size is

$$N_{\mathrm{add}} = t\, N_{\mathrm{old}} \quad (10)$$

so that

$$N_{\mathrm{sim}} = N_{\mathrm{old}} + N_{\mathrm{add}} = (t + 1)\, N_{\mathrm{old}} \quad (11)$$

The smallest possible value of t is two, which means that double the number of already sampled values is added to the current set; see Figure 4 (left). The total sample size is then Nsim = 3 Nold. Since the sample can be refined repeatedly (hierarchically), the slowest addition (t = 2) leads to the following sample size after r extensions:

$$N_{\mathrm{sim}} = 3^{\,r} N_{\mathrm{old}} \quad (12)$$

In order to achieve the uniform distribution of sampling probabilities in p ≡ ptot given in Equation (6), the sampling probabilities that were already used in set pold must be ignored. In particular, to obtain padd, all indices j = 1, ..., Nsim that satisfy the equality

$$(j - 1) \bmod (t + 1) = t/2 \quad (13)$$

must be ignored, as they are already present in pold. Here, mod denotes the modulo operation; a mod b is the remainder of the Euclidean division of a by b.


Fig. 4. Hierarchical Latin Hypercube Sample. Left: Hierarchy of three refinements (t = 2) of an initial design with one sample. Right: One step in the extension of sample size with the refinement factor t = 4 and an initial design with two samples.

    int idx = Nold;                          // index in 'p' array is set equal to the old sample size
    for( int j = 1; j <= Nsim; j++ ){        // loop over all indices
        if( (j-1) % (t+1) == t/2 ) continue; // skip this 'old' sampling probability
        p[idx] = (double)(j - 0.5) / Nsim;   // new sampling probability
        idx++;                               // increase index by one
    }

Fig. 5. Fragment of C code responsible for computation of the additional sampling probabilities padd.

When, for example, t = 2, the indices j = 2, 5, 8, 11, ... are skipped, while the remaining indices j = 1; 3, 4; 6, 7; 9, 10; ... are used to obtain padd. The fragment of C code in Figure 5 shows a cycle that fills the vector of sampling probabilities ptot with the additional part padd.
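As a quick worked illustration of this rule (an example added here, assuming Nold = 3 and t = 2, hence Nsim = 9): the old median probabilities are pold = {1/6, 3/6, 5/6} = {1.5/9, 4.5/9, 7.5/9}. The refined grid consists of the nine probabilities pj = (j − 0.5)/9, and the skip rule (j − 1) mod 3 = 1 removes exactly j = 2, 5, 8, whose probabilities coincide with pold. The loop therefore emits padd = {0.5, 2.5, 3.5, 5.5, 6.5, 8.5}/9, and the union ptot is the full regular grid of Equation (6) for Nsim = 9.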

In order to illustrate the regular distribution of the sampling probabilities, Figure 4 (left) shows the situation for an initial design with just one simulation (N0 = 1) and the refinement factor t = 2. Figure 4 (right) illustrates the situation for Nold = 2 and t = 4 (four offspring). This can be advantageous in cases when, after performing Nold analyses, the analyst plans to extend the sample size more rapidly than just by doubling the previous number. For example, quadrupling Nold may be more desirable than doubling it and immediately afterwards doubling Nsim again.

4 SAMPLE SIZE EXTENSION ALGORITHM: MULTIVARIATE SAMPLING AND CONTROL OVER DEPENDENCE

The proposed concept of the HLHS method for a single variable can be easily extended to a multivariate setting. Suppose the problem to be analyzed via sampling features Nvar variables. The initial LHS design, with Nold sampling probabilities pold, has already been performed (sample ordering has already been optimized to conform to the target dependence structure). Suppose also that this design has already been utilized and the Nold results with function g(X) are obtained. The points in the existing design must therefore remain unchanged.

Now, the new sampling probabilities padd are available via the application of the refinement of the grid of sampling probabilities (see the preceding section). These new sampling probabilities will be used in Equation (7) to obtain the additional sample. The only problem is thus to achieve the optimal reordering of the additional sample set for each random variable (row) in such a way that the aggregated sample matches the target multivariate structure as much as possible. It is proposed that the optimal pairing (dislocation of the points in Nvar-dimensional space) be achieved by application of the above-mentioned combinatorial optimization algorithm described in Vořechovský and Novák (2009).

In fact, the algorithm performs the same kind of job as in the preceding design with Nold points. The only difference is that mutual ordering can only be changed for the new sampling points, that is, the points with the indices Nold < j, k ≤ Nsim; see Figure 1. This can be achieved very easily in the above-mentioned algorithm by specifying the lower bounds on the indices j, k for swapped values.

Of course, when optimizing the mutual ordering of the additional sample set, the objective function (e.g., the correlation error) also takes into account the preceding, old sample set. In this way, any possible errors in correlation that may have appeared in the old (smaller) design are automatically reduced because the algorithm for correlation control compensates for them while optimizing the ordering of the new sampling probabilities.

A valuable aspect of this approach is that the marginal distributions of the original random vectors remain intact, and the algorithm merely provides a key to re-ordering the elements of the original random vectors. In this sense, the proposed algorithm is a truly "distribution-free" algorithm, as it does not depend on the distribution functions featured in the problem.

Let us focus on the common case in which the correlation error is the objective function to be minimized and the correlation estimator is either the linear Pearson or the Spearman rank correlation coefficient. In both cases, the correlation between a pair of variables, X and Y, represented by vectors of Nsim values x_j and y_j, can be calculated as the dot product of two vectors. This can be shown using the linear Pearson correlation; computation of the Spearman correlation takes place in an identical manner, except that the sample set is replaced by integer ranks; see, for example, Vořechovský (2012a). The linear Pearson correlation estimator for the data representing a pair of random variables is given by:

$$\rho_{xy} = \frac{\sum_{j=1}^{N_{\mathrm{sim}}} (x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{N_{\mathrm{sim}}} (x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{N_{\mathrm{sim}}} (y_j - \bar{y})^2}} \quad (14)$$

For the correlation control algorithm, the values can be temporarily standardized by subtracting the mean value and dividing by the standard deviation. After that, the correlation coefficient reads:

$$\rho_{xy} = \frac{\sum_{j=1}^{N_{\mathrm{sim}}} x_j y_j}{N_{\mathrm{sim}} - 1} \quad (15)$$

If all the values in the sample sets are further divided by √(Nsim − 1), the correlation can be computed simply as the dot product:

$$\rho_{xy} = \sum_{j=1}^{N_{\mathrm{sim}}} x_j y_j = \underbrace{\sum_{j=1}^{N_{\mathrm{old}}} x_j y_j}_{\text{fixed ordering}} + \underbrace{\sum_{j=N_{\mathrm{old}}+1}^{N_{\mathrm{sim}}} x_j y_j}_{\text{ordering to be optimized}} \quad (16)$$

Any swap of a pair of values with indices j, k of any variable i only influences Nvar − 1 correlations in the estimated correlation matrix A, because the affected variable i has correlations with Nvar − 1 variables; see Figure 1 (right). To update each of these Nvar − 1 correlation coefficients, one multiplication and three subtractions must be computed:

$$\rho_{xy}^{\mathrm{upd}} = \rho_{xy} - x_j y_j - x_k y_k + x_j y_k + x_k y_j = \rho_{xy} - \left(x_j - x_k\right)\left(y_j - y_k\right) \quad (17)$$

Therefore, the actual number of simulations does notinfluence the speed of the algorithm and the fact thatpart of the sample is fixed is not a problem.
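A minimal sketch of this constant-time update (added here; it assumes the values of each row have already been standardized and divided by √(Nsim − 1) as described above, and the array layout and random pair selection are illustrative, not the author's implementation):

    #include <stdlib.h>

    /* Swap entries j and k of row y (part of the added sample) and update its
       correlation with one other row x in O(1), per Equation (17). */
    static double swap_and_update(double *x, double *y, int j, int k, double rho)
    {
        double rho_new = rho - (x[j] - x[k]) * (y[j] - y[k]);  /* Equation (17)    */
        double tmp = y[j]; y[j] = y[k]; y[k] = tmp;            /* perform the swap */
        return rho_new;
    }

    /* Pick a random pair of indices restricted to the added part of the sample,
       i.e., Nold <= j, k < Nsim (0-based), so the old design remains untouched.
       Assumes at least two added points. */
    static void random_pair(int Nold, int Nsim, int *j, int *k)
    {
        *j = Nold + rand() % (Nsim - Nold);
        do { *k = Nold + rand() % (Nsim - Nold); } while (*k == *j);
    }

In the full algorithm, the same one-line update is applied to the correlations of the swapped variable with each of the other Nvar − 1 variables, and the trial swap is then accepted or rejected by the simulated annealing criterion.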

Once the ordering of the additional part of the sample is optimized, the resulting sampling plan uses the same sampling probabilities as a sample that would be simulated at Nsim by classical LHS, that is, in one step. The only source of different performance may lie in the fact that the optimization of mutual ordering has been performed for the two subsamples successively. It can be expected that the performance of the aggregated sample set in estimating the statistical characteristics of the studied function g(·) can at best be as good as that of standard LHS at the same sample size, and that HLHS provides only slightly worse results than LHS when the sample size Nold is extremely small. This is because the correlation errors (and also other measures, such as spatial uniformity) decrease with increasing sample size.

The following paragraph is focused on the case in which the shuffling algorithm controls correlations. As shown by Vořechovský and Novák (2009) and also in Vořechovský (2012c), the subsequent addition of samples has no negative impact on the quality of the correlation structure of the resulting sample set. This is also true for random ordering (see Vořechovský, 2012a). It should be noted that when Nsim ≤ Nvar, which can easily happen especially in initial designs, the estimated correlation matrix, A, is singular and positive semidefinite. The associated correlation error drastically decreases when Nsim exceeds Nvar. In these situations, the average error in correlation is approximately $N_{\mathrm{sim}}^{-5/2} N_{\mathrm{var}}$ with a coefficient of variation of 1/Nvar (Vořechovský, 2011). More information on this issue can be found in Vořechovský (2012c).

This potential error, a problem stemming from the lower flexibility available due to successive sampling, is visualized in Figure 6. The figure shows the evolution of a bivariate sample selection in HLHS with uniformly distributed variables (or sampling probabilities). The left-hand side shows the selection with uncorrelated variables (T = 0), while the middle column has T = −0.9. Both cases start with the initial design featuring only one sampling point, and the extensions double the old sample size. During the first extension, there are only two ways to pair the two new coordinates, leading either to perfect positive or negative dependence. During extension no. 2, this dependency is efficiently suppressed by balancing the correlation.


Fig. 6. Evolution of bivariate sampling probabilities (or samples of uniform variables) with N0 = 1, t = 2, for Nsim = 3, 9, 27, 81, and 243. Solid circles represent the old sample set and empty circles represent the additional points. Left column: Statistically independent case. Middle column: Negatively correlated variables. Right column: Audze-Eglais criterion (potential energy).

However, if the correlation control algorithm had the freedom to pair all the sampling probabilities, the result would be slightly better. This problem disappears very quickly as the sample size increases. Yet, for small sample sizes, the flexibility provided by HLHS may be offset by a small decrease in the ability to fulfill the correlation criteria.

It can be argued that a good design for a computer experiment should provide information about all sectors of the experimental region. Uneven designs can yield predictors that are very inaccurate in sparsely observed parts of the experimental region. It can be seen that there are several unexplored regions and also clusters of points in Figure 6 (left). The reason why this happens is that the objective function to be minimized in the pairing phase of the proposed technique is the correlation error alone, without other criteria. In order to deliver more uniform coverage of the probability space, another criterion can be used, for example, in linear combination with the correlation criterion. There are several standard criteria (Husslage, 2006; Husslage et al., 2011) that can be employed which are known from low-discrepancy sequences, minimax and maximin designs, Audze-Eglais design, or space-filling designs. The extension of the proposed algorithm to these criteria is straightforward. Figure 6 (right) shows the evolution of a bivariate sample selection in HLHS with uniformly distributed variables, where the objective function for optimization (minimization) is the Audze-Eglais criterion (potential energy U) (Bates et al., 2003):

$$U = \sum_{j=1}^{N_{\mathrm{sim}}} \sum_{k=j+1}^{N_{\mathrm{sim}}} \frac{1}{L_{j,k}^2}, \qquad L_{j,k}^2 = \sum_{i=1}^{N_{\mathrm{var}}} \left( x_{i,j} - x_{i,k} \right)^2 \quad (18)$$

where L_{j,k} is the distance between points j and k (j ≠ k) in Nvar-dimensional space.
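A small sketch of Equation (18), computing the Audze-Eglais potential energy of a sampling plan (added here for illustration; the row-wise storage x[i*Nsim + j] for variable i and simulation j is an assumed layout):

    /* Audze-Eglais criterion, Equation (18): sum of inverse squared distances
       over all point pairs of an Nvar x Nsim sampling plan. */
    double audze_eglais(int Nvar, int Nsim, const double *x)
    {
        double U = 0.0;
        for (int j = 0; j < Nsim; j++) {
            for (int k = j + 1; k < Nsim; k++) {
                double L2 = 0.0;                   /* squared distance L^2_{j,k} */
                for (int i = 0; i < Nvar; i++) {
                    double d = x[i * Nsim + j] - x[i * Nsim + k];
                    L2 += d * d;
                }
                U += 1.0 / L2;                     /* accumulate 1 / L^2_{j,k}   */
            }
        }
        return U;
    }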

5 NUMERICAL EXAMPLES: FUNCTIONS OF RANDOM VECTORS

The following text documents the convergence of estimates of various functions with increasing sample size. In particular, the proposed HLHS method is compared to standard LHS and also to crude MCS. In all three of the compared techniques, the same algorithm for correlation control is employed. The goal is to show the extent to which the flexibility in the selection of the initial sample size in HLHS influences the performance at the final sample size Nsim compared to classical LHS if calculated also for Nsim.

It is impossible to explore all existing functions. Several test functions have been selected to represent the possible functions that may appear in practice. For all these functions, the ability to estimate the mean value μZ and standard deviation σZ of the transformed variable Z = g(X) is presented.

In MCS the selection of sampling probabilities is random (it depends on sequences generated by a pseudorandom number generator). In LHS and HLHS the sampling probabilities are deterministic. However, in all three sampling techniques the pairing algorithm based on simulated annealing depends on a pseudorandom number generator. That is why all the results (μZ and σZ) are calculated repeatedly with different random number generator settings (seeds) and the averages and sample standard deviations (sstd) are presented. This gives the reader an idea regarding the variability of the obtained estimates. Of course, MCS always delivers greater variance of estimated μZ and σZ than LHS and HLHS, which are variance reduction techniques. The number of runs with identical Nsim, from which the average and sstd are calculated and plotted as error bars, is 100.

The studies with HLHS are performed with various numbers of simulations in the initial design (N0) and the design is usually hierarchically refined many times until Nsim reaches approximately 6,000. The refinement factor t is always set to the minimum possible value, 2.

5.1 The sum of Gaussian random variables

The distribution of the sum

$$g_{\mathrm{sum}}(X_1, X_2) = X_1 + X_2 \quad (19)$$

of two jointly normally distributed variables, X1 and X2, with Pearson's correlation ρ, is normal with a mean value of μsum = μX1 + μX2 and a variance of $\sigma^2_{\mathrm{sum}} = \sigma^2_{X_1} + \sigma^2_{X_2} + 2\sigma_{X_1}\sigma_{X_2}\rho$. Two cases are considered regarding Pearson's correlation, an independent case (ind) and a correlated case (cor) with ρ = −0.9. With no loss of generality, standardized variables are considered so that

$$\mu^{\mathrm{ind}}_{\mathrm{sum}} = \mu^{\mathrm{cor}}_{\mathrm{sum}} = 0 \quad (20)$$

$$\sigma^{\mathrm{ind}}_{\mathrm{sum}} = \sqrt{2} \approx 1.414\,214 \quad (21)$$

$$\sigma^{\mathrm{cor}}_{\mathrm{sum}} = 1/\sqrt{5} \approx 0.447\,214 \quad (22)$$
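As a brief check added here, the correlated value in Equation (22) follows directly from the variance formula above:

$$\sigma^2_{\mathrm{sum}} = 1 + 1 + 2 \cdot 1 \cdot 1 \cdot (-0.9) = 0.2, \qquad \sigma^{\mathrm{cor}}_{\mathrm{sum}} = \sqrt{0.2} = 1/\sqrt{5} \approx 0.4472$$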

5.2 The product of independent Gaussian random variables

The PDF of the product

$$g_{\mathrm{prod}}(X_1, X_2) = X_1 X_2 \quad (23)$$

of two independent standard normally distributed variables, X1 and X2, is

$$f_{\mathrm{prod}}(z) = \frac{1}{\pi} K_0\left(|z|\right) \quad (24)$$

where K_n(x) is the modified Bessel function of the second kind. Integrating the above PDF over (−∞, ∞) for statistical moments provides the analytical zero mean and unit variance. Note that in a nonstandardized and correlated case, the mean value is $\mu_{\mathrm{prod}} = \mu_{X_1}\mu_{X_2} + \rho\,\sigma_{X_1}\sigma_{X_2}$ and the variance is $\sigma^2_{\mathrm{prod}} = \mu^2_{X_1}\sigma^2_{X_2} + \mu^2_{X_2}\sigma^2_{X_1} + \sigma^2_{X_1}\sigma^2_{X_2}\left(1 + \rho^2\right) + 2\rho\,\mu_{X_1}\mu_{X_2}\sigma_{X_1}\sigma_{X_2}$.
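Substituting the standardized independent case (μX1 = μX2 = 0, σX1 = σX2 = 1, ρ = 0) into these general formulas recovers the stated values (a check added here):

$$\mu_{\mathrm{prod}} = 0, \qquad \sigma^2_{\mathrm{prod}} = 0 + 0 + 1 \cdot 1 \cdot (1 + 0) + 0 = 1$$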

5.3 The minima of independent Weibull random variables

The distribution of the minima (extremes)

$$g_{\min}(X_1, X_2) = \min(X_1, X_2) \quad (25)$$

of n = 2 independent and identically distributed Weibull variables, X1 and X2, with Weibull modulus m and scale parameter s, is Weibullian, with the shape parameter m (the coefficient of variation of the result remains unchanged from that of X) and the scale parameter s_n = s · n^(−1/m). The mean value of X involves the Gamma function Γ: μX = s Γ(1 + 1/m). The coefficient of variation reads

$$\mathrm{cov}_X = \sqrt{\Gamma\left(1 + 2/m\right) / \Gamma^2\left(1 + 1/m\right) - 1} \quad (26)$$

The mean value and standard deviation of the transformed variable are

$$\mu_{\min} = s_n\, \Gamma\!\left(1 + \frac{1}{m}\right) = s \cdot n^{-1/m} \cdot \Gamma\!\left(1 + \frac{1}{m}\right) \quad (27)$$

$$\sigma_{\min} = \sqrt{s_n^2 \cdot \Gamma\left(1 + 2/m\right) - \mu_{\min}^2} \quad (28)$$

For the selection of scale parameter s = 1 and shape parameter m = 12 the results are

$$\mu_{\min} \approx 0.904\,501, \qquad \sigma_{\min} \approx 0.091\,550 \quad (29)$$

This may be applied, for example, to the strength of aseries system (the weakest link model).

5.4 The sum of the cosines of a pair of independent Gaussian random variables

Consider the sum of the cosines of two independent standard Gaussian variables, X1 and X2:

$$g_{\cos}(X_1, X_2) = \sum_{i=1}^{N_{\mathrm{var}}} \cos(X_i) \quad (30)$$

The mean value and variance of this transformation equal $\mu_{\cos} = N_{\mathrm{var}} \exp(-1/2)$ and $\sigma^2_{\cos} = -N_{\mathrm{var}} \exp(-1) + N_{\mathrm{var}}\left[1 + \exp(-2)\right]/2$. In the case of Nvar = 2, the approximate values are μcos ≈ 1.213 061 and σcos ≈ 0.632 120.
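These moments follow from standard normal identities (a derivation step added here): $\mathrm{E}[\cos X] = \mathrm{Re}\,\mathrm{E}[e^{\mathrm{i}X}] = e^{-1/2}$ and $\mathrm{E}[\cos^2 X] = \tfrac{1}{2}\left(1 + \mathrm{E}[\cos 2X]\right) = \tfrac{1}{2}\left(1 + e^{-2}\right)$, so each summand has variance $\tfrac{1}{2}(1 + e^{-2}) - e^{-1}$. For Nvar = 2 this gives $\sigma^2_{\cos} = 1 + e^{-2} - 2e^{-1} = (1 - e^{-1})^2 \approx 0.3996$, that is, σcos ≈ 0.6321.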

5.5 The sum of the squares of independent Gaussian random variables

The distribution of the sum of the squares

$$g_{\mathrm{sqr}}(\mathbf{X}) = \sum_{i=1}^{N_{\mathrm{var}}} X_i^2 \quad (31)$$


of independent standard Gaussian variables Xi, i = 1, ..., Nvar is a chi-squared (χ²) distribution with Nvar degrees of freedom, with the mean value and standard deviation

$$\mu_{\mathrm{sqr}} = N_{\mathrm{var}}, \qquad \sigma_{\mathrm{sqr}} = \sqrt{2 N_{\mathrm{var}}} \quad (32)$$
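This follows from the moments of a squared standard Gaussian (a check added here): $\mathrm{E}[X_i^2] = 1$ and $\mathrm{Var}[X_i^2] = \mathrm{E}[X_i^4] - 1 = 3 - 1 = 2$, so summing Nvar independent terms gives $\mu_{\mathrm{sqr}} = N_{\mathrm{var}}$ and $\sigma^2_{\mathrm{sqr}} = 2 N_{\mathrm{var}}$, in agreement with Equation (32).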

5.6 The sum of the exponentials of independent Gaussian random variables

Consider the sum of the exponentials of independent standard Gaussian variables Xi, i = 1, ..., Nvar:

$$g_{\exp}(\mathbf{X}) = \sum_{i=1}^{N_{\mathrm{var}}} \exp\left(-X_i^2\right) \quad (33)$$

The mean value and standard deviation of the transformed variable are

$$\mu_{\exp} = N_{\mathrm{var}} \sqrt{3}/3 \approx 0.577\,35\, N_{\mathrm{var}} \quad (34)$$

$$\sigma_{\exp} = \sqrt{N_{\mathrm{var}}} \sqrt{\sqrt{5}/5 - 1/3} \approx 0.337\,46 \sqrt{N_{\mathrm{var}}} \quad (35)$$
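These values follow from the Gaussian integral $\mathrm{E}\!\left[e^{-aX^2}\right] = 1/\sqrt{1 + 2a}$ for a standard normal X (a derivation step added here): with a = 1, $\mathrm{E}\!\left[e^{-X^2}\right] = 1/\sqrt{3}$, and with a = 2, $\mathrm{E}\!\left[e^{-2X^2}\right] = 1/\sqrt{5}$, so each summand has variance $1/\sqrt{5} - 1/3$ and the sum of Nvar independent terms yields Equations (34) and (35).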

6 RESULTS AND DISCUSSION

In this section, the results obtained with the six test functions are discussed. The convergence has been studied with (1) just a pair of random variables (Figure 7) and (2) two functions selected to show the dependence of the convergence on the number of variables Nvar; see Figure 8. In all the plots, the analytical solutions to which the results converge are denoted by a hatched line. In most of the plots, the results for HLHS are obtained for the initial sample size N0 = 3 (the lines with empty circles begin at Nsim = 3).

In both multiplots, the left-hand side presents theconvergence of the estimated mean value and the right-hand plots present estimates of standard deviations.

In all the cases, the performances of LHS and HLHS in estimating the mean value are practically identical. The estimated mean value has almost no scatter (the variance of the estimate is reduced to practically zero). The plots also show that MCS estimates have relatively large variance that reduces with increasing sample size. The average estimates of LHS and HLHS are usually slightly biased at very small sample sizes (Nsim ≲ 10).

In Figure 7 (left), the penultimate function gsqr has been evaluated with various initial sample sizes, N0. The plots of mean value estimates overlap, and therefore it can be concluded that the number N0 has no impact on the estimated mean value.

Figure 8 (left) also shows that the variance of MCS in estimating the mean value depends on the number of random variables (even though the convergence rate is independent of Nvar).

The situation is slightly different in the case of the estimated standard deviation. It can be seen that the convergence of HLHS is slightly worse than that of LHS at the same (small) sample size. There are two reasons for this. First, when the sample size is very small, the correlation coefficient of the initial design in HLHS can be far from that which is desired. Even if the HLHS method compensates for the errors upon adding new sample points, a correlation coefficient obtained from standard LHS with a larger sample produced all at once may be better. This potential error in correlation (the uniformity of coverage of the sampling region with respect to the joint PDF) may distort the estimates.

The second reason lies in the different method of selecting sampling probabilities used in LHS and HLHS, namely the difference between selecting the mean values of intervals (Equation 5) and selecting the medians (Equation 6). As explained and documented by Vořechovský and Novák (2009), selection of the mean values yields samples with a smaller error in sstd (and the arithmetical averages [aave] match the mean values exactly). This method is therefore preferable when using LHS. In the proposed HLHS method, however, sampling is performed using the medians, as then the sampling probabilities form a regular grid that can be easily refined.

In Figures 7 (right) and 8 (right), it can be seen that the average of the estimated standard deviation is usually biased for all three methods when Nsim is small. Moreover, Figure 8 (right) shows that the bias in the standard deviation estimated by LHS and HLHS depends on Nvar. The more random variables there are, the greater the bias.

Figure 7 (top row) compares the convergence for the estimates of a sum of two random variables that are either independent or highly (negatively) correlated. It can be seen that the quality of estimates obtained with LHS and HLHS does not depend on the correlation between the input variables. However, the average estimate of MCS is somewhat biased for small Nsim in the correlated case.

While efficient sampling strategies, such as optimized LHS, may allow a target accuracy to be reached with a minimal number of simulation runs, the ability to halt the simulations when sufficient accuracy has been attained also reduces the computational cost. So, given the choice of a sampling method for studying a computer model, a natural yet difficult question is how many simulation runs to use. This issue is addressed in Schuyler's (1997) article on "how many trials is enough" and also in Gilman's (1968) brief survey on stopping rules. A detailed discussion on measuring the sampling convergence in MCS, LHS, and replicated LHS with the aim of assessing the accuracy with which decisions can be made on when to halt the simulation process can be found in a recent paper by Janssen (2013).


Fig. 7. Convergence of estimates to the analytical mean and standard deviation for six functions g of a pair of random variables (gsum for both the independent and the negatively correlated case, gprod, gmin, gcos, gsqr, and gexp; for gsqr, HLHS curves with initial sample sizes N0 = 1, 2, 5, and 16 are shown). Left: The mean value of g. Right: The standard deviation of g.


Fig. 8. Convergence of estimates to the analytical mean and standard deviation for two selected functions (gsqr and gexp) of various numbers of uncorrelated variables (Nvar = 2, 3, 5, 10). Left: The mean value of g. Right: The standard deviation of g.


Fig. 9. Three point bending of notched concrete specimens. Left: Geometry. Right: Typical calculated diagram and the definition of output variables.


To conclude, the performance of the proposed HLHS method is almost identical to that of standard LHS when carried out with the same number of model evaluations. The small decrease in the accuracy of HLHS is, for practical problems involving time-consuming model evaluations, balanced by the great advantage of HLHS, which is its ability to increase the sample size adaptively based on current progress.

The fact that the sample size increases relatively quickly upon the addition of a new sample is the price paid for the perfect uniformity of marginal variable sampling probabilities. If the sample size increase presents a significant problem, the algorithm can be easily adjusted to deliver only replications of the sampling probabilities combined with the shuffling of mutual ranks so as to avoid repetitions of the same points in the sample space of all random variables. This can be solved for uncorrelated cases, and statistical correlation can later be induced by using the Nataf (1962) model (exploiting Cholesky transformation or orthogonal transformation using eigenvalues of the target correlation matrix T).

7 NUMERICAL EXAMPLE: FRACTURE-MECHANICS ANALYSES

The following numerical example is included to illustrate the application of the suggested technique to a realistic problem involving complex analysis. The problem of the initiation and growth of cracks in notched concrete specimens loaded in three point bending has been selected. Three material properties are considered as random variables: tensile strength ft, Young's modulus E, and fracture energy Gf. The analysis aims at estimating the mean value, standard deviation, and also the PDF of two response variables, namely the peak force, Fmax, and the crack mouth opening displacement (CMOD) corresponding to the peak force, cmax. The shape of the concrete specimens and the inputs and response variables are depicted in Figure 9. The selected concrete specimen shape has been tested experimentally (Hoover et al., 2013) as a part of a larger experimental setup focused on the size effect phenomenon.

The simulated response of the specimens is calculated by employing nonlinear finite element analysis as implemented in the OOFEM software package (Patzák, 2000; Patzák and Bittnar, 2001; Patzák and Rypl, 2012). OOFEM is an Object Oriented Finite Element Solver with an impressive range of options and good documentation. The progressive failure, that is, crack growth, is modeled by employing the isotropic damage model for tensile failure (Idm1) implemented in the OOFEM program. This isotropic damage model assumes that stiffness degradation is isotropic, that is, stiffness moduli corresponding to different directions decrease proportionally and independently of the loading direction. Equivalent strain, which serves as a scalar measure of the strain tensor, is defined according to the Mazars model; the definition is based on the norm of the positive part of the strain. Since the growth of damage leads to softening and induces localization of the dissipative process, attention must be paid to proper regularization. The model is local, and the damage law with exponential softening is adjusted according to the element size (crack-band approach).

The analyses are relatively time consuming, partly due to the fine mesh used. The elements in the ligament band above the notch, of width 1.5 mm, are squares of approximately the same size, and therefore there is a row of 123 finite elements. In the rest of the domain, squared elements of size 4.3 mm are used so that the depth of the specimen is discretized into 50 finite elements. The same element size is used for the steel blocks supporting the specimen and also for the block at the top of it (block dimensions: 27.5 × 17.2 mm).


Fig. 10. Convergence of estimated moments (standardized by the values from Table 1) of input and output variables. Left: Arithmetical averages (aave). Right: Sample standard deviations (sstd).

Each analysis takes approximately one hour on a modern PC.

As with the numerical examples presented in Section 5, three types of Monte Carlo analysis were performed, namely crude Monte Carlo Sampling (MCS), standard LHS, and the proposed HLHS method. The analyses with MCS and LHS were performed with all the sample sizes Nsim that appear in the progressively extended sample sizes in HLHS: Nsim = 3^i = 1, 3, 9, 27, 81, 243, 729 (i = 0, . . . , 6). Therefore, the least number of simulations in total was performed with HLHS, where the total sample size equalled 729. The total number of evaluations of the model performed with LHS and MCS was the sum of all the sample sizes, 1 + 3 + 9 + 27 + 81 + 243 + 729 = 1,093. Since the simulations are time consuming, only one run of all the sets of simulations has been performed for each technique, and so information on the statistical scatter of estimated parameters is not available. The presented numerical examples mimic the situation for which HLHS is designed: the user progressively adds time-expensive simulations with the intention of keeping the total number of model evaluations low. The evolution of the target estimated statistical parameters, possibly accompanied with the testing of hypotheses at the target significance level, gives a hint as to the sample size which will be revealed as sufficient.
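The sample-size bookkeeping quoted above can be reproduced with a few lines:

```python
# Each HLHS refinement triples the sample size; rerunning LHS/MCS from scratch
# at every size costs the sum of all sizes, whereas HLHS reuses earlier points.
sizes = [3 ** i for i in range(7)]      # [1, 3, 9, 27, 81, 243, 729]
print(sizes, sum(sizes), sizes[-1])     # sizes, 1093 rerun evaluations, 729 for HLHS
```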

Table 1 summarizes the input variables that are considered to be random variables, together with the parameters of the random variables. The table also displays estimates of the statistical moments of the two output variables. The distribution functions of the output variables are best fits selected from a list of distribution functions available in the FReET software (Novak et al., 2003; Novak et al., 2014) using the goodness-of-fit (Kolmogorov-Smirnov) test. The three input variables were considered correlated; the correlations appear in Figure 11.

Table 1
Description of input and output variables

Type      Variable name and unit    Mean value    Std. dev.    c.o.v.    PDF
Input     E (GPa)                   36.5          5.48         0.15      Normal
          ft (MPa)                  4.38          0.88         0.2       Normal
          Gf (N/m)                  60            18           0.3       Normal
Output    cmax (μm)                 107*          17.3*        0.16*     Gumbel*
          Fmax (kN)                 4.43*         0.83*        0.19*     Normal*

Stars designate results based on numerical estimations. c.o.v. = coefficient of variation.
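As an aside, the best-fit selection described above can be mimicked with generic tools; the sketch below fits a few candidate distributions to an output sample and keeps the one with the smallest Kolmogorov-Smirnov statistic. The candidate list, the use of scipy, and the synthetic sample (drawn from a Gumbel distribution roughly consistent with Table 1) are assumptions; the article used the FReET software for this step.

```python
import numpy as np
from scipy import stats

def best_fit(sample, candidates=("norm", "gumbel_r", "lognorm", "weibull_min")):
    """Return the candidate distribution with the smallest K-S statistic."""
    best = (None, None, np.inf)
    for name in candidates:
        params = getattr(stats, name).fit(sample)       # maximum-likelihood fit
        stat, _ = stats.kstest(sample, name, args=params)
        if stat < best[2]:
            best = (name, params, stat)
    return best

# synthetic stand-in for the cmax sample (roughly matching Table 1)
rng = np.random.default_rng(1)
cmax_sample = stats.gumbel_r(loc=99.2, scale=13.5).rvs(729, random_state=rng)
print(best_fit(cmax_sample))
```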

Figure 10 presents the convergence of estimated statistical parameters of the input [output] variables to the target [accurate] values. In particular, the figure presents the convergence of the aave and sstd, both standardized by the exact values. It can be seen that in the case of MCS (bottom row), the estimations vary for various sample sizes (compared with the variance expressed using the error bars in previous examples); the error can easily reach 10%. LHS and HLHS both yield much better results than MCS. The accuracy for LHS and HLHS is about the same. Reasonable results for the standard deviation can already be obtained for Nsim = 81 (the fourth extension of the sample size). The sample averages of the inputs are identical and exact in these two techniques (symmetrical sampling), but the sstd evolve in a slightly different way. The reason is that while HLHS starts by sampling the mean value (Equation 5), the medians of intervals are used (Equation 6) for the sample size extension in order to guarantee the even distribution of new sampling probabilities. The results obtained with LHS are gained with sampling the mean values of intervals (see also the discussion in Section 6). Otherwise, the techniques yield almost identical results. Note, however, that exhausting all previous pilot sample sizes up to the fourth extension with LHS would require Nsim = 1 + 3 + 9 + 27 + 81 = 121 model evaluations, while with HLHS only 81 evaluations would need to be performed.

Fig. 11. Top right matrix: Evolution of bivariate scatterplots obtained via HLHS accompanied with the values of Spearman rank order correlation. Symbol sizes correspond to the numbers of sampling size extensions. Bottom left: Evolution of a bundle of calculated force (CMOD) diagrams obtained via HLHS. The thick lines correspond to initial simulations.
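The distinction above between sampling the means and the medians of the probability intervals can be illustrated for a single variable as follows. The formulas are the usual LHS conventions and are assumptions made here for illustration; they are not copied from the article's Equations (5) and (6).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def lhs_medians(dist, n):
    """Representative points at the probability midpoints of the n intervals."""
    p = (np.arange(1, n + 1) - 0.5) / n
    return dist.ppf(p)

def lhs_means(dist, n):
    """Conditional means of the variable over each probability interval."""
    edges = dist.ppf(np.linspace(0.0, 1.0, n + 1))     # -inf ... +inf for a normal
    means = []
    for a, b in zip(edges[:-1], edges[1:]):
        m, _ = quad(lambda x: x * dist.pdf(x), a, b)
        means.append(m * n)                            # divide by probability 1/n
    return np.array(means)

ft = stats.norm(4.38, 0.88)          # tensile strength marginal from Table 1
print(lhs_medians(ft, 3))            # both sets average exactly to the mean,
print(lhs_means(ft, 3))              # but the spread of the points differs
```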

The evolving bundle of diagrams calculated with HLHS is also interesting; see bottom left in Figure 11.

Fig. 12. Evolution of empirical histograms of output variables obtained via HLHS.

The initial simulation with the mean values is depicted by a thick line, while the second level adds two thinner curves, etc. The scatterplots in Figure 11 show the evolving selection of sampling points for all pairs of input and also output variables. This information provides control over input correlations and also serves as an important source of information on (1) the dependence patterns between inputs and outputs and also (2) between the two output variables. For example, it confirms the intuitively acceptable fact that the CMOD at peak load, cmax, exhibits perfect negative dependence on the E modulus. On the other hand, the peak force Fmax is strongly correlated with both tensile strength ft and fracture energy Gf. This information can also be visualized while extending the sample size, and the significance of the calculated sensitivities (in the form of correlations) can easily be calculated.
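A correlation-based sensitivity screening of this kind can be recomputed after every extension; the sketch below uses Spearman rank correlations together with the associated p-values. The array names and the synthetic data are placeholders, not results from the article.

```python
import numpy as np
from scipy import stats

def rank_sensitivities(X, y, names):
    """Spearman rank correlation (and p-value) of each input column with y."""
    return {name: stats.spearmanr(X[:, j], y) for j, name in enumerate(names)}

# placeholder data standing in for the current HLHS sample and one output
rng = np.random.default_rng(2)
X = rng.normal(size=(81, 3))                            # columns: E, ft, Gf
f_max = 0.8 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=81)
print(rank_sensitivities(X, f_max, ["E", "ft", "Gf"]))
```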

The estimated PDF of the outputs may be important information which can be gained using empirical histograms. The evolution of these histograms is visualized in Figure 12. Better information on the tails can be obtained by adding more sampling points. This can be done easily just by extending the sample size in HLHS.

8 CONCLUSIONS

The article proposes a technique for the sample size extension of an LH-sample in which the new sample points are selected based on the locations of the available sample points (without exploiting information from the obtained results). It is shown that the proposed adaptive technique HLHS yields results approximately as good as those obtained from standard LHS for the same number of model evaluations. The computational advantage lies in the possibility to begin with a small sample and extend it if necessary. The method is designed for small sample sets (from tens to a maximum of thousands of simulations) that can be extended and merged together to constitute a consistent sample set and preserve the capability to provide the variance reduction of estimates at the same time.

It is proposed that the desired correlation or other types of multivariate structure description (copulas, discrepancy, space-filling criteria) be induced by the previously developed combinatorial optimization technique.

Adaptive refinement can be performed until some stopping criterion is met. The criteria for termination can especially be (1) the user's decision based, for example, on material or computing resources; and (2) the statistical significance of an arbitrary parameter.
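As one concrete instance of criterion (2), the sample could be extended only while a confidence interval on an estimated parameter remains wider than a prescribed tolerance. The t-based interval below assumes independent samples, which is conservative for LHS/HLHS; it is an illustration, not a rule prescribed by the article.

```python
import numpy as np
from scipy import stats

def mean_ci_halfwidth(y, confidence=0.95):
    """Half-width of a t-based confidence interval for the mean of y."""
    n = len(y)
    t = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)
    return t * np.std(y, ddof=1) / np.sqrt(n)

def keep_refining(y, rel_tol=0.02):
    """Extend the sample while the relative half-width exceeds the tolerance."""
    return mean_ci_halfwidth(y) > rel_tol * abs(np.mean(y))

y = [4.1, 4.5, 4.4, 4.6, 4.3, 4.5, 4.2, 4.4, 4.7]   # placeholder output values
print(mean_ci_halfwidth(y), keep_refining(y))
```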

The typical application of the proposed HLHS is in situations involving a computer-based model in which it is impossible to find a way (either closed form or numerical) of performing the necessary transformation of variables, and where the model is expensive to run in terms of computing resources and time. Examples of applications include simulations of random fields, the design of physical or computer experiments, pilot numerical studies of complicated functions of random variables, the progressive learning of neural networks, etc.

The method is implemented in FReET software (Novak et al., 2003; Novak et al., 2014).

ACKNOWLEDGMENTS

The author acknowledges financial support provided by the Czech Ministry of Education, Youth and Sports under project No. LH12062 and also support provided by the Czech Science Foundation under project No. GA14-10930S (SPADD). The author thanks his colleague Vaclav Sadilek and former student Frantisek Cernik for their help with the numerical simulations presented in the article.

REFERENCES

Adeli, H. & Park, H. S. (1996), Hybrid CPN–neural dynamics model for discrete optimization of steel structures, Computer-Aided Civil and Infrastructure Engineering, 11(5), 355–66.

Andres, T. (1997), Sampling methods and sensitivity analysis for large parameter sets, Journal of Statistical Computation and Simulation, 57(1-4), 77–110.

Anoop, M. B., Raghuprasad, B. K. & Balaji Rao, K. (2012), A refined methodology for durability-based service life estimation of reinforced concrete structural elements considering fuzzy and random uncertainties, Computer-Aided Civil and Infrastructure Engineering, 27(3), 170–86.

Bates, S., Sienz, J. & Langley, D. (2003), Formulation of the Audze-Eglais Uniform Latin Hypercube design of experiments, Advances in Engineering Software, 34(8), 493–506.

Bazant, Z. P., Pang, S. D., Vorechovsky, M. & Novak, D. (2007), Energetic-statistical size effect simulated by SFEM with stratified sampling and crack band model, International Journal for Numerical Methods in Engineering, 71(11), 1297–320.

Blatman, G. & Sudret, B. (2010), An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis, Probabilistic Engineering Mechanics, 25(2), 183–97.

Brzakała, W. & Puła, W. (1996), A probabilistic analysis of foundation settlements, Computers and Geotechnics, 18(4), 291–309.

Cacuci, D. G. (2003), Sensitivity & Uncertainty Analysis, Volume 1: Theory, 1st edn, Chapman and Hall/CRC, Boca Raton, FL.

Chudoba, R., Sadilek, V., Rypl, R. & Vorechovsky, M. (2013), Using Python for scientific computing: an efficient and flexible evaluation of the statistical characteristics of functions with multivariate random inputs, Computer Physics Communications, 184(2), 414–27.

Conover, W. (1975), On a better method for selecting input variables. Unpublished Los Alamos National Laboratories manuscript, reproduced as Appendix A of "Latin Hypercube Sampling and the Propagation of Uncertainty in Analyses of Complex Systems" by J. C. Helton and F. J. Davis, Sandia National Laboratories report SAND2001-0417, Albuquerque, NM and Livermore, CA.


Dai, H., Zhang, H., Wang, W. & Xue, G. (2012), Structural reliability assessment by local approximation of limit state functions using adaptive Markov chain simulation and support vector regression, Computer-Aided Civil and Infrastructure Engineering, 27(9), 676–86.

Duthie, J. C., Unnikrishnan, A. & Waller, S. T. (2011), Influence of demand uncertainty and correlations on traffic predictions and decisions, Computer-Aided Civil and Infrastructure Engineering, 26(1), 16–29.

Frank, P. M. (1978), Introduction to System Sensitivity Theory, Academic Press Inc., State University of New York at Stony Brook, Stony Brook, NY.

Fuggini, C., Chatzi, E. & Zangani, D. (2013), Combining genetic algorithms with a meso-scale approach for system identification of a smart polymeric textile, Computer-Aided Civil and Infrastructure Engineering, 28(3), 227–45.

Gilman, M. (1968), A brief survey of stopping rules in Monte Carlo simulations, in Proceedings of the Second Conference on Applications of Simulations, New York, NY.

Graf, W., Freitag, S., Sickert, J.-U. & Kaliske, M. (2012), Structural analysis with fuzzy data and neural network based material description, Computer-Aided Civil and Infrastructure Engineering, 27(9), 640–54.

Hammersley, J. M. & Handscomb, D. C. (1964), Monte Carlo Methods, Taylor & Francis, Methuen, London.

Hansen, C. W., Helton, J. C. & Sallaberry, C. J. (2012), Use of replicated Latin hypercube sampling to estimate sampling variance in uncertainty and sensitivity analysis results for the geologic disposal of radioactive waste, Reliability Engineering and System Safety, 107, 139–48.

Hegenderfer, J. & Atamturktur, S. (2012), Prioritization of code development efforts in partitioned analysis, Computer-Aided Civil and Infrastructure Engineering, 28(4), 289–306.

Helton, J. C. & Davis, F. J. (2002), Latin Hypercube Sampling and the Propagation of Uncertainty in Analyses of Complex Systems, Technical Report SAND2001-0417, Sandia National Laboratories, Albuquerque, NM and Livermore, CA.

Hoover, C. G., Bazant, Z. P., Vorel, J., Wendner, R. & Hubler, M. H. (2013), Comprehensive concrete fracture tests: description and results, Engineering Fracture Mechanics, 114, 92–103.

Hsiao, F.-Y., Wang, S.-H., Wang, W.-C., Wen, C.-P. & Yu, W.-D. (2012), Neuro-fuzzy cost estimation model enhanced by fast messy genetic algorithms for semiconductor hookup construction, Computer-Aided Civil and Infrastructure Engineering, 27(10), 764–81.

Huntington, D. E. & Lyrintzis, C. S. (1998), Improvements to and limitations of Latin Hypercube Sampling, Probabilistic Engineering Mechanics, 13(4), 245–53.

Husslage, B. (2006), Maximin designs for computer experiments, Ph.D. thesis, Tilburg University.

Husslage, B. G. M., Rennen, G., van Dam, E. R. & den Hertog, D. (2011), Space-filling Latin hypercube designs for computer experiments, Optimization and Engineering, 12(4), 611–30.

Iman, R. C. & Conover, W. J. (1980), Small sample sensitivity analysis techniques for computer models with an application to risk assessment, Communications in Statistics: Theory and Methods, A9(17), 1749–842.

Janssen, H. (2013), Monte-Carlo based uncertainty analysis: sampling efficiency and sampling convergence, Reliability Engineering and System Safety, 109, 123–32.

Johnson, M. E. (1987), Multivariate Statistical Simulation: A Guide to Selecting and Generating Continuous Multivariate Distributions (Wiley Series in Probability and Mathematical Statistics), John Wiley & Sons, New York, NY.

Kalos, M. H. & Whitlock, P. A. (1986), Monte Carlo Methods, John Wiley & Sons, New York, NY.

Katafygiotis, L. S. & Papadimitriou, C. (1996), Dynamic response variability of structures with uncertain properties, Earthquake Engineering & Structural Dynamics, 25(8), 775–93.

Keramat, M. & Kielbasa, R. (1997), Efficient average quality index of estimation of integrated circuits by Modified Latin Hypercube Sampling Monte Carlo (MLHSMC), in IEEE International Symposium on Circuits and Systems, Hong Kong, pp. 1–4.

Li, G., Rosenthal, C. & Rabitz, H. (2001), High dimensional model representations, The Journal of Physical Chemistry A, 105(33), 7765–77.

Marano, G. C., Quaranta, G. & Monti, G. (2011), Modified genetic algorithm for the dynamic identification of structural systems using incomplete measurements, Computer-Aided Civil and Infrastructure Engineering, 26(2), 92–110.

McKay, M. D., Conover, W. J. & Beckman, R. J. (1979), A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics, 21, 239–45.

Mead, R. & Pike, D. J. (1975), A biometrics invited paper. A review of response surface methodology from a biometric viewpoint, Biometrics, 31(4), 803.

Myers, R. (1999), Response surface methodology – current status and future directions, Journal of Quality Technology, 31(1), 30–44.

Myers, R., Montgomery, D., Vining, G., Borror, C. & Kowalski, S. (2004), Response surface methodology: a retrospective and literature survey, Journal of Quality Technology, 36(1), 53–77.

Myers, R. H. (1971), Response Surface Methodology, Allyn and Bacon, Inc., Boston, MA.

Nataf, A. (1962), Determination des distributions de probabilites dont les marges sont donnes, Comptes Rendus de L'Academie des Sciences, 225, 42–3.

Novak, D. & Lehky, D. (2006), ANN inverse analysis based on stochastic small-sample training set simulation, Engineering Applications of Artificial Intelligence, 19(7), 731–40.

Novak, D., Vorechovsky, M. & Rusina, R. (2003), Small-sample probabilistic assessment – FREET software, in A. Der Kiureghian, S. Madanat, and J. M. Pestana (eds.), ICASP 9: International Conference on Applications of Statistics and Probability in Civil Engineering, San Francisco, USA, Millpress, Rotterdam, Netherlands, pp. 91–6.

Novak, D., Vorechovsky, M. & Teply, B. (2014), FReET: software for the statistical and reliability analysis of engineering problems and FReET-D: degradation module, Advances in Engineering Software, 72, 179–92.

Olsson, A., Sandberg, G. & Dahlblom, O. (2003), On Latin Hypercube Sampling for structural reliability analysis, Structural Safety, 25(1), 47–68.

Panakkat, A. & Adeli, H. (2009), Recurrent neural network for approximate earthquake time and location prediction using multiple seismicity indicators, Computer-Aided Civil and Infrastructure Engineering, 24(4), 280–92.


Patzak, B. (2000), OOFEM project home page. Available at: http://www.oofem.org, accessed January 1, 2014.

Patzak, B. & Bittnar, Z. (2001), Design of object oriented finite element code, Advances in Engineering Software, 32(10-11), 759–67.

Patzak, B. & Rypl, D. (2012), Object-oriented, parallel finite element framework with dynamic load balancing, Advances in Engineering Software, 47(1), 35–50.

Qian, P. Z. G. (2009), Nested Latin hypercube designs, Biometrika, 96(4), 957–70.

Rabitz, H. & Ali, A. F. (1999), General foundations of high-dimensional model representations, Journal of Mathematical Chemistry, 25(2–3), 197–233.

Rabitz, H., Kramer, M. & Dacol, D. (1983), Sensitivity analysis in chemical kinetics, Annual Review of Physical Chemistry, 34(1), 419–61.

Reuter, U. & Moller, B. (2010), Artificial neural networks for forecasting of fuzzy time series, Computer-Aided Civil and Infrastructure Engineering, 25(5), 363–74.

Rubenstein, R. Y. (1981), Simulation and the Monte Carlo Method, John Wiley & Sons, New York, New Press, Oxford.

Sacks, J., Welch, W. J., Mitchell, T. J. & Wynn, H. P. (1989), Design and analysis of computer experiments, Statistical Science, 4(4), 409–23.

Sadeghi, N., Fayek, A. R. & Pedrycz, W. (2010), Fuzzy Monte Carlo simulation and risk assessment in construction, Computer-Aided Civil and Infrastructure Engineering, 25(4), 238–52.

Sallaberry, C., Helton, J. & Hora, S. (2008), Extension of Latin hypercube samples with correlated variables, Reliability Engineering and System Safety, 93(7), 1047–59.

Schuyler, J. R. (1997), Monte Carlo stopping rule (Part 1 and Part 2). Available at: http://www.maxvalue.com/tip025.htm, accessed January 1, 2014.

Sobol', I. M. (1994), A Primer for the Monte Carlo Method, CRC Press, Boca Raton, FL.

Stein, M. (1987), Large sample properties of simulations using Latin Hypercube Sampling, Technometrics, 29(2), 143–51.

Tamas, T. (1990), Sensitivity analysis of complex kinetic systems. Tools and applications, Journal of Mathematical Chemistry, 5(3), 203–48.

Tong, C. (2006), Refinement strategies for stratified sampling methods, Reliability Engineering and System Safety, 91(10–11), 1257–65.

Vorechovsky, M. (2006), Hierarchical Subset Latin Hypercube Sampling, in S. Vejvoda et al. (eds.), PPK'06: Pravdepodobnost porusovani konstrukci, Brno, Czech Republic, pp. 285–98, Brno University of Technology, Faculty of Civil Engineering and Faculty of Mechanical Engineering & Ustav aplikovane mechaniky Brno, s.r.o. & Asociace strojnich inzenyru & TERIS, a.s., pobocka Brno.

Vorechovsky, M. (2008), Simulation of simply cross correlated random fields by series expansion methods, Structural Safety, 30(4), 337–63.

Vorechovsky, M. (2009), Hierarchical Subset Latin Hypercube Sampling for correlated random vectors, in B. Topping and Y. Tsompanakis (eds.), Proceedings of the First International Conference on Soft Computing Technology in Civil, Structural and Environmental Engineering, Madeira, Portugal, Civil-Comp Proceedings: 92, Civil-Comp Press, Stirlingshire, Scotland, 17 pages, CD-ROM.

Vorechovsky, M. (2011), Correlation in probabilistic simulation, in M. H. Faber, J. Kohler, and K. Nishijima (eds.), ICASP'11: Proceedings of the Applications of Statistics and Probability in Civil Engineering, Zurich, Switzerland, CERRA, Taylor & Francis Group, London, pp. 2931–9.

Vorechovsky, M. (2012a), Correlation control in small sample Monte Carlo type simulations II: analysis of estimation formulas, random correlation and perfect uncorrelatedness, Probabilistic Engineering Mechanics, 29, 105–20.

Vorechovsky, M. (2012b), Extension of sample size in Latin hypercube sampling – methodology and software, in A. Strauss, D. Frangopol, and K. Bergmeister (eds.), Proceedings of the 3rd Symposium on Life-Cycle and Sustainability of Civil Infrastructure Systems, Vienna, Austria, IALCCE, BOKU University, Vienna, CRC Press/Balkema, pp. 2403–10.

Vorechovsky, M. (2012c), Optimal singular correlation matrices estimated when sample size is less than the number of random variables, Probabilistic Engineering Mechanics, 30, 104–16.

Vorechovsky, M. & Novak, D. (2009), Correlation control in small sample Monte Carlo type simulations I: a simulated annealing approach, Probabilistic Engineering Mechanics, 24(3), 452–62.

Vorechovsky, M. & Sadilek, V. (2008), Computational modeling of size effects in concrete specimens under uniaxial tension, International Journal of Fracture, 154(1–2), 27–49.

Yuen, K.-V. & Mu, H.-Q. (2011), Peak ground acceleration estimation by linear and nonlinear models with reduced order Monte Carlo simulation, Computer-Aided Civil and Infrastructure Engineering, 26(1), 30–47.

Zhou, L., Yan, G. & Ou, J. (2013), Response surface method based on radial basis functions for modeling large-scale structures in model updating, Computer-Aided Civil and Infrastructure Engineering, 28(3), 210–26.