
5 Subset Simulation

In this chapter we discuss the Subset Simulation method for estimating the complementary cumulative distribution function (CCDF) of a response quantity of interest of a system subjected to uncertainties modeled by standard random variables or processes. The CCDF can be used directly for estimating the failure probability that the response exceeds a specified threshold. The samples of the random variables in a Subset Simulation run can also be used for probabilistic failure analysis, which will be discussed in Chapter 6.

The idea behind Subset Simulation is quite simple and makes use of fundamental probability logic, namely, conditional probability. The probability of a rare event is equal to the probability that it happens given that a not-so-rare event has happened, multiplied by the probability of the not-so-rare event. There are many ways in which one can make use of this idea to create a variance reduction algorithm. For concreteness and instructional purposes we start with the “standard algorithm”, through which one can build experience with the method. Readers interested in an exposition of the ideas behind the method may refer to Section 2.7. The standard algorithm is (hopefully) simple to implement. It has been found to be robust in applications and yet competitive in terms of efficiency. Details of the algorithm, issues, and variants shall be discussed later, after some familiarization. The method is not “foolproof.” Despite the mathematical arguments that justify it, human intelligence and experience are still needed to judge the validity of the simulation results. This seems to be inevitable for advanced Monte Carlo methods that claim to be more efficient than Direct Monte Carlo.

5.1 Standard Algorithm

Let X = [X1, ..., Xn] be the set of random variables that completely determines the scalar response Y = h(X) whose complementary cumulative distribution function (CCDF) is to be estimated:

F_Y(y) = P(Y > y)   (5.1)

Without loss of generality, it is assumed that {X1, ..., Xn} are independent, that is, with a joint PDF

q(x) = q1(x1) ⋯ qn(xn)   (5.2)



An efficient method for generating i.i.d. samples and calculating the PDF value qi(xi) for given xi is assumed to be available (Chapter 3).

Let the “level probability” p0 ∈ (0, 1) and the “number of samples per level” N be chosen such that (with little loss of generality)

Nc = p0 N   (number of chains)   (5.3)

Ns = p0^(−1)   (number of samples per chain)   (5.4)

are positive integers. The reason behind this constraint will unfold shortly. Note that

Nc Ns = N   (5.5)

As an example, one may take p0 = 0.1 and N = 500, so that Nc = 50 and Ns = 10.

The following is a standard procedure of Subset Simulation that estimates a sequence of threshold values b corresponding to a set of predefined exceedance probabilities P(Y > b). The result is an estimate of the CCDF of Y. The procedure starts from simulation level 0 (Direct Monte Carlo), followed by subsequent simulation levels 1, ..., m − 1 (MCMC), which provide information on the CCDF in regimes of decreasing probability. To avoid distraction, we omit the description of the MCMC involved in Levels 1 to m − 1; it can be found in Section 4.5.2.

5.1.1 Simulation Level 0 (Direct Monte Carlo)

1. Generate samples {X_k^(0) : k = 1, ..., N} according to the parameter PDF q. Calculate the corresponding values of the response {Y_k^(0) = h(X_k^(0)) : k = 1, ..., N}.

2. Sort {Y_k^(0) : k = 1, ..., N} in ascending order to give the list {b_k^(0) : k = 1, ..., N}. The value b_k^(0) gives the estimate of b corresponding to the exceedance probability p_k^(0) = P(Y > b), where

p_k^(0) = (N − k)/N,   k = 1, ..., N   (5.6)

Plotting {(b_k^(0), p_k^(0)) : k = 1, ..., N − Nc} gives the CCDF estimate of Y with probability ranging from (1 − N^(−1)) down to p0. The regime for probabilities below p0 shall be further estimated by higher simulation levels.

3. Set

b1 = b_(N−Nc)^(0)   (5.7)

Let {X_(j0)^(1) : j = 1, ..., Nc} be the Nc = p0 N samples of X corresponding to {b_(N−Nc+j)^(0) : j = 1, ..., Nc}. These samples are used as “seeds” for generating additional samples conditional on F1 = {Y > b1} at simulation level 1.
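To make the bookkeeping above concrete, the following is a minimal Python sketch of Level 0, assuming i.i.d. standard Gaussian parameters; the function name level0 and its arguments are ours, for illustration only.

```python
import numpy as np

def level0(h, n, N, p0, rng):
    """Direct Monte Carlo at Level 0 (i.i.d. standard Gaussian parameters assumed).

    Returns the sorted responses {b_k}, the first threshold b1 (Eq. 5.7),
    and the Nc seeds for Level 1 (samples with Y > b1).
    """
    Nc = int(p0 * N)                    # number of seeds/chains for the next level
    X = rng.standard_normal((N, n))     # samples from the parameter PDF q
    Y = np.array([h(x) for x in X])     # responses Y = h(X)
    order = np.argsort(Y)               # ascending order, probabilities (N - k)/N
    b = Y[order]
    b1 = b[N - Nc - 1]                  # b1 = b_(N - Nc), 0-based indexing
    seeds = X[order[N - Nc:]]           # top Nc samples, conditional on Y > b1
    return b, b1, seeds

# example: Y = X1*X2, as in Example 5.2 later in this chapter
rng = np.random.default_rng(0)
b, b1, seeds = level0(lambda x: x[0] * x[1], n=2, N=500, p0=0.1, rng=rng)
```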


5.1.2 Simulation Level i = 1, ..., m − 1 (MCMC)

1. From each seed X_(j0)^(i) (j = 1, ..., Nc) from the last level (i − 1), use MCMC to generate Ns = 1/p0 conditional samples {X_(jk)^(i) : k = 1, ..., Ns} with conditional PDF q(⋅|Fi); see Section 4.5.2. This gives Nc Markov chains, each with Ns samples. The total number of samples at Level i is still equal to N because Nc Ns = N. The seeds are discarded after use.

2. Let Y_(jk)^(i) = h(X_(jk)^(i)), which has been calculated during MCMC. Sort {Y_(jk)^(i) : j = 1, ..., Nc; k = 1, ..., Ns} in ascending order to give the list {b_k^(i) : k = 1, ..., N}. The value b_k^(i) gives the estimate of b corresponding to the exceedance probability p_k^(i) = P(Y > b), where

p_k^(i) = p0^i (N − k)/N,   k = 1, ..., N   (5.8)

Plotting {(b_k^(i), p_k^(i)) : k = 1, ..., N − Nc} gives the CCDF estimate of Y with probability ranging from p0^i (1 − N^(−1)) down to p0^(i+1). If i = m − 1 (the highest level), the values for k = N − Nc + 1, N − Nc + 2, ..., N should also be plotted to cover the probability range below p0^m. This is because no further simulation level will be carried out to obtain a better estimate of this probability regime.

3. Set

b_(i+1) = b_(N−Nc)^(i)   (5.9)

Let {X_(j0)^(i+1) : j = 1, ..., Nc} be the Nc samples of X corresponding to {b_(N−Nc+j)^(i) : j = 1, ..., Nc}. These samples are used as “seeds” for generating additional samples conditional on F_(i+1) = {Y > b_(i+1)} at the next simulation level (i + 1). Omit this step if i = m − 1 (the highest level).

Figure 5.1 gives a graphical illustration of the procedure. Note that in Figure 5.1c, a sample at Level 1 lies on the failure boundary F1; this is not a typo. Even though the seeds at the boundary of F1 from Level 0 have not been kept for Level 1, it is still possible for the next sample at Level 1 generated from them to be identical to the seed when the candidate is rejected during MCMC.
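The sketch below indicates one way a conditional level could be implemented, assuming i.i.d. standard Gaussian parameters and a component-wise Metropolis scheme with a uniform proposal in the spirit of Section 4.5.2 (not reproduced here); all names are illustrative, not a prescribed interface.

```python
import numpy as np

def mcmc_level(h, seeds, Ns, b_i, w, rng):
    """Generate Nc chains of Ns samples each, conditional on F_i = {Y > b_i}.

    Component-wise Metropolis step for standard Gaussian parameters with a
    uniform proposal of maximum step length w. The chains start from the
    seeds, which are discarded afterwards (standard algorithm).
    """
    X_out, Y_out = [], []
    for x in seeds:                          # one Markov chain per seed
        cur, cur_y = x.copy(), h(x)
        for _ in range(Ns):
            cand = cur.copy()
            for l in range(cand.size):       # independent-component proposal
                xi = cand[l] + w * rng.uniform(-1.0, 1.0)
                # accept the component with probability min(1, q_l(xi)/q_l(x_l));
                # for a standard Gaussian q_l the log-ratio is (x_l^2 - xi^2)/2
                if np.log(rng.uniform()) < 0.5 * (cand[l]**2 - xi**2):
                    cand[l] = xi
            cand_y = h(cand)
            if cand_y > b_i:                 # keep the candidate only if it stays in F_i
                cur, cur_y = cand, cand_y
            X_out.append(cur.copy())         # a repeated sample if the candidate is rejected
            Y_out.append(cur_y)
    return np.array(X_out), np.array(Y_out)
```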

Example 5.1  Sample accounting

To illustrate how the samples are processed, suppose N = 10 and p0 = 0.2, so that Nc = p0 N = 2, Ns = 1/p0 = 5, and N − Nc = 8. This means that at each simulation level there are 2 Markov chains, each with 5 samples, giving a total of 10 samples at the level. At Level 0 suppose the samples are {X_1^(0), X_2^(0), ..., X_10^(0)}, whose responses Y are

{Y_k^(0) : k = 1, ..., 10} = {3, 6, 5, 1, 2, 0, −3, 4, 22, 12}

Arranging these in ascending order gives

{b_k^(0) : k = 1, ..., 10} = {−3, 0, 1, 2, 3, 4, 5, 6, 12, 22}


Figure 5.1 Illustration of Subset Simulation. (a) Direct Monte Carlo; (b) first threshold level; (c) Markov chain Monte Carlo; (d) second threshold level.

Then b1 = b_8^(0) = 6. Since the value 12 corresponds to X_10^(0) and the value 22 corresponds to X_9^(0), the seeds for generating the samples conditional on F1 = {Y > 6} at Level 1 are X_(1,0)^(1) = X_10^(0) and X_(2,0)^(1) = X_9^(0).

At Level 1, one Markov chain will be generated from X_(1,0)^(1), giving {X_(1,1)^(1), X_(1,2)^(1), ..., X_(1,5)^(1)}; the other chain is generated from X_(2,0)^(1), giving {X_(2,1)^(1), X_(2,2)^(1), ..., X_(2,5)^(1)}. These samples are all conditional on F1. The seeds X_(1,0)^(1) and X_(2,0)^(1) are then discarded.
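Combining the two sketches above, a minimal driver for the standard algorithm could read as follows; level0 and mcmc_level are the hypothetical helpers from the earlier sketches, and the returned list contains the (b, p) pairs of the CCDF estimate.

```python
import numpy as np

def subset_simulation(h, n, N, p0, m, w, rng):
    """Standard algorithm sketch: returns the (b, p) pairs of the CCDF estimate."""
    Nc, Ns = int(p0 * N), int(round(1.0 / p0))
    pairs = []
    b_sorted, b_next, seeds = level0(h, n, N, p0, rng)
    for k in range(N - Nc):                   # Level 0 points, Eq. (5.6)
        pairs.append((b_sorted[k], (N - (k + 1)) / N))
    for i in range(1, m):                     # Levels 1, ..., m-1 (MCMC)
        X, Y = mcmc_level(h, seeds, Ns, b_next, w, rng)
        order = np.argsort(Y)
        b_sorted = Y[order]
        last = N if i == m - 1 else N - Nc    # plot the full tail only at the top level
        for k in range(last):                 # Eq. (5.8)
            pairs.append((b_sorted[k], p0**i * (N - (k + 1)) / N))
        if i < m - 1:
            b_next = b_sorted[N - Nc - 1]     # Eq. (5.9)
            seeds = X[order[N - Nc:]]         # seeds for the next level
    return pairs
```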

5.2 Understanding the Algorithm

In this section we discuss some features and implementation details of the standard algorithm.

5.2.1 Direct Monte Carlo Indispensable

Subset Simulation starts with Direct Monte Carlo, where the samples are generated simply according to the parameter PDF q without any conditioning. This step is indispensable because the target probability always contains an unconditional probability term, no matter how one breaks it into conditional probabilities. The unconditional samples provide information on how likely the subsequent conditioning events are. This is an interesting philosophical point. In the current context, knowing the information conditional on an event alone does not allow one to determine the probability of that event.

As illustrated in Figure 5.1a, the majority of the unconditional samples cover the high probability (frequent) regime of the CCDF curve, and so they provide information for estimation in that regime. There are only a small number of samples (or none) in the small probability regime, reflecting the fact that the unconditional samples do not provide much information there. It is not efficient to use them to explore such a regime, because on average only p0 N of the unconditional samples lie in the region with probability smaller than p0.

The objective of Level 0 is to provide reliable information at probability levels from 1 down to p0, rather than at the target level p0^m. For this reason, only the CCDF curve from probability 1 down to p0 is plotted at this stage. The number of samples N should be much less than that required when one directly estimates the probability at the target level, for otherwise no advantage is gained from Subset Simulation.

5.2.2 Rare Regime Explored by MCMC

Starting from Level 1, the samples are all conditional on {Y > bi} (i = 1, ..., m − 1) and they are generated by MCMC. The conditional samples at Level i are aimed at providing information for the CCDF curve for probabilities between p0^i and p0^(i+1). If a higher simulation level is performed, the regime with probability smaller than p0^(i+1) will be explored later. At the highest level m − 1, the method relies on the relatively small number (Nc = p0 N) of samples to explore the regime with probability less than p0^m.

5.2.3 Stationary Markov Chain from the Start

The threshold level bi is determined from the samples generated at Level (i − 1). The samples of X with Y > bi are distributed as the parameter PDF and are conditional on Fi = {Y > bi}. These samples provide “seeds” for generating samples conditional on Fi by means of MCMC. Since the seeds are already distributed as the target stationary distribution, the same is also true for the subsequent Markov chain samples. This means that the MCMC samples are identically distributed as the target conditional PDF q(x|Fi), not just asymptotically. The algorithm does not have “burn-in” problems and there is no need to discard the first few Markov chain samples to wait for stationarity. It is stationary right from the start.

5.2.4 Multiple Chains

From Level 1 onwards, the N samples generated by MCMC at a level consist of samples from Nc Markov chains, each with Ns samples. This “multi-chain” strategy is less vulnerable to ergodic problems than generating a single chain of N samples from a single seed. When the region giving significant contribution to the failure probability consists of disconnected regions, all samples generated from a single chain may be confined to the region in which they are started if the spread of the proposal PDF is not large enough. The multi-chain strategy allows the samples to be developed from different chains that are possibly in different disconnected regions. See Section 4.4.3 for the concept of ergodicity and Section 5.4.2 for related issues in Subset Simulation.

5.2.5 Seeds Discarded

The seeds {X_(j0)^(i) : j = 1, ..., Nc} for generating conditional samples at Level i are discarded after use. This provides some convenience in sample accounting and in the analysis of the failure probability estimate. It also reduces the correlation between the samples at different simulation levels. Of course, the information from the seeds is wasted, although their number is small compared to the total number of samples. The algorithm can be easily modified to incorporate the seeds as conditional samples, so that at each level it is only required to generate (N − Nc) rather than N new conditional samples. This has been done conventionally in the literature.

5.2.6 CCDF Perspective

The algorithm produces an estimate of the CCDF curve F_Y(b) = P(Y > b) covering the large to small probability regimes, rather than just a point estimate for a given threshold level. In each simulation run, the values of b corresponding to the fixed probability levels p_k^(i) = p0^i (N − k)/N for i = 0, ..., m − 1 and k = 1, ..., N are generated. This is the same concept used in the construction of the CCDF estimate based on Direct Monte Carlo samples (Section 2.5.6).

5.2.7 Repeated Samples

Recall from Eq. (5.9) that at Level i, b_(i+1) is set as the (N − Nc)th value, with N − Nc = (1 − p0)N, of the ascending list {b_k^(i) : k = 1, ..., N}, so that the conditional probability of exceeding b_(i+1) is approximately p0. The samples corresponding to {b_(N−Nc+j)^(i) : j = 1, ..., Nc} are used as seeds for the next level (i + 1), as it is anticipated that they are conditional on Y > b_(i+1). In reality, the latter need not be true. For example, it is not necessary that b_(i+1) < b_(N−Nc+1)^(i). This is because the values in the list {b_k^(i) : k = 1, ..., N} need not be distinct, due to the rejection mechanism in MCMC.

As an example, suppose N = 10 and at Level 2

{b_k^(2) : k = 1, ..., 10} = {0.5, 1.2, 1.5, 2.3, 2.3, 2.3, 3.4, 5.6, 5.6, 6.1}

Then for p0 = 0.5, N − Nc = (1 − p0)N = 5, and so b3 = b_5^(2) = 2.3. The samples accepted for Level 3 are

{2.3, 3.4, 5.6, 5.6, 6.1}

The first sample in this set, 2.3, is equal to (instead of greater than) b3 = 2.3.


At first glance it seems that setting b_(i+1) to be the maximum value among those smaller than b_(N−Nc+1)^(i) may ensure b_(i+1) < b_(N−Nc+1)^(i). Following the example above, from the set {b_k^(2) : k = 1, ..., 5} = {0.5, 1.2, 1.5, 2.3, 2.3}, if we confine ourselves to those less than b_6^(2) = 2.3 then we get {0.5, 1.2, 1.5}. Taking the maximum over this set gives b3 = 1.5. One can check that this is less than b_6^(2), b_7^(2), ..., b_10^(2). The strategy works in this example, but it may still break down when b_1^(i) = b_2^(i) = ⋯ = b_((1−p0)N)^(i), which can happen when rejection is frequent.

The issue discussed here is relevant from a sample accounting (coding) point of view, but it makes little difference to the result for a proper simulation run with large N. When there are a lot of repeated samples the results tend to be inaccurate, and one should be concerned about ergodicity and the repeatability/representativeness of a single run of results (Section 5.4.2).

5.2.8 Uniform Conditional Probabilities

In the standard algorithm, it is targeted that

P(Y > b1) ≈ P(Y > b2 | Y > b1) ≈ ⋯ ≈ P(Y > bm | Y > b_(m−1)) ≈ p0   (5.10)

The same number of samples N is also used for each simulation level. These choices simplify the algorithm and they are found to give a balance between simplicity, robustness, and efficiency. It is possible to adopt different conditional probabilities and/or a different number of samples for different simulation levels. An approximate analysis suggests that, in this case, the choice of the level probabilities and the number of samples generally depends on the correlation between the MCMC samples, which is unknown a priori.

Example 5.2  Subset simulation, basic

Consider estimating the CCDF of Y = X1 X2, where X1 and X2 are i.i.d. standard Gaussian. Subset Simulation is performed with p0 = 0.1 and N = 500. This means that starting from Level 1 there are Nc = p0 N = 50 chains, each with Ns = 1/p0 = 10 samples, making up a total of 500 samples at each level. Suppose the smallest probability of interest is 10^(−6). This requires m = 6 levels. The total number of samples in a single run is mN = 6 × 500 = 3000. The proposal PDF is chosen as a uniform distribution centered at the current sample with maximum step length w = 1.

As a reference, it can be readily shown that the exact solution for b ≥ 0 is given by

P(Y > b) = 2 ∫_0^∞ Φ(−b/x) φ(x) dx   (5.11)

For b < 0, P(Y > b) = 1 − P(Y > |b|), which follows from the symmetry of the distribution of Y about 0.
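For reference, Eq. (5.11) is a one-dimensional integral that can be evaluated numerically; below is a minimal sketch (the use of scipy is our choice of tool, not part of the example).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def ccdf_product_gaussians(b):
    """Exact P(X1*X2 > b) for i.i.d. standard Gaussians X1, X2, Eq. (5.11)."""
    if b < 0:                       # symmetry of Y about 0
        return 1.0 - ccdf_product_gaussians(-b)
    val, _ = quad(lambda x: norm.cdf(-b / x) * norm.pdf(x), 0.0, np.inf)
    return 2.0 * val

print(ccdf_product_gaussians(0.0))  # 0.5, by symmetry
```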

The generation of samples in Subset Simulation is illustrated in a sequence of plots in Figure 5.2. Plot (a) shows the 500 samples (X1, X2) generated at Level 0. These are distributed as the original parameter PDF, that is, standard Gaussian. The value of b1 is chosen as the 450th smallest value of Y among these samples, which gives b1 = 1.09. The top 50 samples of X are shown in Figure 5.2b, where the dashed line shows the conditional boundary for Level 1, that is, X1 X2 > b1. From each of these samples a Markov chain of 10 samples is generated conditional on Level 1, giving a total of 500 samples. The latter are shown in Figure 5.2c, where the conditional boundary X1 X2 > b1 is now shown with a solid line. The samples shown in Figure 5.2b, which have been used as seeds for Level 1, are discarded and hence not shown in Figure 5.2c. Again, b2 is chosen as the 450th smallest value of Y among the samples at Level 1, which gives b2 = 3.30. The top 50 samples of X are shown in Figure 5.2d, where the dashed line shows the conditional boundary for Level 2, that is, X1 X2 > b2. Figures 5.2e to 5.2h can be understood in the same way. The samples for Levels 4 to 6 are omitted here.

Figure 5.2 Population of samples at different simulation levels. Solid line – conditional boundary of the current level; dashed line – conditional boundary of the next level. Plots in the top row (a, c, e, g) show the samples at the current level. Plots in the bottom row (b, d, f, h) show the samples accepted as seeds for generating the samples at the next level.

Figure 5.3a shows the CCDF estimate resulting from the samples. The dots show the threshold levels b1, ..., b5. As a characteristic of Monte Carlo simulation, the result differs from one simulation run to another. Figure 5.3b shows the results from three independent runs. We shall discuss in the next section how to assess the estimation error using information from a single run. Nevertheless, viewing the results from a few simulation runs (if affordable) can help spot-check ergodic problems. Although not shown in the figure, these results are close to the exact solution. Note that Direct Monte Carlo using the same number of samples (3000) can only give estimates of the CCDF down to the smallest non-zero probability of 1/3000 ≈ 3 × 10^(−4). The result is only reliable (say, with 30% c.o.v.) for probabilities down to 10/3000 ≈ 3 × 10^(−3).

Figure 5.3 CCDF estimate, Example 5.2. (a) Single run with dots showing intermediate threshold levels; (b) three independent runs.

Example 5.3  Brownian process

Let B(t) be the standard Brownian process. Consider determining the probability that its maximum value over the time window [0, T] exceeds a given threshold level b > 0, that is,

P(F) = P( max_{0≤s≤T} B(s) > b )   (5.12)

To estimate this probability by Monte Carlo simulation, we divide [0, T] into n intervals and approximate the process in discrete time (Section 3.9):

B_k = √Δt Σ_(i=1)^(k) X_i,   k = 1, 2, ..., n   (5.13)

where Δt = T/n is the time interval, B_k = B(kΔt) with B_0 = 0, and {X_i : i = 1, ..., n} are i.i.d. standard Gaussian random variables. The maximum value of the process is then given by

Y = max_{0≤k≤n} B_k   (5.14)

The failure probability is given by P(Y > b). Strictly speaking, P(Y > b) is only an approximation of P(max_{0≤s≤T} B(s) > b) due to time discretization, although the difference is negligible when Δt is sufficiently small.

For this problem an analytical solution is available:

P(F) = P( max_{0≤s≤T} B(s) > b ) = 2Φ(−b/√T)   (5.15)
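At a moderate threshold, the discretization of Eqs. (5.13) and (5.14) can be cross-checked against Eq. (5.15) by Direct Monte Carlo; a minimal sketch with assumed parameters (T = 1, n = 1000, b = 2):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
T, n, b = 1.0, 1000, 2.0
dt = T / n

# Direct Monte Carlo on the discretized process, Eqs. (5.13)-(5.14)
N = 100_000
increments = np.sqrt(dt) * rng.standard_normal((N, n))
Y = np.maximum(np.cumsum(increments, axis=1).max(axis=1), 0.0)  # include B_0 = 0
p_mc = np.mean(Y > b)

p_exact = 2 * norm.cdf(-b / np.sqrt(T))   # Eq. (5.15), about 0.0455
print(p_mc, p_exact)                       # close, up to discretization and sampling error
```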

Figure 5.4 CCDF estimate, Example 5.3. (a) Single run with dots showing intermediate threshold levels; (b) three independent runs.

where Φ(⋅) denotes the standard Gaussian CDF. This result can be readily obtained as follows. By the theorem of total probability,

P(B(T) > b) = P(B(T) > b | F) P(F) + P(B(T) > b | F̄) P(F̄)   (5.16)

where F̄ denotes the complement of F. Clearly, P(B(T) > b | F̄) = 0. Also, P(B(T) > b | F) = 1/2, since once the process hits the level b before time T it is equally likely to be above or below b at time T. This is often known as the “reflection principle.” Substituting these findings into Eq. (5.16) and rearranging gives P(F) = 2P(B(T) > b). Equation (5.15) then follows, since B(T) is Gaussian with zero mean and standard deviation √T.

For a duration of T = 1 sec and n = 1000, Figures 5.4a and 5.4b show respectively the results of a single run and three independent runs, in a manner similar to Figures 5.3a and 5.3b. The parameters are the same as before, that is, p0 = 0.1 and N = 500.

5.3 Error Assessment in a Single Run

In this section we present the formulas that allow the error associated with the CCDF estimate in Subset Simulation to be assessed using information in a single simulation run. A heuristic argument is given that provides a quick, intuitive justification for their use so that they can be correctly interpreted in applications. A formal mathematical proof is postponed until Section 5.5.

Assuming that the MCMC process in Subset Simulation is ergodic (see Section 5.4.2), the CCDF estimate produced by Subset Simulation is asymptotically unbiased and convergent as N → ∞. For practical purposes, the c.o.v. of the CCDF estimate at the intermediate probability levels may be assessed. The c.o.v. depends on the correlation among the MCMC samples within a given level and across different levels. A direct formula for the c.o.v. that takes the latter correlation into account has not been derived yet, but the c.o.v. can be approximately bounded.

The c.o.v. of the CCDF estimate at (bi, p0^i) (i = 1, ..., m) is approximately equal to the c.o.v. of the estimate of F_Y(bi) (recall that {bi} are random). The latter is denoted by αi. It may be approximately bounded as

α_L^(i) < αi < α_U^(i)   (5.17)

where

α_L^(i) = ( Σ_(j=1)^(i) δj² )^(1/2)   (5.18)

α_U^(i) = ( Σ_(j=1)^(i) Σ_(k=1)^(i) δj δk )^(1/2)   (5.19)

δj² = ((1 − p0)/(p0 N)) (1 + γj)   (5.20)

Here,

γ1 = 0;   γi = 2 Σ_(k=1)^(Ns−1) (1 − k/Ns) ρi(k),   i = 2, ..., m   (5.21)

is a factor accounting for the correlation among the MCMC samples at Level (i − 1); ρi(k) is the correlation coefficient of samples along a chain at k steps apart, which can be estimated by

ρi(k) ≈ (1/(p0(1 − p0))) [ (1/(Nc(Ns − k))) Σ_(j=1)^(Nc) Σ_(r=1)^(Ns−k) I(Y_(j,r)^(i−1) > bi) I(Y_(j,r+k)^(i−1) > bi) − p0² ]   (5.22)
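The following sketch indicates how Eqs. (5.18)–(5.22) might be evaluated from the stored responses of a single run; the data layout (mcmc_levels as a list of response arrays and thresholds) is our assumption, not a prescribed format.

```python
import numpy as np

def cov_bounds(mcmc_levels, p0, N):
    """Approximate c.o.v. bounds of Eq. (5.17) from Eqs. (5.18)-(5.22).

    mcmc_levels: list of (Y, b) pairs, one per level j = 2, ..., m, where Y is
    the (Nc, Ns) array of responses at Level (j-1) and b is the threshold b_j.
    Level 0 contributes delta_1^2 = (1 - p0)/(p0*N) with gamma_1 = 0.
    """
    Ns = int(round(1.0 / p0))
    deltas2 = [(1.0 - p0) / (p0 * N)]             # delta_1^2 (Direct Monte Carlo)
    for Y, b in mcmc_levels:
        I = (Y > b).astype(float)                 # indicator of exceedance per sample
        gamma = 0.0
        for k in range(1, Ns):
            # Eq. (5.22): lag-k correlation, averaged over pairs and chains
            rho = (np.mean(I[:, :Ns - k] * I[:, k:]) - p0**2) / (p0 * (1.0 - p0))
            gamma += 2.0 * (1.0 - k / Ns) * rho   # Eq. (5.21)
        deltas2.append((1.0 - p0) / (p0 * N) * (1.0 + gamma))
    d = np.sqrt(np.array(deltas2))
    alpha_L = np.sqrt(np.cumsum(d**2))            # Eq. (5.18)
    alpha_U = np.cumsum(d)                        # Eq. (5.19) collapses to a plain sum
    return alpha_L, alpha_U
```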

5.3.1 Heuristic Argument

The error assessment formulas in Eq. (5.17) to Eq. (5.22) can be reasoned heuristically as follows. Suppose the intermediate threshold levels {bi : i = 1, ..., m} were fixed. Let p̂1 be the estimate for p1 = P(Y > b1), equal to the fraction of samples at Level 0 with Y exceeding b1. Similarly, let p̂j (j = 2, ..., m) be the estimate for pj = P(Y > bj | Y > b_(j−1)), equal to the fraction of conditional samples at Level (j − 1) with Y exceeding bj. The estimate for P(Y > bi) is then given by

P̂i = p̂1 p̂2 ⋯ p̂i   (5.23)

5.3.1.1 Asymptotic Unbiasedness

We first argue that P̂i is asymptotically unbiased. First note that the {p̂j} are unbiased. If they were also independent then P̂i would be unbiased, because then its expectation would simply be equal to the product of the expectations of the {p̂j}. In general, the {p̂j} are dependent because the Markov chains at a given level are started from seeds selected from the last level. As the expectation of the product of dependent random variables is generally not equal to the product of their expectations, P̂i is biased for finite N. However, it is asymptotically unbiased as N → ∞ because the variance of the {p̂j} diminishes in the limit.

5.3.1.2 Lower and Upper Bound c.o.v.

We next examine the c.o.v. of P̂i. Take the logarithm of Eq. (5.23) and consider differentials. For small random errors {Δp̂1, ..., Δp̂i}, the error ΔP̂i is given by

ΔP̂i/P̂i ≈ Δp̂1/p̂1 + Δp̂2/p̂2 + ⋯ + Δp̂i/p̂i   (5.24)

The squared c.o.v. of P̂i is equal to the variance of ΔP̂i/P̂i. Similarly, the squared c.o.v. of p̂j is equal to the variance of Δp̂j/p̂j. The squared c.o.v. of P̂i can then be obtained by taking the variance of Eq. (5.24). If the {p̂j} were independent, then the squared c.o.v. of P̂i would be equal to the sum of the variances of {Δp̂j/p̂j}. This leads to the lower bound in Eq. (5.18), where δj in fact denotes the c.o.v. of p̂j. The upper bound in Eq. (5.19) corresponds to the other extreme, where the {p̂j} are fully correlated.

5.3.1.3 Conditional Probability c.o.v.

The squared c.o.v. of the estimate for P(Y > b1) is approximately δ1² = (1 − p0)/(p0 N), because the probability is p0 and the samples at Level 0 are independent, that is, Direct Monte Carlo. The squared c.o.v. of the estimate for P(Y > bj | Y > b_(j−1)) (j = 2, ..., m) is δj² = (1 + γj)(1 − p0)/(p0 N), which is similar to δ1² except for the factor γj that accounts for the correlation among the MCMC samples at Level (j − 1). The expression for γj in Eq. (5.21) results from the analysis of the correlation along a stationary Markov chain with Ns samples, which has the same form as the formula derived in Section 4.4.1. Equation (5.22) is the estimation formula for the correlation coefficient at lag k, which is calculated by averaging over the (Ns − k) pairs of samples along each chain and then averaging over the Nc chains at a given simulation level.

5.3.1.4 Heuristics

The argument presented above is (hopefully) simple and intuitive, providing a quick justification for the assessment formulas in Eq. (5.17) to Eq. (5.22). Section 5.5 gives a formal mathematical justification for their use. To avoid misunderstanding, however, it is important to clarify the heuristic nature of the argument used here.

First of all, in the Subset Simulation algorithm the intermediate threshold levels {bi} are not fixed. Rather, they are adaptively chosen as the N(1 − p0)th smallest value of the sample values of Y at the previous simulation level. The probability estimate p̂i is therefore exactly equal to p0, and so its variance is irrelevant (identically zero). On the other hand, if one assumes {bi} to be fixed, as in the heuristic argument, then p̂i is a random quantity and so its variance is relevant. However, in this case the number of Markov chains at Level i (i ≥ 1), that is, Nc, is random, because the number of samples with Y exceeding the fixed threshold level b_(i−1) is random. This means that if the number of samples Ns per chain is fixed, then the total number of samples N at each level is random. Otherwise, Ns should be adjusted accordingly (and hence be random) in order to keep the same number N. These complications render the heuristic argument inexact. Nevertheless, it still yields approximately the right asymptotic result for large N, because the variability of {bi} and other associated quantities diminishes in the limit.

5.3.2 Efficiency Over Direct Monte Carlo

To get an idea of the variance reduction that can be achieved by Subset Simulation, consider the c.o.v. at the target probability level

pF = p0^m   (5.25)

We shall obtain an expression for the c.o.v. δ in terms of pF and investigate its growth with diminishing pF.

The lower bound c.o.v., denoted by αL, is given by Eq. (5.18):

αL² = Σ_(i=1)^(m) ((1 − p0)/(p0 N)) (1 + γi) = ((1 − p0)/(p0 N)) (1 + γL) m   (5.26)

where

γL = (1/m) Σ_(i=1)^(m) γi   (5.27)

is approximately a constant with respect to m. The upper bound c.o.v., denoted by αU, is given by Eq. (5.19):

αU² = Σ_(i=1)^(m) Σ_(j=1)^(m) ((1 − p0)/(p0 N)) (1 + γi)^(1/2) (1 + γj)^(1/2) = ((1 − p0)/(p0 N)) (1 + γU) m²   (5.28)

where

γU = −1 + (1/m²) Σ_(i=1)^(m) Σ_(j=1)^(m) (1 + γi)^(1/2) (1 + γj)^(1/2)   (5.29)

is also approximately a constant with respect to m. Thus, in general we may write

δ² = ((1 − p0)/(p0 NT)) (1 + γ) m^r,   2 ≤ r ≤ 3   (5.30)


where γ is roughly a constant with respect to m, and

NT = mN   (5.31)

is the total number of samples used. Taking logarithms on both sides of Eq. (5.25), the number of simulation levels can be written as

m = ln pF^(−1) / ln p0^(−1)   (5.32)

where the choice of the log base is arbitrary. Substituting Eq. (5.32) into Eq. (5.30), the c.o.v. at the target probability level pF may be written as

δ² = [ (1 − p0) / (p0 (ln p0^(−1))^r) ] [ (ln pF^(−1))^r (1 + γ) / NT ],   2 ≤ r ≤ 3   (5.33)

As a comparison, the c.o.v. of the Direct Monte Carlo estimator with the same number of samples NT is given by

δ_MCS² = (1 − pF)/(pF NT)   (5.34)

The scaling of the c.o.v. with small pF has important consequences for the application of the method to rare events. For Direct Monte Carlo, δ_MCS² ∝ pF^(−1) for small pF, and so it grows drastically with diminishing pF. For Subset Simulation, δ² ∝ (ln pF^(−1))^r (2 ≤ r ≤ 3) increases at a much slower rate as pF diminishes.
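A quick numerical comparison of Eq. (5.33) with Eq. (5.34) makes the difference in scaling tangible; the values of γ and r below are illustrative assumptions only.

```python
import numpy as np

p0, N_T, gamma, r = 0.1, 3000, 3.0, 2.5   # assumed values for illustration
for pF in [1e-2, 1e-4, 1e-6]:
    # Subset Simulation, Eq. (5.33)
    d2_ss = (1 - p0) / (p0 * np.log(1/p0)**r) * np.log(1/pF)**r * (1 + gamma) / N_T
    # Direct Monte Carlo, Eq. (5.34)
    d2_mc = (1 - pF) / (pF * N_T)
    print(f"pF = {pF:.0e}:  c.o.v. SS = {np.sqrt(d2_ss):.2f},  DMC = {np.sqrt(d2_mc):.2f}")
```

With these assumed values the Direct Monte Carlo c.o.v. grows from about 0.2 at pF = 10^(−2) to about 18 at pF = 10^(−6), while the Subset Simulation c.o.v. stays of order one.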

Example 5.4  Example 5.2 revisited with error assessment

Consider Example 5.2 again, where Y = X1 X2 and X1, X2 are standard Gaussian. Recall that Subset Simulation was performed with p0 = 0.1 and N = 500. Figure 5.5 shows the correlation sequence estimated using the samples at each level in a single run. The result in Figure 5.5a for Level 0 corresponds to the theoretical result for Direct Monte Carlo, where the correlation is identically zero and there is no need to estimate it. The plot is included only for contrast. The value of γi in Eq. (5.21) is shown in each plot. The estimation errors associated with the correlation at different lags are different because they are estimated by averaging over a different number of terms. For example, the correlation at lag k = 1 is the average of 50 × 9 = 450 terms, while the correlation at lag k = 9 is the average of only 50 × 1 = 50 terms.

Figure 5.5 Correlation sequence estimated from a single run, Example 5.2. (a)–(f) for Levels 0 to 5, respectively. The result for Level 0 (Direct Monte Carlo) is theoretical and is shown for contrast.

Using these values of {γi}, the estimation error of the CCDF curve can be calculated. The lower bound error is shown as a +/− one c.o.v. error bar at the intermediate threshold levels in Figure 5.6a, part of which is inherited from Figure 5.3a. The CCDF curve and its estimation error in this figure are obtained from a single run, involving a total of 500 × 6 = 3000 samples. These results differ from one run to another, although they can be considered representative of a proper run with a sufficient number of samples. As a verification, Figure 5.6b shows the ensemble average of the CCDF curve and the +/− one c.o.v. of b-values, calculated based on 50 independent runs. The error bars in Figure 5.6a from a single run are consistent with the error intervals in Figure 5.6b. The dashed lines in Figure 5.6b show the +/− one c.o.v. of Direct Monte Carlo estimates with the same number of samples. With 3000 samples it can only give an estimate for non-zero probabilities down to 1/3000 ≈ 3 × 10^(−4). Note that in a typical application, one only produces the results in Figure 5.6a and not those in Figure 5.6b.

Figure 5.6 Estimation error, Example 5.2. (a) +/− one c.o.v. error bar (lower bound) estimated from a single run; (b) ensemble mean and +/− one c.o.v. of b values (solid lines), averaging over 50 runs; exact solution (dotted line); Direct Monte Carlo (dashed line).


Example 5.5  Example 5.3 revisited with error assessment

For Example 5.3 on the Brownian process, discussed in Figure 5.4, the correlation sequence and estimation errors are shown in Figures 5.7 and 5.8, respectively. They are analogous to Figures 5.5 and 5.6.

Figure 5.7 Correlation sequence estimated from a single run, Example 5.3. (a)–(f) for Levels 0 to 5, respectively. The plot for Level 0 (Direct Monte Carlo) is theoretical and is shown for contrast.

Figure 5.8 Estimation error, Example 5.3. (a) +/− one c.o.v. error bar (lower bound) estimated from a single run; (b) ensemble mean and +/− one c.o.v. of b values (solid lines), averaging over 50 runs; exact solution (dotted line); Direct Monte Carlo (dashed line).


5.4 Implementation Issues

5.4.1 Proposal Distribution

In the standard algorithm the candidate is generated using the independent-component algorithm described in Section 4.5.2. This resolves the curse of dimension and allows MCMC to be applicable even when the number of random variables is large. In this context, the efficiency of the method is affected by the type of the one-dimensional proposal PDFs and their spread around the current sample. Experience shows that the efficiency is relatively insensitive to the type of the proposal PDF. Gaussian or uniform distributions have been commonly used for their convenience. The spread may be characterized by the scale parameter w in Table 4.1 of Section 4.2. Setting w of the same order as the standard deviation of the parameter PDF is found to give a balance between efficiency and robustness. It may also be set in an adaptive manner based on sample statistics calculated from the last level. For example, w may be taken as the sample standard deviation of the p0 N seeds from the last level.
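A sketch of this adaptive rule (our formulation of the idea just stated):

```python
import numpy as np

def adaptive_spread(seeds):
    """Proposal spread per component: the sample standard deviation of the
    p0*N seeds carried over from the last level (one w per random variable)."""
    return np.std(seeds, axis=0, ddof=1)
```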

5.4.2 Ergodicity

Ergodicity is an issue for simulation Levels 1 and above, which involve MCMC. As discussed in Section 4.4.3, in practice ergodicity is a question of whether the Markov chain samples can populate sufficiently well the important regions of the failure domain. If there is some important region not visited by the samples, its contribution to the failure probability will not be reflected in the estimate, which will then be significantly biased.

For a Markov chain started at a single point, ergodic problems may arise due to the existence of disconnected failure regions that are separated by safe regions whose size is large compared to the spread of the proposal PDF. For the Markov chain to transit from one failure region to another, the candidate has to lie in the second failure region, but this is unlikely to happen if the spread of the proposal PDF is small compared to the safe region between the two failure regions. The situation of disconnected failure regions may arise, for example, in systems with components connected in series. The above offers a geometric perspective of one potential problem, although it may not be relevant in high-dimensional space.

Compared to a single chain, the multi-chain strategy adopted in Subset Simulation makes it less vulnerable to ergodic problems arising from disconnected failure regions. Direct Monte Carlo at Level 0 provides an ergodic population of i.i.d. samples that explores the frequent region. The samples at Level 0 that are used as seeds for Level 1 can be expected to populate F1 sufficiently well, for otherwise the variability of b1 would be large and subsequently the variability of bm would not be acceptable as the errors accumulate. The seeds can possibly be distributed among different important parts of F1, which could be disconnected. The Markov chain initiated in each disconnected region will populate it at least locally, and so the population of samples from different chains is likely to correctly account for their contribution. Of course, this implicitly assumes that unimportant failure modes at lower levels remain unimportant at higher levels. Otherwise there may not be enough seeds (if any) at lower levels to develop more samples to account for their contribution at higher levels. Increasing N helps avoid ergodic problems in this respect. Failure events that “suddenly” become important at higher levels without signs at lower levels give occasional “surprises” to simulation runs. These “black swan events” are further discussed in Section 5.7.


The foregoing discussion suggests that Subset Simulation is likely to produce an ergodic estimator, but it offers no guarantee. Whether ergodicity problems become an issue depends on the particular application and the choice of the proposal PDF. Ergodicity and bias are relevant in any simulation method that tries to conclude “global” information from some known “local” information, assuming implicitly that the known local information dominates the problem. For example, importance sampling using checkpoint(s) (Section 2.6.3) implicitly assumes that the main contribution to the failure probability comes from the neighborhood of the known checkpoints and that there are no other regions of significant contribution. Otherwise the estimate can be significantly biased. In view of this, one should appreciate the ergodic property of Direct Monte Carlo.

5.4.3 Generalizations

In this subsection we discuss some possible generalizations of the standard algorithm, some of which have been explored in the literature. These are presented to give an idea of some possible modifications that can be introduced to the standard algorithm. The generalizations need not improve the algorithm in terms of efficiency or robustness.

5.4.3.1 Keeping the Seeds

As mentioned in Section 5.2, it is possible to keep the seeds from the last level and use them as the conditional samples for the current level instead of discarding them. This can be done by setting the first sample of the Markov chain to be equal to the seed. The advantage is that it is now only required to generate (Ns − 1) samples per chain, or N − Nc = (1 − p0)N MCMC samples per level. The total number of samples (function evaluations) is then

NT = N + N(1 − p0)(m − 1) = N[1 + (1 − p0)(m − 1)]   (5.35)

instead of mN. This is the strategy that has been used conventionally in the literature.

It should be noted, however, that keeping the seeds increases the correlation between the samples at different levels and within the same level. This offsets the benefit from the saving in samples. That is, for the same number of samples (function evaluations) the seed-keeping strategy may lead to a similar c.o.v. in the failure probability estimate. Artificially uncorrelating the samples at successive levels by “burn-in” samples does not solve the problem, because it requires additional function evaluations whose number is equal to the number of seeds. Section 5.4.4 shows that if the c.o.v. formula based on the seed-keeping strategy is not interpreted carefully, it may give misleading conclusions regarding the choice of p0.

5.4.3.2 More or Fewer Chains

In the standard algorithm, at a given level the number of Markov chains is always Nc = p0 N and the number of samples per chain is always Ns = 1/p0. This choice stems naturally from the fact that there are p0 N samples from the last level that are conditional on the current level. It is possible to generate more or fewer chains than p0 N.

Here we discuss a simple strategy that assumes Nc = c p0 N and Ns = 1/(c p0), where c is a multiplier such that Nc and Ns are still integers. Note that Nc Ns = N, and so the total number of samples per level is still the same. The case of c = 1 corresponds to the standard algorithm. When c > 1, the number of chains is greater than the number of seeds. In this case, each seed may be (repeatedly) used for generating c chains, each with 1/(c p0) samples. When c < 1, the number of chains is less than the number of seeds. In this case, each Markov chain may be started with a seed chosen uniformly at random from the p0 N seeds, so that their conditional nature is preserved.

Some comments regarding the effect of this generalization are in order. Intuitively, having more chains than p0 N (i.e., c > 1) may help distribute the samples more evenly in the failure region, compared to the standard algorithm. It may appear to give a smaller correlation factor γj in Eq. (5.21) by shortening the chain. However, in reality the effect is not straightforward, because the correlation among the chains has now increased as they are started with repeated seeds; and this correlation has not been taken into account in Eq. (5.21). In fact, as the chains are now shorter, the samples in an overall sense have a higher correlation with the seeds. This increases the correlation between the samples among successive levels. Effectively, the power r in the c.o.v. formula Eq. (5.33) may increase.

On the other hand, having fewer chains than p0 N (i.e., c < 1) may appear to increase the factor γi in Eq. (5.21) by lengthening the chain. However, this can reduce the correlation between successive levels, because the samples in an overall sense are now more steps away from the seeds. Effectively, this may reduce the power r in Eq. (5.33) and hence increase the efficiency of the algorithm. Nevertheless, it should be borne in mind that reducing the number of chains generally increases the risk of ergodic problems.

5.4.3.3 Relaxing Constraints on Level Probability

Recall that the level probability p0 is assumed to be chosen such that Nc = p0 N and Ns = 1/p0 are integers. These constraints are now clear, because Nc and Ns are respectively the number of chains and the number of samples per chain. Requiring p0 N to be an integer means that p0 is limited to the values {1/N, 2/N, ..., (N − 1)/N}. On the other hand, requiring 1/p0 to be an integer limits p0 to the possible values {1/2, 1/3, ..., 1/N}; in particular, p0 ≤ 1/2.

One strategy to remove these constraints is to always generate each MCMC sample as one step away from a seed chosen uniformly at random from the p0 N seeds from the last level. This gives as many chains as the number of samples generated at a given level, and each chain has only one sample (seed excluded). As a large number of chains are started from the seeds, this strategy may significantly increase the correlation between successive levels.

5.4.4 Level Probability

The level probability p0 controls how fast the simulation level progresses to reach the target event of interest. A prudent choice trades off between the total number of levels m required to reach the target event and the variability of the intermediate threshold levels {bi}. Here we present an approximate analysis to yield some insight on the optimal choice of p0 that minimizes the c.o.v. δ of the target failure probability estimate at pF = p0^m.

From Eq. (5.33), δ² depends on p0 only through the first bracketed term, that is,

J(p0) = (1 − p0) / (p0 (ln p0^(−1))^r),   2 ≤ r ≤ 3   (5.36)

Minimizing J yields the optimal value of p0. Since this term does not depend on pF, γ, or NT, the optimal choice of p0 in this context is invariant to these factors.

Setting the derivative of J to zero gives the following equation after simplification:

r(1 − p0) + ln p0 = 0   (5.37)

This can be rewritten in the standard form x e^x = y:

(−r p0) e^(−r p0) = −r e^(−r)   (5.38)

A unique real-valued solution of x to the equation x e^x = y always exists for y ≥ −1/e and is given by the zeroth branch of the Lambert W function, x = W0(y) (Corless et al., 1996). A plot of W0(y) versus y is shown in Figure 5.9.

In the current case, x = −r p0 and y = −r e^(−r) ≥ −1/e for 2 ≤ r ≤ 3, and so a unique solution always exists. Consequently, the optimal value of p0 is given by

p0* = −r^(−1) W0(−r e^(−r))   (5.39)
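Eq. (5.39) can be evaluated directly with a Lambert W implementation, for example the one in scipy:

```python
import numpy as np
from scipy.special import lambertw

def optimal_p0(r):
    """Optimal level probability, Eq. (5.39), zeroth branch of Lambert W."""
    return float(np.real(-lambertw(-r * np.exp(-r), k=0) / r))

for r in [2.0, 2.5, 3.0]:
    print(r, round(optimal_p0(r), 3))   # about 0.20, 0.11, 0.06
```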

To examine the sensitivity of δ² to p0, Figure 5.10a plots the value of J versus p0 for r = 2, 2.25, 2.5, 2.75, 3. For comparison, the value of J for each r is normalized by its minimum value, so that the curves in the figure for different r all have the same minimum value of 1. The optimal value of p0 for each r is shown with a dot. These values are also plotted in Figure 5.10b. As r increases from 2 to 3, the optimal value of p0 decreases from 0.2 to 0.06. This trend suggests that when the correlation among different levels is higher, it is better to reduce p0 and hence the number of chains.

From Figure 5.10a, the value of J(p0) varies over a narrow range when p0 is between 0.06 and 0.2. Thus, choosing p0 between these two values can lead to a reasonable efficiency. This is worth noting because, in practice, the value of r is not known a priori. Using p0 = 0.2 (optimal for r = 2) when r = 3 only leads to about a 17% increase in c.o.v. compared to the optimal choice; using p0 = 0.06 (optimal for r = 3) when r = 2 leads to about a 13% increase.

Figure 5.9 The zeroth branch of the Lambert W function.


Figure 5.10 Optimal level probability. (a) J(p0) versus p0; (b) optimal p0 versus r.

A conventional choice in practice is p0 = 0.1. It is optimal when r = 2.55 and it appears to follow the middle way. Using this value, the increase in c.o.v. compared to the optimal choice is no more than 5% (occurring when r = 2). This choice is seen to strike a balance between efficiency and convenience in implementation. It is also conceptually convenient, corresponding to one order of magnitude decrease in probability per simulation level.

Regardless of the mathematical details, the above analysis is never complete because it assumes that the correlation factor γ in Eq. (5.33) is fixed. In reality, γ depends on p0; it is related to r and it can depend on m. The analysis here only serves to provide a guideline, and it should be complemented with simulation experience with the problem at hand.

5.4.4.1 Remark

For instructional purposes, we perform a similar analysis of the optimal choice of p0 when the seeds from the last level are used as the conditional samples for the current level instead of being discarded, as discussed at the beginning of Section 5.4.3. The result illustrates the importance of the correlation between the samples at different levels.

When the seeds are kept as the samples for the current level, it is only required to generate an additional N − Nc = (1 − p0)N samples at each level. The total number of samples is given by Eq. (5.35) (recalled here):

NT = N + N(1 − p0)(m − 1) = N[1 + (1 − p0)(m − 1)]   (5.40)

instead of mN. An analogous formula to the squared c.o.v. in Eq. (5.33) at the target probability pF = p0^m is then given by

δ² = ((1 − p0)/(p0 NT)) (1 + γ) m^r [1 + (1 − p0)(m − 1)]
   = ((1 − p0)/(p0 NT)) (1 + γ) (ln pF^(−1) / ln p0^(−1))^r [1 + (1 − p0)(ln pF^(−1) / ln p0^(−1) − 1)],   1 ≤ r ≤ 2   (5.41)

where γ is a correlation factor defined as before.


Figure 5.11 Squared unit c.o.v. versus level probability according to Eq. (5.42). (a) Uncorrelated levels; (b) fully correlated levels.

In the current case, the value of p0 that minimizes δ² for given NT depends on pF. Figure 5.11 shows the variation of the squared unit c.o.v.

Δ² = δ² NT = ((1 − p0)/p0) (1 + γ) (ln pF^(−1) / ln p0^(−1))^r [1 + (1 − p0)(ln pF^(−1) / ln p0^(−1) − 1)],   1 ≤ r ≤ 2   (5.42)
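For completeness, a sketch of Eq. (5.42) as one would use it to reproduce the trends of Figure 5.11 (γ = 3 as in the text; the grid search is illustrative):

```python
import numpy as np

def unit_cov2(p0, pF, r, gamma=3.0):
    """Squared unit c.o.v. of Eq. (5.42) for the seed-keeping strategy."""
    m = np.log(1.0 / pF) / np.log(1.0 / p0)   # number of levels, Eq. (5.32)
    return (1 - p0) / p0 * (1 + gamma) * m**r * (1 + (1 - p0) * (m - 1))

# locate the minimizing p0 on a grid, for the uncorrelated case r = 1
p_grid = np.linspace(0.05, 0.95, 181)
for pF in [1e-1, 1e-3, 1e-6]:
    best = p_grid[np.argmin(unit_cov2(p_grid, pF, r=1))]
    print(f"pF = {pF:.0e}: optimal p0 (r = 1) = {best:.2f}")
```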

The correlation factor has been taken to be γ = 3. As far as Eq. (5.42) is concerned, this value is immaterial because it does not affect the optimal value of p0. Figures 5.11a and 5.11b correspond to the cases where the samples among different levels are uncorrelated and fully correlated, that is, r = 1 and r = 2 in Eq. (5.42), respectively. In each plot the curves from bottom to top correspond to the target probabilities pF = 10^(−1), 10^(−2), ..., 10^(−6). Figure 5.11a suggests that when the levels are uncorrelated, the optimal value of p0 increases steadily with diminishing pF, from about 0.5 towards 1. This is very different from the case when the seeds are discarded, where the optimal value is about 0.2 (see Figure 5.10). On the other hand, Figure 5.11b shows that when the levels are fully correlated, the optimal value is about 0.05 regardless of pF, which is similar to the case when the seeds are discarded.

Simulation experiments indicate that the optimal value suggested by Figure 5.11a is far from reality. The unit c.o.v. often increases significantly for moderate values of p0. Essentially, Eq. (5.42) fails to reflect the effect of p0 when the seeds are not discarded. This can be explained as follows. When the seeds are kept, the number of samples generated at each level is only (1 − p0)N, and so it diminishes as p0 increases. According to Eq. (5.42), this allows a larger value of p0 to be optimal because it reduces the term (1 − p0)(m − 1) = (1 − p0)(ln pF^(−1)/ln p0^(−1) − 1). This effect is more pronounced for smaller pF because the number of levels m increases.

Minimizing Eq. (5.42) with r = 1 implicitly assumes that r does not depend on p0. In reality, as p0 increases, successive simulation levels have a larger number of samples in common, because the seeds collected from the last level are used as the samples for the current level. This means that when the seeds are kept, r increases significantly with p0, which increases the estimation variance. As a result, the actual optimal value of p0 is significantly smaller than that predicted by Eq. (5.42) with r = 1, that is, by Figure 5.11a. The values suggested by Figure 5.11a would be optimal only if the samples at different levels were uncorrelated, which is not realistic. Figure 5.11b still gives the right picture because it assumes the conservative upper-bound value r = 2.

In short, Eq. (5.33) can give a reasonable optimal value of p0 because assuming a constant r does not deviate significantly from reality when the seeds are not kept. In contrast, Eq. (5.42) fails to do so because when the seeds are kept r increases significantly for large p0. The findings here indicate that the correlation between levels is important, although it can be difficult to address in theoretical analysis. Regardless of whether it is reflected in the analysis, its effect should be borne in mind when interpreting the derived results.

5.5 Analysis of Statistical Properties

In this section we analyze the statistical properties of the CCDF estimate in Subset Simulation, namely, its mean and variance. This provides a theoretical justification for the error assessment procedure in Section 5.3. The material in this section is rather technical. Readers interested in applications may skip this section on first reading.

Recall from the algorithm (Section 5.1) that the CCDF estimate of Subset Simulation consists of the fixed probability levels {p_k^{(i)}} and the random threshold levels {b_k^{(i)}}. The latter vary from one simulation run to another. Strictly speaking, the statistical properties of {b_k^{(i)}} should be assessed. However, these have not been developed yet, possibly because they are difficult to derive and the results will likely be related to the PDF of Y, which is unknown in the first place. Here, the statistical properties of the estimate for P(Y > b) for fixed b are discussed instead.

For a given b, let P_b denote the estimate for P(Y > b) in Subset Simulation. Recall the intermediate threshold levels {b_i} in Eqs. (5.7) and (5.9). According to the algorithm, P_b depends on the interval that b lies in among {b_i}:

P_b =
\begin{cases}
\dfrac{1}{N}\sum_{k=1}^{N} I\big(Y_k^{(0)} > b\big), & b < b_1 \\[1ex]
p_0^{i-1}\,\dfrac{1}{N}\sum_{j=1}^{N_c}\sum_{k=1}^{N_s} I\big(Y_{jk}^{(i-1)} > b\big), & b_{i-1} < b < b_i, \quad i = 2,\ldots,m-1 \\[1ex]
p_0^{m-1}\,\dfrac{1}{N}\sum_{j=1}^{N_c}\sum_{k=1}^{N_s} I\big(Y_{jk}^{(m-1)} > b\big), & b > b_{m-1}
\end{cases}
    (5.43)

Here, Y_k^{(0)} = h(X_k^{(0)}) and {X_k^{(0)} : k = 1,…,N} are the samples at Level 0 (Direct Monte Carlo). For i = 2,…,m, Y_{jk}^{(i-1)} = h(X_{jk}^{(i-1)}) and X_{jk}^{(i-1)} (j = 1,…,N_c; k = 1,…,N_s) is the kth MCMC sample of the jth chain at Level (i − 1), that is, conditional on {Y > b_{i-1}}.
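To fix ideas, the sketch below is one possible reading of Eq. (5.43) in code; it is ours, not the book's implementation, and the names ccdf_estimate, Y0, and Y_levels are illustrative. It assumes the response values of each level have been stored and recovers the intermediate thresholds as sample quantiles.

import numpy as np

def ccdf_estimate(b, Y0, Y_levels, p0):
    # Estimate P(Y > b) per Eq. (5.43).
    # Y0       : N response values at Level 0 (Direct Monte Carlo)
    # Y_levels : list of arrays; Y_levels[i-1] holds the N = Nc*Ns response
    #            values at Level i (conditional on Y > b_i), i = 1, ..., m-1
    N = len(Y0)
    # b_i is approximately the N(1 - p0)th smallest value of the previous level
    b_lev = [np.sort(Y0)[int((1 - p0) * N)]]
    for Y in Y_levels:
        b_lev.append(np.sort(Y)[int((1 - p0) * len(Y))])
    if b < b_lev[0]:                          # top expression: Direct Monte Carlo
        return np.mean(Y0 > b)
    i = 1                                     # locate the interval b_{i-1} < b < b_i;
    while i < len(b_lev) and b >= b_lev[i]:   # this i corresponds to the book's i - 1
        i += 1
    return p0**i * np.mean(Y_levels[i - 1] > b)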


The random nature of {b_i} causes some complications in the derivation of the statistical properties of P_b. For clarification, the context and assumptions of the derivation are described first, followed by an analysis of the mean and variance.

5.5.1 Random Intervals

According to Eq. (5.43), the formula for P_b depends on the interval that b belongs to, but the interval is random because it depends on {b_i}. This means that for a given b, which interval it belongs to is itself a random event. In principle, the mean and variance of P_b can be obtained using the theorem of total probability, accounting for all possible intervals that b may lie in. Specifically, let the intervals be denoted by {B_i : i = 0,…,m−1} where

B_0 = \{b \le b_1\}
B_i = \{b_i < b \le b_{i+1}\}, \quad i = 1,\ldots,m-2
B_{m-1} = \{b > b_{m-1}\}
    (5.44)

For a given b, let I_b denote the index of the interval that b lies in. For example, if b_2 < b ≤ b_3 then I_b = 2. Clearly I_b is a random variable that depends on {b_i}.

To investigate the bias of P_b, its expectation can be expressed using the theorem of total probability as

E[P_b] = \sum_{j=0}^{m-1} E[P_b \mid I_b = j]\, P(I_b = j)    (5.45)

Evaluating E[P_b] requires knowledge of the conditional expectation E[P_b | I_b = j] and the probability P(I_b = j) for j = 0,…,m−1. To simplify the analysis we consider those b's that can be considered as lying practically in a particular interval, in the sense that

P(I_b = i) \sim 1 \text{ for some } i, \quad P(I_b = j) \sim 0 \text{ for } j \ne i    (5.46)

In this case

E[P_b] \sim E[P_b \mid I_b = i]    (5.47)

and so it is sufficient to consider the conditional expectations individually for different intervals without knowing the probabilities {P(I_b = j) : j = 0,…,m−1}. In evaluating the conditional expectations, the distribution of the generated samples is assumed to be unaffected by the conditioning because the probability is close to 1. In this context we can show that

E[P_b \mid I_b = i] \to P(Y > b) \quad \text{as } N \to \infty    (5.48)

and so P_b is asymptotically unbiased.


The variance of P_b can be assessed using the conditional variance formula,

var[P_b] = E\{var[P_b \mid I_b]\} + var\{E[P_b \mid I_b]\}    (5.49)

In the context of Eq. (5.46), var{E[P_b | I_b]} may be neglected compared to E{var[P_b | I_b]}, and so

var[P_b] \sim E\{var[P_b \mid I_b]\} = \sum_{j=0}^{m-1} var[P_b \mid I_b = j]\, P(I_b = j) \sim var[P_b \mid I_b = i]    (5.50)

This again justifies analyzing var[P_b | I_b = i] for i = 0,…,m−1 as a proxy for var[P_b]. Theoretically, the result obtained in this context is applicable for those b's away from {b_i}, say by a few standard deviations, which are O(N^{-1/2}). For those near the {b_i}, the result will generally be a weighted sum of those from the two neighboring intervals.

5.5.2 Random CCDF Values

The variability of the CCDF estimate is closely related to the variability of the CCDF values {F_Y(b_i)}, where F_Y is the (unknown) CCDF of Y in Eq. (5.1). They are random because {b_i} are. Recall that b_1 is the N(1 − p_0)th smallest value of {Y_k^{(0)} : k = 1,…,N} and so it depends on the samples {X_k^{(0)} : k = 1,…,N} at Level 0. For higher levels, b_i (i ≥ 2) depends on the samples {X_{jk}^{(i-1)} : j = 1,…,N_c; k = 1,…,N_s} at Level (i − 1). Since {X_{jk}^{(i-1)}} are generated from seeds taken from the last level, they depend on the samples at all lower levels. Generally, {b_i} are dependent.

For analysis purposes we characterize the randomness of the CCDF values {F_Y(b_i)} as

F_Y(b_1) = p_0(1 + \varepsilon_1)    (5.51)

F_Y(b_i) = F_Y(b_{i-1})\, p_0 (1 + \varepsilon_i), \quad i = 2,\ldots,m-1    (5.52)

where ε_i (i = 1,…,m−1) is a random variable that reflects the statistical variability of b_i as an order statistic of the values of Y at Level (i − 1) for given b_{i-1}. Due to the use of seeds for starting Markov chains in Subset Simulation, {ε_i} are generally dependent.

Equations (5.51) and (5.52) imply that

F_Y(b_i) = p_0^i \prod_{j=1}^{i} (1 + \varepsilon_j), \quad i = 1,\ldots,m-1    (5.53)

When the variability of {ε_i} vanishes as N → ∞, the expectation of the product term in Eq. (5.53) is asymptotically equal to 1:

E[F_Y(b_i)] \sim p_0^i    (5.54)

The standard deviation of the product term can then be viewed as the c.o.v. of F_Y(b_i).
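A quick numerical illustration (ours) of Eqs. (5.51)–(5.54): drawing the ε_j as independent zero-mean variables with the O(N^{-1/2}) spread of a p_0-quantile estimator (an assumption made purely for illustration; in a real run the ε_j are dependent), the product term in Eq. (5.53) has mean close to 1 while its standard deviation, the c.o.v. of F_Y(b_i), accumulates with the level:

import numpy as np

rng = np.random.default_rng(0)
p0, m, N, trials = 0.1, 5, 500, 20000
sig = np.sqrt((1 - p0) / (p0 * N))            # illustrative std of eps_j
eps = rng.normal(0.0, sig, size=(trials, m - 1))
prod = np.cumprod(1.0 + eps, axis=1)          # product term of Eq. (5.53)
for i in range(m - 1):
    print(f"Level {i + 1}: mean {prod[:, i].mean():.4f}, "
          f"c.o.v. of F_Y(b_i) {prod[:, i].std():.4f}")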


5.5.3 Summary of Results

In the context considered so far, the derivation is based on the following assumptions:

1. For i = 1,…,m−1, as N → ∞,

E[\varepsilon_i] = O(N^{-1})    (5.55)

var[\varepsilon_i] = O(N^{-1})    (5.56)

These usually hold when the algorithm is implemented properly.

2. At the ith simulation level (i = 1,…,m−1), the MCMC samples along different chains are uncorrelated through the indicator function, that is, I(Y_{j_1 k_1}^{(i)} > b) and I(Y_{j_2 k_2}^{(i)} > b) are uncorrelated for all k_1, k_2 whenever j_1 ≠ j_2. In reality this assumption does not hold strictly, because at a given simulation level two Markov chains are generally correlated. This is because their seeding samples may be identical, due to repeated samples in MCMC. Even if the seeding samples are distinct, they can still be correlated because they may belong to the same chain at some lower simulation level.

Based on the above assumptions, the main theoretical results are outlined as follows. It can be shown that P_b is asymptotically unbiased as N → ∞. The c.o.v. of P_b depends on which interval b belongs to. Let δ_b be the c.o.v. of P_b. Then

\delta_b^2 = \frac{var[P_b]}{P(Y > b)^2} \sim
\begin{cases}
\dfrac{1 - p_b^{(1)}}{p_b^{(1)} N}, & b < b_1 \\[1ex]
\alpha_{i-1}^2 + \dfrac{1 + \gamma_b^{(i)}}{N}\left(\dfrac{1 - p_b^{(i)}}{p_b^{(i)}}\right), & b_{i-1} < b < b_i, \quad i = 2,\ldots,m-1 \\[1ex]
\alpha_{m-1}^2 + \dfrac{1 + \gamma_b^{(m)}}{N}\left(\dfrac{1 - p_b^{(m)}}{p_b^{(m)}}\right), & b > b_{m-1}
\end{cases}
    (5.57)

Here, α_{i-1} is the c.o.v. of F_Y(b_{i-1})^{-1}, where F_Y is the (unknown) CCDF of Y;

p_b^{(i)} = E[P(Y > b \mid Y > b_{i-1})] = F_Y(b)\, E\big[F_Y(b_{i-1})^{-1}\big]    (5.58)

and

\gamma_b^{(i)} = 2 \sum_{k=1}^{N_s - 1} \left(1 - \frac{k}{N_s}\right) \rho_b^{(i)}(k)    (5.59)


is a factor that accounts for the correlation among the MCMC samples at Level (i − 1); ρ_b^{(i)}(k) is the correlation coefficient of the indicator function values {I(Y_{jk}^{(i-1)} > b) : k = 1,…,N_s} at k steps apart.
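In practice ρ_b^{(i)}(k) is estimated from the indicator sequences of the N_c chains at a level, and γ_b^{(i)} then follows from Eq. (5.59). A minimal sketch (ours), assuming the indicator values of one level are available as an N_c x N_s array:

import numpy as np

def gamma_factor(I):
    # Correlation factor of Eq. (5.59) from indicator values I (shape Nc x Ns)
    Nc, Ns = I.shape
    p = I.mean()                              # estimate of p_b^{(i)}
    var0 = p * (1.0 - p)
    g = 0.0
    for k in range(1, Ns):
        # lag-k covariance averaged over chains, cf. Eq. (5.86)
        Rk = np.mean((I[:, :Ns - k] - p) * (I[:, k:] - p))
        g += 2.0 * (1.0 - k / Ns) * (Rk / var0)   # Eqs. (5.85) and (5.59)
    return g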

Some comments regarding Eq. (5.57) are in order. The c.o.v. depends on which interval b lies in because the expression for P_b in Eq. (5.43) does. In the first interval {b < b_1} it is simply the c.o.v. of a Direct Monte Carlo estimator. For subsequent intervals {b_{i-1} < b < b_i} it consists of the variability of F_Y(b_{i-1})^{-1} due to the variability of the conditional threshold b_{i-1}, and the variability of the conditional probability estimate (i.e., the average of indicator function values). The latter variability has the same form as the c.o.v. of a Direct Monte Carlo estimator, except for the term (1 + γ_b^{(i)}), which accounts for the correlation among the MCMC samples. As the simulation level ascends, the c.o.v. accumulates through α_i.

5.5.3.1 Approximations

It is not easy to determine α_{i-1} because neither the CCDF F_Y nor the c.o.v. of b_{i-1} is known. For error assessment purposes, the following series of approximations, which can be reasoned from Eq. (5.53), may be used:

\alpha_{i-1} = \text{c.o.v. of } F_Y(b_{i-1})^{-1} \approx \text{c.o.v. of } F_Y(b_{i-1}) \approx var\left[\sum_{j=1}^{i-1} \varepsilon_j\right]^{1/2}    (5.60)

The variance on the rightmost side depends on the correlation among {ε_j}. It is generally bounded between two extremes, when the {ε_j} are uncorrelated and when they are perfectly correlated:

\sum_{j=1}^{i-1} var[\varepsilon_j] < var\left[\sum_{j=1}^{i-1} \varepsilon_j\right] < \sum_{j=1}^{i-1} \sum_{k=1}^{i-1} var[\varepsilon_j]^{1/2}\, var[\varepsilon_k]^{1/2}    (5.61)

Again, var[ε_j] is not known, but it can be approximated by the squared c.o.v. of a (conditional) probability estimator with a target probability of p_0 at Level (j − 1), that is,

var[\varepsilon_j] \approx \frac{1 + \gamma_j}{N}\left(\frac{1 - p_0}{p_0}\right)    (5.62)

where γ_1 = 0 and γ_j (j = 2,…,m−1) is given by Eq. (5.21). This is consistent with the relationship between the c.o.v. of the sample quantile and the probability estimate discussed in Section 2.5.6. The above approximations justify the c.o.v. bounds in Eq. (5.17).
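Putting Eqs. (5.60)–(5.62) together gives computable lower and upper bounds on δ_b² via Eq. (5.57). The helper below is our own construction for error assessment, with purely illustrative γ values; it is a sketch, not the book's code.

import numpy as np

def cov_sq_bounds(p0, N, gammas, gamma_b, p_b):
    # gammas[j]: correlation factor of level j below b (gammas[0] = 0)
    # gamma_b, p_b: correlation factor and mean conditional probability
    #               for the interval containing b
    var_eps = np.array([(1 + g) / N * (1 - p0) / p0 for g in gammas])  # Eq. (5.62)
    alpha2_lo = var_eps.sum()                 # uncorrelated eps_j, Eq. (5.61)
    alpha2_hi = np.sqrt(var_eps).sum() ** 2   # perfectly correlated eps_j
    tail = (1 + gamma_b) / N * (1 - p_b) / p_b
    return alpha2_lo + tail, alpha2_hi + tail

lo, hi = cov_sq_bounds(p0=0.1, N=500, gammas=[0.0, 1.4, 2.8], gamma_b=3.0, p_b=0.2)
print(f"delta_b between {np.sqrt(lo):.2f} and {np.sqrt(hi):.2f}")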

5.5.4 Expectation

Here, we show that P_b is asymptotically unbiased by analyzing its expectation conditional on different intervals. For simplicity of notation we omit the conditioning on the interval in the derivation, for example, abbreviating E[P_b | I_b = i] to E[P_b].


First, consider the case when b < b_1. Then P_b is given by the top expression in Eq. (5.43). Recall that Y_k^{(0)} = h(X_k^{(0)}), where the samples {X_k^{(0)} : k = 1,…,N} are i.i.d. drawn from their parameter PDF q. Taking the expectation of P_b gives

E[P_b] = \frac{1}{N} \sum_{k=1}^{N} E\big[I\big(Y_k^{(0)} > b\big)\big] = \frac{1}{N} \times N \times E\big[I\big(Y_1^{(0)} > b\big)\big] = P(Y > b)    (5.63)

since {I(Y_k^{(0)} > b) : k = 1,…,N} are i.i.d. with E[I(Y_k^{(0)} > b)] = P(Y > b).

Next, consider the general case when b_{i-1} < b < b_i (i = 2,…,m−1). Then P_b is given by the middle expression in Eq. (5.43), where Y_{jk}^{(i-1)} = h(X_{jk}^{(i-1)}). For a given chain j, {X_{jk}^{(i-1)} : k = 1,…,N_s} are Markov chain samples started from the seed X_{j0}^{(i-1)} obtained from Level (i − 2). For a given b_{i-1}, the Markov chain samples are distributed as q(⋅ | Y > b_{i-1}). This implies

E\big[I\big(Y_{jk}^{(i-1)} > b\big) \,\big|\, b_{i-1}\big] = P(Y > b \mid Y > b_{i-1})    (5.64)

Taking the conditional expectation E[⋅ | b_{i-1}] of the middle expression in Eq. (5.43) gives

E[P_b \mid b_{i-1}] = p_0^{i-1}\, \frac{1}{N} \sum_{j=1}^{N_c} \sum_{k=1}^{N_s} E\big[I\big(Y_{jk}^{(i-1)} > b\big) \,\big|\, b_{i-1}\big] = p_0^{i-1}\, \frac{N_c N_s}{N}\, P(Y > b \mid Y > b_{i-1}) = p_0^{i-1}\, P(Y > b \mid Y > b_{i-1})    (5.65)

since N_c N_s = N. To proceed, note that

P(Y > b \mid Y > b_{i-1}) = \frac{F_Y(b)}{F_Y(b_{i-1})} = F_Y(b)\, p_0^{-(i-1)} \prod_{j=1}^{i-1} (1 + \varepsilon_j)^{-1}    (5.66)

after using Eq. (5.53). Substituting Eq. (5.66) into Eq. (5.65),

E[P_b \mid b_{i-1}] = F_Y(b) \prod_{j=1}^{i-1} (1 + \varepsilon_j)^{-1}    (5.67)

Taking expectation gives

E[P_b] = E\{E[P_b \mid b_{i-1}]\} = F_Y(b)\, E\left[\prod_{j=1}^{i-1} (1 + \varepsilon_j)^{-1}\right]    (5.68)


To evaluate the expectation, note that for small {ε_j},

\prod_{j=1}^{i-1} (1 + \varepsilon_j)^{-1} \sim \prod_{j=1}^{i-1} (1 - \varepsilon_j) \sim 1 - \sum_{j=1}^{i-1} \varepsilon_j + \sum_{j>k} \varepsilon_j \varepsilon_k    (5.69)

Taking expectation and using E[ε_j] = O(N^{-1}) from Eq. (5.55) gives

E\left[\prod_{j=1}^{i-1} (1 + \varepsilon_j)^{-1}\right] = 1 + O(N^{-1})    (5.70)

and so P_b is asymptotically unbiased.

The proof for the case b > b_{m-1} is similar to that for the general case, with i = m, because they share the same expression for P_b.

5.5.5 Variance

First consider b < b_1. In this case P_b is just the Direct Monte Carlo estimate with i.i.d. samples, and so its variance is given by

var[P_b] = \frac{1}{N} F_Y(b)\,[1 - F_Y(b)], \quad b < b_1    (5.71)

The squared c.o.v. of P_b is given by

\delta_b^2 = \frac{var[P_b]}{P(Y > b)^2} = \frac{1}{N}\left[\frac{1 - F_Y(b)}{F_Y(b)}\right], \quad b < b_1    (5.72)

Next, consider the general case b_{i-1} < b < b_i, i = 2,…,m−1. To simplify notation, let

I_{jk}^{(i)} = I\big(Y_{jk}^{(i-1)} > b\big)    (5.73)

\hat{p}_b^{(i)} = \frac{1}{N} \sum_{j=1}^{N_c} \sum_{k=1}^{N_s} I_{jk}^{(i)}    (5.74)

so that

P_b = p_0^{i-1}\, \hat{p}_b^{(i)}    (5.75)

and

var[P_b] = p_0^{2(i-1)}\, var\big[\hat{p}_b^{(i)}\big]    (5.76)

(A hat is used here to distinguish the estimator \hat{p}_b^{(i)} from its expectation p_b^{(i)} in Eq. (5.58).)


5.5.5.1 Variance of \hat{p}_b^{(i)}

Using the conditional variance formula,

var\big[\hat{p}_b^{(i)}\big] = var\big\{E\big[\hat{p}_b^{(i)} \mid b_{i-1}\big]\big\} + E\big\{var\big[\hat{p}_b^{(i)} \mid b_{i-1}\big]\big\}    (5.77)

For the first term in Eq. (5.77), note that the distribution of {I_{jk}^{(i)}} depends on b_{i-1}. From Eq. (5.64),

E\big[I_{jk}^{(i)} \mid b_{i-1}\big] = p_b(b_{i-1})    (5.78)

where

p_b(b_{i-1}) = P(Y > b \mid Y > b_{i-1}) = \frac{F_Y(b)}{F_Y(b_{i-1})}    (5.79)

is defined to facilitate analysis. The conditional expectation of \hat{p}_b^{(i)} is then given by

E\big[\hat{p}_b^{(i)} \mid b_{i-1}\big] = \frac{N_c N_s}{N}\, E\big[I_{jk}^{(i)} \mid b_{i-1}\big] = p_b(b_{i-1})    (5.80)

Taking variance gives

var\big\{E\big[\hat{p}_b^{(i)} \mid b_{i-1}\big]\big\} = var[p_b(b_{i-1})]    (5.81)

The variance of p_b(b_{i-1}) shall be investigated later.

For the second term in Eq. (5.77), note that for a given b_{i-1}, \hat{p}_b^{(i)} is an estimator based on N_c Markov chains which are assumed to be uncorrelated through the indicator function. The conditional variance var[\hat{p}_b^{(i)} | b_{i-1}] can then be derived using a technique similar to that in Section 4.4.1. It is shown at the end of this subsection that the final expression after taking expectation is given by

E\big\{var\big[\hat{p}_b^{(i)} \mid b_{i-1}\big]\big\} = \frac{1}{N}\Big\{p_b^{(i)}\big[1 - p_b^{(i)}\big]\big(1 + \gamma_b^{(i)}\big) - var[p_b(b_{i-1})]\Big\}    (5.82)

where

p_b^{(i)} = E[p_b(b_{i-1})] = E[P(Y > b \mid Y > b_{i-1})]    (5.83)

\gamma_b^{(i)} = 2 \sum_{k=1}^{N_s - 1} \left(1 - \frac{k}{N_s}\right) \rho_b^{(i)}(k)    (5.84)


is a correlation factor that accounts for the correlation between the Markov chain samples at Level (i − 1);

\rho_b^{(i)}(k) = \frac{R_b^{(i)}(k)}{p_b^{(i)}\big[1 - p_b^{(i)}\big]}, \quad k = 1,\ldots,N_s - 1    (5.85)

is the correlation coefficient at lag k; and

R_b^{(i)}(k) = cov\big[I_{11}^{(i)}, I_{1,1+k}^{(i)}\big]    (5.86)

is the covariance at lag k.

Substituting Eqs. (5.81) and (5.82) into Eq. (5.77) and rearranging gives

var\big[\hat{p}_b^{(i)}\big] = \left(1 - \frac{1}{N}\right) var[p_b(b_{i-1})] + \frac{1}{N}\, p_b^{(i)}\big[1 - p_b^{(i)}\big]\big(1 + \gamma_b^{(i)}\big)    (5.87)

and so, using Eq. (5.76),

var[P_b] = p_0^{2(i-1)}\left(1 - \frac{1}{N}\right) var[p_b(b_{i-1})] + \frac{p_0^{2(i-1)}}{N}\, p_b^{(i)}\big[1 - p_b^{(i)}\big]\big(1 + \gamma_b^{(i)}\big)    (5.88)

Note that the variance of p_b(b_{i-1}) is given by, after substituting Eq. (5.53) into Eq. (5.79) and taking variance,

var[p_b(b_{i-1})] = F_Y(b)^2\, p_0^{-2(i-1)}\, var\left[\prod_{j=1}^{i-1} (1 + \varepsilon_j)^{-1}\right]    (5.89)

Substituting Eq. (5.89) into Eq. (5.88) and dividing by P(Y > b)^2, the c.o.v. δ_b of P_b is asymptotically given by

\delta_b^2 = \frac{var[P_b]}{P(Y > b)^2} \sim var\left[\prod_{j=1}^{i-1} (1 + \varepsilon_j)^{-1}\right] + \frac{1}{N}\left(\frac{1 - p_b^{(i)}}{p_b^{(i)}}\right)\big(1 + \gamma_b^{(i)}\big), \quad b_{i-1} < b < b_i    (5.90)

where we have used 1 − 1/N ∼ 1 and

P(Y > b) \sim E[P_b] = p_0^{i-1}\, E\big[\hat{p}_b^{(i)}\big] = p_0^{i-1}\, p_b^{(i)}    (5.91)

Let α_{i-1} be the c.o.v. of F_Y(b_{i-1})^{-1}. From Eq. (5.53), F_Y(b_{i-1})^{-1} = p_0^{-(i-1)} \prod_{j=1}^{i-1}(1 + \varepsilon_j)^{-1}, whose mean is asymptotically p_0^{-(i-1)} by Eq. (5.70), and so

\alpha_{i-1}^2 \sim var\left[\prod_{j=1}^{i-1} (1 + \varepsilon_j)^{-1}\right]    (5.92)


Eq. (5.90) can then be written as

\delta_b^2 \sim \alpha_{i-1}^2 + \frac{1}{N}\left(\frac{1 - p_b^{(i)}}{p_b^{(i)}}\right)\big(1 + \gamma_b^{(i)}\big), \quad b_{i-1} < b < b_i    (5.93)

which gives the middle expression in Eq. (5.57). The case b > b_{m-1} is a special case of this result when i = m.

5.5.5.2 Expression for E{var[\hat{p}_b^{(i)} | b_{i-1}]}

Here we derive the expression for E{var[\hat{p}_b^{(i)} | b_{i-1}]} in Eq. (5.82), where \hat{p}_b^{(i)} is given by Eq. (5.74) and b_{i-1} < b < b_i.

At Level (i − 1) the MCMC chains are assumed to be uncorrelated through the indicator function. This means that the inner sums {\sum_{k=1}^{N_s} I_{jk}^{(i)} : j = 1,…,N_c} in Eq. (5.74) are uncorrelated and identically distributed. Then

var\big[\hat{p}_b^{(i)} \mid b_{i-1}\big] = var\left[\frac{1}{N}\sum_{j=1}^{N_c}\left(\sum_{k=1}^{N_s} I_{jk}^{(i)}\right) \,\middle|\, b_{i-1}\right] = \frac{N_c}{N^2}\, var\left[\sum_{k=1}^{N_s} I_{1k}^{(i)} \,\middle|\, b_{i-1}\right]    (5.94)

The technique used for evaluating the variance in Eq. (5.94) is similar to that in Section 4.4.1. First express it as a sum of variance and covariance terms:

var\left[\sum_{k=1}^{N_s} I_{1k}^{(i)} \,\middle|\, b_{i-1}\right] = \sum_{k=1}^{N_s} var\big[I_{1k}^{(i)} \mid b_{i-1}\big] + \sum_{k_1 \ne k_2}^{N_s} cov\big[I_{1k_1}^{(i)}, I_{1k_2}^{(i)} \mid b_{i-1}\big]    (5.95)

Note that

var\big[I_{1k}^{(i)} \mid b_{i-1}\big] = p_b(b_{i-1}) - p_b(b_{i-1})^2    (5.96)

where p_b(b_{i-1}) = E[\hat{p}_b^{(i)} \mid b_{i-1}] as in Eq. (5.80). The first sum in Eq. (5.95) then becomes

\sum_{k=1}^{N_s} var\big[I_{1k}^{(i)} \mid b_{i-1}\big] = N_s\big[p_b(b_{i-1}) - p_b(b_{i-1})^2\big]    (5.97)

To evaluate the second sum in Eq. (5.95), we make use of the symmetry of covariance and the fact that the Markov chain is stationary:

cov\big[I_{1k_1}^{(i)}, I_{1k_2}^{(i)} \mid b_{i-1}\big] = cov\big[I_{1k_2}^{(i)}, I_{1k_1}^{(i)} \mid b_{i-1}\big]    (5.98)


This implies that the covariance depends only on |k_1 − k_2|, that is,

cov\big[I_{1k_1}^{(i)}, I_{1k_2}^{(i)} \mid b_{i-1}\big] = cov\big[I_{11}^{(i)}, I_{1,1+|k_1-k_2|}^{(i)} \mid b_{i-1}\big]    (5.99)

The second sum in Eq. (5.95) can then be restructured according to the lag k = |k_1 − k_2| that goes from 1 to N_s − 1. For |k_1 − k_2| = 1, there are 2(N_s − 1) identical terms equal to cov[I_{11}^{(i)}, I_{12}^{(i)} | b_{i-1}], corresponding to (k_1, k_2) = (1,2), (2,1), (2,3), (3,2), …, (N_s − 1, N_s), (N_s, N_s − 1). Similarly, for |k_1 − k_2| = 2, there are 2(N_s − 2) identical terms equal to cov[I_{11}^{(i)}, I_{13}^{(i)} | b_{i-1}]. In general, for |k_1 − k_2| = k (k = 1,…,N_s − 1) there are 2(N_s − k) identical terms of cov[I_{11}^{(i)}, I_{1,1+k}^{(i)} | b_{i-1}]. Consequently,

\sum_{k_1 \ne k_2}^{N_s} cov\big[I_{1k_1}^{(i)}, I_{1k_2}^{(i)} \mid b_{i-1}\big] = \sum_{k=1}^{N_s-1} 2(N_s - k)\, cov\big[I_{11}^{(i)}, I_{1,1+k}^{(i)} \mid b_{i-1}\big]    (5.100)

Substituting Eq. (5.97) and Eq. (5.100) into Eq. (5.95),

var\left[\sum_{k=1}^{N_s} I_{1k}^{(i)} \,\middle|\, b_{i-1}\right] = N_s\big[p_b(b_{i-1}) - p_b(b_{i-1})^2\big] + 2\sum_{k=1}^{N_s-1} (N_s - k)\, cov\big[I_{11}^{(i)}, I_{1,1+k}^{(i)} \mid b_{i-1}\big]    (5.101)

Substituting into Eq. (5.94) and simplifying gives

var\big[\hat{p}_b^{(i)} \mid b_{i-1}\big] = \frac{1}{N}\big[p_b(b_{i-1}) - p_b(b_{i-1})^2\big] + \frac{2}{N}\sum_{k=1}^{N_s-1}\left(1 - \frac{k}{N_s}\right) cov\big[I_{11}^{(i)}, I_{1,1+k}^{(i)} \mid b_{i-1}\big]    (5.102)

Taking expectation,

E\big\{var\big[\hat{p}_b^{(i)} \mid b_{i-1}\big]\big\} = \frac{1}{N}\big\{E[p_b(b_{i-1})] - E\big[p_b(b_{i-1})^2\big]\big\} + \frac{2}{N}\sum_{k=1}^{N_s-1}\left(1 - \frac{k}{N_s}\right) cov\big[I_{11}^{(i)}, I_{1,1+k}^{(i)}\big]    (5.103)

The above can be written in a more compact form as follows. First note that

E\big[p_b(b_{i-1})^2\big] = E[p_b(b_{i-1})]^2 + var[p_b(b_{i-1})]    (5.104)

On the other hand, let

R_b^{(i)}(k) = cov\big[I_{11}^{(i)}, I_{1,1+k}^{(i)}\big]    (5.105)

be the covariance at lag k as in Eq. (5.86),

\rho_b^{(i)}(k) = \frac{R_b^{(i)}(k)}{p_b^{(i)}\big[1 - p_b^{(i)}\big]}, \quad k = 1,\ldots,N_s - 1    (5.106)

be the correlation coefficient at lag k as in Eq. (5.85), and

p_b^{(i)} = E[p_b(b_{i-1})] = E[P(Y > b \mid Y > b_{i-1})]    (5.107)

as in Eq. (5.83).

Substituting Eqs. (5.104)–(5.106) into Eq. (5.103) and rearranging gives Eq. (5.82):

E\big\{var\big[\hat{p}_b^{(i)} \mid b_{i-1}\big]\big\} = \frac{1}{N}\, p_b^{(i)}\big[1 - p_b^{(i)}\big]\big(1 + \gamma_b^{(i)}\big) - \frac{1}{N}\, var[p_b(b_{i-1})]    (5.108)

where

\gamma_b^{(i)} = 2\sum_{k=1}^{N_s-1}\left(1 - \frac{k}{N_s}\right)\rho_b^{(i)}(k)    (5.109)

as in Eq. (5.84).

5.6 Auxiliary Response

In addition to the target response, the samples generated during Subset Simulation can also provide information for estimating the CCDF of other response quantities. This will be beneficial if those quantities have already been computed in the determination of the target response, or if their evaluation does not require significant additional computational effort. Let V = g(X) be a scalar response quantity that depends on the set of random variables X. Suppose a Subset Simulation run is performed for the target response Y = h(X). It produces samples {X_k^{(i)}} and response values {Y_k^{(i)}} for estimating the CCDF of Y. If the sample values of V are also calculated, the CCDF of V can also be estimated. Since the MCMC samples are driven by the exceedance of Y and the quantity V is just computed along the way, we call Y a "driving response" and V an "auxiliary response."

Recall the intermediate threshold levels {b_i : i = 1,…,m−1} generated in Subset Simulation, corresponding to CCDF values {p_0^i : i = 1,…,m−1}. Define a sequence of events or "bins" associated with Y:

B_0 = \{Y \le b_1\}
B_i = \{b_i < Y \le b_{i+1}\}, \quad i = 1,\ldots,m-2
B_{m-1} = \{Y > b_{m-1}\}
    (5.110)


It is clear that {B_i : i = 0,…,m−1} are mutually exclusive and collectively exhaustive events, and therefore they form a partition of the sample space. It follows from the theorem of total probability that

P(V > v) = \sum_{i=0}^{m-1} P(V > v \mid B_i)\, P(B_i)    (5.111)

This equation says that the probability information about V can be obtained based on the information about Y. Note that the bins {B_i} are random because they depend on {b_i}. The bin probabilities may be approximated by

P(B_0) \approx P_0 = 1 - p_0
P(B_i) \approx P_i = p_0^i - p_0^{i+1}, \quad i = 1,\ldots,m-2
P(B_{m-1}) \approx P_{m-1} = p_0^{m-1}
    (5.112)

It can be easily checked that \sum_{i=0}^{m-1} P_i = 1.

The conditional probability P(V > v | B_i) can be estimated using the samples in the bin B_i. Let M_i denote the number of samples in B_i:

M_i = (1 - p_0)N, \quad i = 0,\ldots,m-2
M_{m-1} = N
    (5.113)

For a given i, let {X_{ik} : k = 1,…,M_i} denote the samples of X conditional on B_i. Note that X_{ik} ≠ X_k^{(i)} because B_i ≠ F_i. The conditional probability P(V > v | B_i) can be estimated as

P(V > v \mid B_i) \approx Q_i = \frac{1}{M_i} \sum_{k=1}^{M_i} I(V_{ik} > v)    (5.114)

where V_{ik} = g(X_{ik}) is the sample value of V. Substituting Eqs. (5.112) and (5.114) into Eq. (5.111), P(V > v) can be estimated by

P(V > v) \approx P_v = \sum_{i=0}^{m-1} Q_i P_i    (5.115)

Plotting (v, P_v) for different values of v gives the CCDF estimate of V.
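A direct transcription (ours) of Eqs. (5.112)–(5.115): given the auxiliary-response values grouped by bin, the estimate P_v is a weighted average of the bin-wise exceedance fractions. The name auxiliary_ccdf and the list-of-arrays layout are illustrative assumptions.

import numpy as np

def auxiliary_ccdf(v, V_bins, p0):
    # V_bins[i]: auxiliary response values of the samples in bin B_i,
    #            i = 0, ..., m-1 (V_bins[-1] is the top bin B_{m-1})
    m = len(V_bins)
    P = [p0**i - p0**(i + 1) for i in range(m - 1)] + [p0**(m - 1)]   # Eq. (5.112)
    # Q_i of Eq. (5.114), combined per Eq. (5.115)
    return sum(np.mean(np.asarray(V) > v) * Pi for V, Pi in zip(V_bins, P))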

Example 5.6 Illustration of auxiliary response

As an illustration, Figure 5.12 shows a plot of the samples of V versus Y. Here, N = 10, p_0 = 0.2, and m = 4 are used in Subset Simulation for Y. This gives

P_0 = 1 - p_0 = 0.8, \quad P_1 = p_0(1 - p_0) = 0.16, \quad P_2 = p_0^2(1 - p_0) = 0.032, \quad P_3 = p_0^3 = 0.008


[Figure 5.12: Illustration of auxiliary response CCDF calculation; samples of V versus Y, with bins B_0, B_1, B_2, B_3 delimited by b_1, b_2, b_3 and a threshold level v on the V axis.]

Check that P_0 + P_1 + P_2 + P_3 = 1. On the other hand,

M_0 = M_1 = M_2 = (1 - p_0)N = 8, \quad M_3 = N = 10

That is, B_0, B_1, B_2 each contain 8 samples and B_3 contains 10 samples.

If we count the number of samples in each bin whose V value exceeds the threshold level v in Figure 5.12, there is only one sample in B_0, no sample in B_1, 4 samples in B_2, and 5 samples in B_3. Thus,

Q_0 = 1/8, \quad Q_1 = 0/8, \quad Q_2 = 4/8, \quad Q_3 = 5/10

Using Eq. (5.115),

P_v = \sum_{i=0}^{m-1} Q_i P_i = \left(\frac{1}{8}\right)(0.8) + \left(\frac{0}{8}\right)(0.16) + \left(\frac{4}{8}\right)(0.032) + \left(\frac{5}{10}\right)(0.008) = 0.12
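The arithmetic can be verified in a few lines (with the bin counts read off Figure 5.12):

Q = [1/8, 0/8, 4/8, 5/10]                     # Eq. (5.114) for each bin
P = [0.8, 0.16, 0.032, 0.008]                 # Eq. (5.112)
print(sum(q * p for q, p in zip(Q, P)))       # 0.12, per Eq. (5.115)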

5.6.1 Statistical Properties

Assuming that the Subset Simulation procedure for Y is asymptotically unbiased and convergent, it can be shown that the estimator P_v in Eq. (5.115) is asymptotically unbiased. The proof can be found at the end of this subsection.

The estimation error of P_v depends on the relationship between Y and V. It can be verified that if V ≡ Y, then P_v is identical to the estimate for Y in Subset Simulation, and so they have the same estimation error. At the other extreme, if V is independent of Y, then the estimation error of P_v is always worse than that of a Direct Monte Carlo estimator. In this case there is no incentive for estimating the CCDF of V via Subset Simulation driven by Y. To see this, suppose V is independent of Y and consider the case where in Eqs. (5.114) and (5.115) the indicator functions {I(V_{ik} > v) : i = 0,…,m−1; k = 1,…,M_i} are all i.i.d. with mean P(V > v) and variance P(V > v)[1 − P(V > v)]. This is the best possible situation, because in reality the samples are correlated, which will increase the variance of the estimator. Define P_{ik} = P_i/M_i so that \sum_{i,k} P_{ik} = 1; in the double sum i goes from 0 to (m − 1) and k from 1 to M_i. We can then write P_v = \sum_{i,k} I(V_{ik} > v)\, P_{ik}, that is, a weighted sum of i.i.d. indicator function values. Taking variance gives

var[P_v] = \sum_{i,k} var[I(V_{ik} > v)]\, P_{ik}^2 = P(V > v)[1 - P(V > v)] \sum_{i,k} P_{ik}^2    (5.116)

The double sum can be expressed as

\sum_{i,k} P_{ik}^2 = \sum_{i,k} (P_{ik} - \bar{P})^2 + \bar{P}^2 M    (5.117)

where M = \sum_{i=0}^{m-1} M_i is the total number of samples (i.e., function evaluations) of V and \bar{P} = \sum_{i,k} P_{ik}/M is the average of P_{ik}. Since \sum_{i,k} P_{ik} = 1, we have \bar{P} = 1/M and so

\sum_{i,k} P_{ik}^2 = \sum_{i,k} \left(P_{ik} - \frac{1}{M}\right)^2 + \frac{1}{M} \ge \frac{1}{M}    (5.118)

Substituting into Eq. (5.116), we conclude

var[P_v] \ge \frac{1}{M}\, P(V > v)[1 - P(V > v)]    (5.119)

where the equality holds if and only if P_{ik} = 1/M for all i, k. The RHS is the variance of a Direct Monte Carlo estimator with M i.i.d. samples of V. This means that P_v is inferior to the Direct Monte Carlo estimator if V is independent of Y. This results from the non-uniform weights assigned to the i.i.d. samples.

In the general case, a direct formula for the variance of P_v has not been developed for assessing the estimation error using the information in a single simulation run. It can be expected that the higher the correlation between V and Y, the smaller the estimation variance will be. The variance is also inversely proportional to the sample size, which is effectively discounted by the correlation among the MCMC samples.

Proof of asymptotic unbiasedness

To show that P_v is asymptotically unbiased, note that the bins {B_i} are random and so P(B_i) is a random quantity. Let

P(B_i) = P_i + Z_i    (5.120)


where P_i is given by Eq. (5.112) and Z_i is a random quantity with E[Z_i] → 0 and var[Z_i] → 0 as N → ∞. Note that Z_i is determined by B_i. Substituting Eq. (5.120) into Eq. (5.115) and taking expectation,

E[P_v] = \sum_{i=0}^{m-1} E[Q_i P(B_i)] - \sum_{i=0}^{m-1} E[Q_i Z_i]    (5.121)

To evaluate the first sum, note that

E[Q_i P(B_i)] = E\{E[Q_i P(B_i) \mid B_i]\} = E\{E[Q_i \mid B_i]\, P(B_i)\} = E[P(V > v \mid B_i)\, P(B_i)]    (5.122)

since

E[Q_i \mid B_i] = \frac{1}{M_i} \sum_{k=1}^{M_i} E[I(V_{ik} > v) \mid B_i] = \frac{1}{M_i} \sum_{k=1}^{M_i} P(V_{ik} > v \mid B_i) = P(V > v \mid B_i)    (5.123)

Thus,

\sum_{i=0}^{m-1} E[Q_i P(B_i)] = \sum_{i=0}^{m-1} E[P(V > v \mid B_i)\, P(B_i)] = E\left[\sum_{i=0}^{m-1} P(V > v \mid B_i)\, P(B_i)\right] = P(V > v)    (5.124)

It remains to show that the second sum in Eq. (5.121) tends to zero as N → ∞. This can be done by conditioning on B_i and then using the Cauchy–Schwarz inequality:

E[Q_i Z_i] = E\{E[Q_i Z_i \mid B_i]\} = E\{E[Q_i \mid B_i]\, Z_i\} = E[P(V > v \mid B_i)\, Z_i] \le E\big[P(V > v \mid B_i)^2\big]^{1/2}\, E\big[Z_i^2\big]^{1/2} \to 0    (5.125)

since E[P(V > v | B_i)^2] ≤ 1 and E[Z_i^2] = var[Z_i] + E[Z_i]^2 → 0.

5.6.2 Design of Driving Response

Suppose one performs a Subset Simulation run using the driving response Y. Using a single simulation run, it is also desired to estimate the CCDF of a number of auxiliary responses, say, {V_i : i = 1,…,n_v}. Clearly, the relationship between Y and {V_i} affects the information that can be extracted from the generated samples for estimating their CCDFs. The design of the driving response therefore depends on how much information the user would like to know about one auxiliary response compared to another. This "information," however, is difficult to quantify, at least not in a way that can be easily implemented. The problem is also complicated by several issues. The auxiliary responses can have different scaling (units). They can vary with different sensitivities over different scales of X. Even if this information has been quantified, one should also decide on the relative amount of information desired to be extracted for different auxiliary responses. When this is not available as closed-form expressions, some simulation runs may need to be performed in order to tune the driving response. In short, formally designing the driving response is not a trivial task, but a proper choice suffices to give satisfactory results. See Hsu and Ching (2010) for some discussion on the design of the driving response.

One simple form of the driving response can be constructed as follows. It intends to provide similar information for different auxiliary responses. In order to avoid scaling problems, the driving response can be constructed as a function of non-dimensional quantities formed from the auxiliary responses:

Y = \sum_{i=1}^{n_v} a_i \left(\frac{V_i - \bar{v}_i}{s_i}\right)    (5.126)

where {\bar{v}_i} and {s_i} are respectively the mean and standard deviation of V_i calculated from the samples of V_i at simulation Level 0 (Direct Monte Carlo). The non-dimensional quantity (V_i − \bar{v}_i)/s_i roughly ensures that Y is sensitive to V_i over its probable range of values. The coefficients {a_i} are also dimensionless. Their choice depends on the user's emphasis on the different variables.
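A minimal sketch (ours) of Eq. (5.126): standardize each auxiliary response by its Level-0 statistics and combine them with user-chosen weights. The function name and the equal-weight default are illustrative assumptions.

import numpy as np

def make_driving_response(g_list, X0, a=None):
    # Build Y(x) per Eq. (5.126); X0 holds the Level-0 (Direct Monte Carlo) samples
    V0 = np.array([[g(x) for g in g_list] for x in X0])   # Level-0 values of each V_i
    vbar, s = V0.mean(axis=0), V0.std(axis=0, ddof=1)
    if a is None:
        a = np.ones(len(g_list))                          # equal emphasis by default
    return lambda x: sum(ai * (g(x) - vi) / si
                         for ai, g, vi, si in zip(a, g_list, vbar, s))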

5.7 Black Swan Events

Subset Simulation is based on generating samples that gradually populate towards the failure region and yield information for estimating failure probabilities. The samples are generated adaptively, in that the distribution of samples at higher simulation levels depends on the samples at lower levels. This mechanism is based on the premise that the samples for frequent events can later be developed into samples for rare events. We refer to those rare events (at higher levels) that have significant probability content but are unlikely to be developed from the frequent events (at lower levels) as "black swan events." This is an approximate concept relative to the configuration of a single simulation run, because rare events that cannot be generated in a particular run may be discovered in another run. The chance of discovering them is also higher with a larger number of samples.

By its nature, Subset Simulation gradually explores the failure events, including many which are exotic to the user prior to simulation. The black swan events referred to here are exotic not only to the user, but also to Subset Simulation. They are not expected along the way the samples are generated adaptively by MCMC. When this happens, it is likely that the CCDF estimate has not adequately accounted for the characteristics of the failure mode associated with the black swan events. The estimate can be significantly biased.

The issue with black swan events is a practical one lying between a perfectly biased and an ergodic situation. If it were perfectly biased, then one would not be able to encounter the event in the first place. One typical symptom of black swan events is that, when one performs a few simulation runs, a small number of runs give a CCDF estimate with very different characteristics. We first illustrate this issue with a simple example. Diagnostic strategies are discussed later.

Example 5.7 Black swan event

Let X_1 and X_2 be i.i.d. standard Gaussian and the response Y be defined by

Y = \begin{cases} X_1, & X_1 \le a \\ X_2, & X_1 > a \end{cases}    (5.127)

Clearly, when Y is smaller than a it is associated with X_1; when it is greater than a it is associated with X_2. The distribution of the samples (X_1, X_2) populating the failure event {Y > b} depends very much on whether b is greater than a or not. For small values of b < a the "dominant mode" that leads to failure is associated with X_1 by X_1 > b. For large values of b > a the dominant mode is associated with X_2 by X_2 > b. A proper simulation run should be capable of populating samples that capture these failure modes as the simulation level ascends. However, this may not happen in a particular run if the samples of X_2 at low simulation levels have not populated near the region that allows them to propagate further to higher levels.

For illustration, let a = 2. We perform Subset Simulation with the following parameters as before: p_0 = 0.1, N = 500, and a uniform proposal PDF with a maximum step length of w = 1. Figure 5.13a shows the results from 10 independent runs. For reference, the exact solution is shown with a dashed line. Elementary probability shows that the exact solution is given by

P(Y > b) = \begin{cases} \Phi(a) - \Phi(b) + \Phi(-a)\Phi(-b), & b \le a \\ \Phi(-a)\Phi(-b), & b > a \end{cases}    (5.128)
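The exact curve is elementary to evaluate; the short script below (ours) also cross-checks Eq. (5.128) against brute-force Direct Monte Carlo for moderate thresholds:

import numpy as np
from scipy.stats import norm

def p_exact(b, a=2.0):
    # Eq. (5.128)
    if b <= a:
        return norm.cdf(a) - norm.cdf(b) + norm.cdf(-a) * norm.cdf(-b)
    return norm.cdf(-a) * norm.cdf(-b)

rng = np.random.default_rng(1)
X1, X2 = rng.standard_normal((2, 10**6))
Y = np.where(X1 <= 2.0, X1, X2)               # Eq. (5.127) with a = 2
for b in (1.0, 2.0, 3.0):
    print(f"b = {b}: exact {p_exact(b):.4e}, DMC {np.mean(Y > b):.4e}")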

[Figure 5.13: Simulation results, Example 5.7; P(Y > b) versus b on a log scale. (a) Ten independent runs (solid lines). (b) Ensemble average (solid line) and +/− standard deviation (dashed lines). Exact solution shown as a heavy dashed line.]

The results from the 10 runs agree quite well for b < 2. For b > 2, eight curves gather around the exact solution. The remaining two exhibit very different characteristics in that they do not


[Figure 5.14: Population of samples (X_1, X_2) at different levels in a proper run, Example 5.7. (a)–(f) for Levels 0 to 5, respectively. Solid line: conditional boundary of the current level; dashed line: conditional boundary of the next level.]

have a second branch of increasing b values. They simply suggest that it is almost impossible to have Y > 2, which is not correct.

For reference, Figure 5.13b shows the +/− one standard deviation interval of b based on 50 simulation runs, that is, ensemble statistics. The error interval increases significantly for b > 2.

Figure 5.14 shows the population of samples for a "proper run," where the samples are able to propagate from a failure mode governed by X_1 for small b < 2 to a mode governed by X_2 for large b > 2. The critical stage is near Level 3, which relies on the samples with X_2 > 2 to generate samples conditional on higher levels. If there were no samples at Level 3 with X_2 > 2, then it would not be possible to push the threshold level to a higher value, since then all samples would have Y ≤ 2.

Figure 5.15 shows the correlation sequence estimated using the samples from the proper run. At Levels 2 and 3 the decay of the correlation sequence is especially slow, due to high rejection of the candidates during MCMC. After Level 3 the correlation sequence decays quickly again.

[Figure 5.15: Correlation sequence ρ(k) versus lag k in a proper run, Example 5.7. (a)–(f) for Levels 0 to 5, with γ = 0, 1.4, 6.9, 8.2, 2.8, and 3.4, respectively.]

5.7.1 Diagnosis

The first issue with black swan events is whether they can be detected at all during simulation.

A high rejection rate can be one sign suggesting that the samples have difficulty propagating

to higher levels. This, of course, could be the nature of the problem; for example, Y has an



upper bound, which should be examined for the problem at hand. Performing an independent

run with a larger N may help, although it cannot eliminate the problem. In this regard it is not

advised to reduce the spread of the proposal PDF without proper justification. Doing so may

prematurely screen out probable events, artificially rendering them to be black swans at higher

levels. Another, perhaps more effective, way is to examine if it is possible for the problem to

have black swans; for example, whether the response may bifurcate at some point to exhibit

different characteristics.

Once the black swan events are detected, the question is then how to properly account for

them in the simulation run. It should be noted that at this stage the events have to some extent

been accounted for by the results. The concern is whether they have been accounted for in

the right proportion, most often being underestimated. Their presence also tends to give high

estimation error compared to a proper simulation run. This is thus an efficiency issue. One

natural way is to separate the problem into different cases, perform Subset Simulation for

them separately, and then combine their results in conceptually the same manner as “stratified

sampling” (e.g., Rubinstein, 1981). Of course, this is necessary only when it is found that

Subset Simulation has difficulty generating samples that sufficiently cover the failure modes.

By MCMC, multiple chains, and propagating samples progressively, Subset Simulation has

some capability of discovering a variety of failure modes. It is not foolproof, however.

Example 5.8 Black swan event, stratified sampling

In Example 5.7, suppose we know a priori or from some trial runs that the response behaves very differently depending on whether X_1 > a. Then we may write, using the theorem of total probability:

P(Y > b) = P(Y > b \mid X_1 \le a)\, P(X_1 \le a) + P(Y > b \mid X_1 > a)\, P(X_1 > a)    (5.129)


[Figure 5.16: Simulation results, Example 5.7 with stratified sampling; P(Y > b) versus b. (a) Ten independent runs (solid lines). (b) Ensemble average (solid line) and +/− standard deviation (dashed lines). Exact solution shown as a heavy dashed line.]

The probability P(X_1 ≤ a), and hence P(X_1 > a) = 1 − P(X_1 ≤ a), is known because X_1 follows a standard distribution. The remaining conditional probabilities P(Y > b | X_1 ≤ a) and P(Y > b | X_1 > a), which should each be viewed as a CCDF versus b, can be determined by Subset Simulation, where the conditioning on X_1 should be exercised in the generation of samples. The results are then combined to give P(Y > b) using Eq. (5.129). In each of these simulation runs there are (hopefully) no black swans, and so the efficiency should be similar to what can normally be achieved.
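The combination step is a one-liner once the two conditional runs are available. In the sketch below (ours), ccdf_given_le and ccdf_given_gt are hypothetical stand-ins for CCDF estimates from two Subset Simulation runs whose sample generation respects the conditioning on X_1:

from scipy.stats import norm

a = 2.0
w_le, w_gt = norm.cdf(a), norm.cdf(-a)        # P(X1 <= a) and P(X1 > a)

def combined_ccdf(b, ccdf_given_le, ccdf_given_gt):
    # Eq. (5.129): mix the conditional CCDFs with the known weights
    return ccdf_given_le(b) * w_le + ccdf_given_gt(b) * w_gt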

Figure 5.16 shows the results using the stratified sampling concept, where p_0 = 0.1, m = 6, and N = 250 have been used for the two sub-problems so that the total number of function evaluations is the same as before. This figure should be compared with Figure 5.13. The samples have no problem propagating in each of the sub-problems. The results from different runs in Figure 5.16a are qualitatively similar. The resulting estimate has much smaller estimation variance, as indicated in Figure 5.16b.

Although this is a "toy example" with only two random variables, the general principle and potential gain in efficiency are similar in more complicated problems.

5.8 Applications

Subset Simulation was originally developed for seismic risk analysis of building structures

subjected to stochastic earthquake motions (Au and Beck, 2000a, 2000b; Au, 2001; Au and

Beck, 2001; Au and Beck, 2003), where the problem involved a large number (theoretically

infinite) of random variables arising primarily from the time-domain stochastic description of

ground motions. Applications to different disciplines have appeared, for example, in aerospace

engineering (Pellissetti et al., 2006; Thunnissen et al., 2007a, 2007b), fire engineering (Au

et al., 2007b), geotechnical engineering (Phoon, 2008; Wang et al., 2010; Santoso et al., 2011a),

nuclear engineering (Marseguerra, 2011; Cadini et al., 2012; Zio and Pedroni, 2009, 2011,

2012), structural engineering (Koutsourelakis et al., 2004; Jalayer and Beck, 2008; Augusti and Ciampoli, 2008; Rajeev et al., 2008; Wetzel and Proppe, 2010; Smith and Banerjee, 2012), and meteorology (Wang et al., 2011a). See Schueller et al. (2004), Schueller and Pradlwarter (2007), and Au et al. (2007a) for the performance of Subset Simulation in a set of benchmark problems. Implementing Subset Simulation on a spreadsheet (Au et al., 2010; Wang et al., 2011b) or a parallel processing platform (Pellissetti, 2009; Pellissetti and Schueller, 2009; Patelli et al., 2012) has also been studied.

Figure 5.17 shows the unit c.o.v. Δ versus the failure probability p_F based on results reported in the literature. Admittedly, the data set here is dominated by structural engineering applications, primarily because that was the discipline where the method was developed and where many subsequent developments were made. The results reported here are by no means complete, as they are based on journal papers known to the authors at the time of writing and with sufficient details reported on the unit c.o.v. The dashed line in the figure shows the unit c.o.v. of Direct Monte Carlo, which is theoretically given by Δ = \sqrt{(1 - p_F)/p_F}. Naturally, the unit c.o.v. of the Subset Simulation estimator depends on a number of factors, such as the failure probability, the choice of proposal PDF, the parameterization of random variables, system complexity, and so on. The number of random variables in the problems reported in the figure ranges from a few tens to thousands. Most problems are nonlinear. The points in the figure scatter around a region that grows at a significantly slower pace with decreasing failure probability. They almost coincide with Direct Monte Carlo at large probability (0.1), as it corresponds to Level 0.

[Figure 5.17: Unit c.o.v. versus failure probability; scatter of reported results for aerospace, fire, geotechnical, nuclear, structural, and urban canopy applications, with the Direct Monte Carlo line shown dashed for reference.]

Figure 5.18 gives another perspective by showing on the y-axis the total number of samples N_T required to achieve a specified c.o.v. δ in the failure probability estimate. This is theoretically given by N_T = (Δ/δ)^2, where δ = 30% has been used in the figure. Again, the dashed line shows the total number of samples required by Direct Monte Carlo, which is theoretically given by N_T = (1 - p_F)/(p_F δ^2).
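The computation behind such a comparison is elementary; the numbers below are illustrative inputs, not data from the figure:

def n_subset(Delta, delta=0.3):
    # samples for Subset Simulation with unit c.o.v. Delta and target c.o.v. delta
    return (Delta / delta) ** 2

def n_dmc(pF, delta=0.3):
    # samples for Direct Monte Carlo at the same target c.o.v.
    return (1.0 - pF) / (pF * delta**2)

print(f"Subset Simulation (Delta = 30): ~{n_subset(30.0):.0f} samples")
print(f"Direct Monte Carlo (pF = 1e-6): ~{n_dmc(1e-6):.1e} samples")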


[Figure 5.18: Computational effort (number of samples for 30% c.o.v.) versus failure probability, for the same aerospace, fire, geotechnical, nuclear, structural, and urban canopy applications, with the Direct Monte Carlo requirement shown dashed.]

5.9 Variants

As the basic idea behind Subset Simulation is quite fundamental, there are many ways it can be used to develop algorithms with higher efficiency or to solve other related problems. In this section we outline a few variants of Subset Simulation that have appeared in the literature. Again, this is by no means complete, as this body of knowledge is still growing.

Efficiency gains can often be achieved beyond the standard Subset Simulation algorithm by a better choice of proposal distributions, incorporating prior information, or exploiting the special structure of the problem, although successes are of a varying degree depending on the problem. One natural objective is to reduce the correlation between the conditional samples along the Markov chain, that is, reducing the factor γ_j in Eq. (5.20). This has been achieved by modifying the rejection mechanism in MCMC in Zuev and Katafygiotis (2011) and Santoso et al. (2011b). For some problems where there is a causal relationship between the response and some of the random variables, Ching et al. (2005a, 2005b) modified the MCMC algorithm to increase the acceptance rate of the candidate sample without systematically increasing its correlation with the current sample. This is in the spirit of "splitting," which applies specifically to Markovian systems and for which there is an established body of literature (Kahn and Harris, 1951; Villen-Altamirano and Villen-Altamirano, 1991; Glasserman et al., 1999).

Subset Simulation has also been combined with computational learning tools such as the Support Vector Machine (Bourinet et al., 2011) and neural networks (Papadopoulos et al., 2012). The information from meta-models or surrogate responses, when available, can also be incorporated (Au, 2007; Mathews et al., 2011). Strategies have been proposed to exploit the characteristics of the class of systems under study and gain efficiency by using a more efficient method tailored to the part of the problem that is analytically tractable (Katafygiotis and Cheung, 2005; Yuan et al., 2010).


Subset Simulation has been incorporated into reliability-based optimization algorithms (Jensen, 2005; Ching and Hsieh, 2007a; Jensen and Beer, 2010; Valdebenito and Schueller, 2010; Dubourg et al., 2011; Yang and Hsieh, 2011). It has also been extended to solving reliability "design sensitivity" problems, where the target is to obtain the failure probability for different choices of design parameters. From first principles this problem requires repeated reliability analyses, for example, repeated runs of Subset Simulation. By artificially considering the design parameters as random in the problem, it is possible to extract design sensitivity information based on a single simulation run (Au, 2005). Algorithms for design sensitivity or reliability-based optimization have been developed based on this idea, for example, Ching and Hsieh (2007b), Song et al. (2009), and Taflanidis and Beck (2008a, 2008b, 2009). By establishing an analogy between a rare event and an optimal event, Subset Simulation has been adapted for solving constrained optimization problems with potentially a large number of design parameters and constraints (Li and Au, 2010; Wang et al., 2011c, 2011d).

References

Au, S.K. (2001) On the Solution of First Excursion Problems by Simulation with Applications to Probabilistic

Seismic Performance Assessment. PhD Thesis in Civil Engineering, Division of Engineering and Applied Science,

California Institute of Technology, California, USA.

Au, S.K. (2005) Reliability-based design sensitivity by efficient simulation. Computers and Structures, 83(14),

1048–1061.

Au, S.K. (2007) Augmenting approximate solutions for consistent reliability analysis. Probabilistic Engineering Mechanics, 22(1), 77–87.

Au, S.K. and Beck, J.L. (2000a) Subset simulation: a new approach to calculating small failure probabilities. Proceedings of International Conference on Monte Carlo Simulation, 18–21 June 2000, Monte Carlo, Monaco.

Au, S.K. and Beck, J.L. (2000b) Calculation of first passage probabilities by subset simulation. Proceedings of

Joint Specialty Conference on Probabilistic Mechanics and Structural Reliability, 24–26 July 2000, Notre Dame,

Indiana, USA.

Au, S.K. and Beck, J.L. (2001) Estimation of small failure probabilities in high dimensions by subset simulation.

Probabilistic Engineering Mechanics, 16(4), 263–277.

Au, S.K. and Beck, J.L. (2003) Subset simulation and its applications to seismic risk based on dynamic analysis.

Journal of Engineering Mechanics, 129(8), 901–917.

Au, S.K., Cao, Z.J., and Wang, Y. (2010) Implementing advanced Monte Carlo under spreadsheet environment.

Structural Safety, 32(5), 281–292.

Au, S.K., Ching, J., and Beck, J.L. (2007a) Application of subset simulation methods to reliability benchmark

problems. Structural Safety, 29(3), 183–193.

Au, S.K., Wang, Z.H., and Lo, S.M. (2007b) Compartment fire risk analysis by advanced Monte Carlo method.

Engineering Structures, 29(9), 2381–2390.

Augusti, G. and Ciampoli, M. (2008) Performance-based design in risk assessment and reduction. Probabilistic Engineering Mechanics, 23, 496–508.

Bourinet, J.M., Deheeger, F., and Lemaire, M. (2011) Assessing small failure probabilities by combined subset

simulation and Support Vector Machines. Structural Safety, 33, 343–353.

Cadini, F., Avram, D., Pedroni, N., and Zio, E. (2012) Subset Simulation of a reliability model for radioactive waste

repository performance assessment. Reliability Engineering and System Safety, 100, 75–83.

Ching, J., Au, S.K., and Beck, J.L. (2005a) Reliability estimation of dynamical systems subject to stochastic excitation

using Subset Simulation with Splitting. Computer Methods in Applied Mechanics and Engineering, 194(12–16),

1557–1579.

Ching, J., Beck, J.L., and Au, S.K. (2005b) Hybrid Subset Simulation Method for reliability estimation of dynamical

systems subjected to stochastic excitation. Probabilistic Engineering Mechanics, 20(3), 199–214.

Ching, J. and Hsieh, Y.H. (2007a) Approximate reliability-based optimization using a three-step approach based on

subset simulation. Journal of Engineering Mechanics, 133(4), 481–493.


Ching, J. and Hsieh, Y.H. (2007b) Local estimation of failure probability function and its confidence interval with

maximum entropy principle. Probabilistic Engineering Mechanics, 22(1), 39–49.

Corless, R.M., Gonnet, G.H., Hare, D.E.G., et al. (1996) On the Lambert W function. Advances in Computational Mathematics, 5, 329–359.

Dubourg, V., Sudret, B., and Bourinet, J.M. (2011) Reliability-based design optimization using kriging surrogates

and subset simulation. Structural and Multidisciplinary Optimization, 44, 673–690.

Glasserman, P., Heidelberger, P., Shahabuddin, P., and Zajic, T. (1999) Multilevel splitting for estimating rare event

probabilities. Operations Research, 47(4), 585–600.

Hsu, W.C. and Ching, J. (2010) Evaluating small failure probabilities of multiple limit states by parallel subset

simulation. Probabilistic Engineering Mechanics, 25(3), 291–304.

Li, H.S. and Au, S.K. (2010) Design optimization using Subset Simulation algorithm. Structural Safety, 32(6),

384–392.

Jalayer, F. and Beck, J.L. (2008) Effects of two alternative representations of ground-motion uncertainty on probabilistic seismic demand assessment of structures. Earthquake Engineering and Structural Dynamics, 37, 61–79.

Jensen, H.A. (2005) Structural optimization of linear dynamical systems under stochastic excitation: A moving reliability database approach. Computer Methods in Applied Mechanics and Engineering, 194(12–16), 1757–1778.

Jensen, H.A. and Beer, M. (2010) Discrete–continuous variable structural optimization of systems under stochastic

loading. Structural Safety, 32, 293–304.

Kahn, H. and Harris, T.E. (1951) Estimation of particle transmission by random sampling. National Bureau of Standards Applied Mathematics Series, 12, 27–30.

Katafygiotis, L.S. and Cheung, S.H. (2005) A two-stage Subset Simulation-based approach for calculating the

reliability of inelastic structural systems subjected to Gaussian random excitations. Computer Methods in Applied Mechanics and Engineering, 194, 1581–1595.

Koutsourelakis, P.S., Pradlwarter, H.J., and Schueller, G.I. (2004) Reliability of structures in high dimensions, part I:

algorithms and applications. Probabilistic Engineering Mechanics, 19, 409–417.

Marseguerra, M. (2011) An efficient Monte Carlo-SubSampling approach to first passage problems. Annals of Nuclear Engineering, 38, 410–417.

Mathews, T.S., Arul, A.J., Parthasarathy, U., et al. (2011) Passive system reliability analysis using response conditioning method with an application to failure frequency estimation of decay heat removal of PFBR. Nuclear Engineering and Design, 241, 2257–2270.

Papadopoulos, V., Giovanis, D.G., Lagaros, N.D., and Papadrakakis, M. (2012) Accelerated subset simulation with

neural networks for reliability analysis. Computer Methods in Applied Mechanics and Engineering, 223–224,

70–80.

Patelli, E., Panayirci, H.M., Broggi, M., et al. (2012) General purpose software for efficient uncertainty management

of large finite element models. Finite Elements in Analysis and Design, 51, 31–48.

Pellissetti, M.F. (2009) Parallel processing in structural reliability. Structural Engineering and Mechanics, 32(1),

95–126.

Pellissetti, M.F. and Schueller, G.I. (2009) Scalable uncertainty and reliability analysis by integration of advanced

Monte Carlo simulation and generic finite element solvers. Computers and Structures, 87, 930–947.

Pellissetti, M.F., Schueller, G.I., Pradlwarter, H.J., et al. (2006) Reliability analysis of spacecraft structures under

static and dynamic loading. Computers and Structures, 84, 1313–1325.

Phoon, K.K. (ed.) (2008) Reliability-Based Design in Geotechnical Engineering: Computations and Applications,

Taylor & Francis, Singapore.

Rajeev, P., Franchin, P., and Pinto, P.E. (2008) Increased accuracy of vector-IM-based seismic risk assessment. Journal of Earthquake Engineering, 12(S1), 111–124.

Rubinstein, R.Y. (1981) Simulation and the Monte Carlo Method, John Wiley, New York.

Santoso, A.M., Phoon, K.K., and Quek, S.T. (2011a) Effects of soil spatial variability on rainfall-induced landslides.

Computers and Structures, 89, 893–900.

Santoso, A.M., Phoon, K.K., and Quek, S.T. (2011b) Modified Metropolis–Hastings algorithm with reduced chain

correlation for efficient subset simulation. Probabilistic Engineering Mechanics, 26, 331–341.

Schueller, G.I. and Pradlwarter, H.J. (2007) Benchmark study on reliability estimation in higher dimensions of

structural systems – an overview. Structural Safety, 29, 167–182.

Schueller, G.I., Pradlwarter, H.J., and Koutsourelakis, P.S. (2004) A critical appraisal of reliability estimation procedures for high dimensions. Probabilistic Engineering Mechanics, 19, 463–474.


Smith, B. and Banerjee, B. (2012) Reliability of inserts in sandwich composite panels. Composite Structures, 94,

820–829.

Song, S.F., Lu, Z.Z., and Qiao, H.W. (2009) Subset simulation for structural reliability sensitivity analysis. Reliability Engineering and System Safety, 94, 658–665.

Taflanidis, A.A. and Beck, J.L. (2008a) An efficient framework for optimal robust stochastic system design using

stochastic simulation. Computer Methods in Applied Mechanics and Engineering, 198(1), 88–101.

Taflanidis, A.A. and Beck, J.L. (2008b) Stochastic subset optimization for optimal reliability problems. Probabilistic Engineering Mechanics, 23, 324–338.

Taflanidis, A.A. and Beck, J.L. (2009) Stochastic subset optimization for reliability optimization and sensitivity

analysis in system design. Computers and Structures, 87(5–6), 318–331.

Thunnissen, D.P., Au, S.K., and Swenka, E.R. (2007a) Uncertainty quantification in the preliminary design of a

spacecraft attitude control system. AIAA Journal of Aerospace Computing, Information, and Communication, 4,

902–917.

Thunnissen, D.P., Au, S.K., and Tsuyuki, G.T. (2007b) Uncertainty quantification in estimating critical spacecraft

component temperatures. AIAA Journal of Thermophysics and Heat Transfer, 21(2), 422–430.

Valdebenito, M.A. and Schueller, G.I. (2010) Reliability-based optimization considering design variables of discrete

size. Engineering Structures, 32, 2919–2930.

Villen-Altamirano, M. and Villen-Altamirano, J. (1991) RESTART: a method for accelerating rare event simulations,

in Queueing, Performance and Control in ATM (eds J.W. Cohen and C.D. Pack), Elsevier Science Publishers,

Amsterdam, pp. 71–76.

Wang, Z.H., Bou-Zeid, E., Au, S.K., and Smith, J.A. (2011a) Analyzing the sensitivity of WRF's single-layer urban canopy model to parameter uncertainty using advanced Monte Carlo simulation. Journal of Applied Meteorology and Climatology, 50(9), 1795–1814.

Wang, Y., Cao, Z., and Au, S.K. (2010) Efficient Monte Carlo Simulation of parameter sensitivity in probabilistic

slope stability analysis. Computers and Geotechnics, 37, 1015–1022.

Wang, Y., Cao, Z.J., and Au, S.K. (2011b) Practical reliability analysis of slope stability by advanced Monte Carlo

simulations in a spreadsheet. Canadian Geotechnical Journal, 48, 162–172.

Wang, Q., Lu, Z.Z., and Tang, Z.C. (2011c) A novel global optimization method of truss topology. Technological Sciences, 54(10), 2723–2729.

Wang, Q., Lu, Z.Z., and Zhou, C. (2011d) New topology optimization method for wing leading-edge ribs. Journal of Aircraft, 48(5), 1741–1748.

Wetzel, C. and Proppe, C. (2010) Stochastic modeling in multibody dynamics: aerodynamic loads on ground vehicles.

Journal of Computational and Nonlinear Dynamics, ASME, 5, 031009.

Yang, I.T. and Hsieh, Y.H. (2011) Reliability-based design optimization with discrete design variables and non-smooth

performance functions: AB-PSO algorithm. Automation in Construction, 20, 610–619.

Yuan, X.K., Lu, Z.Z., and Qiao, H.W. (2010) Conditional probability Markov chain simulation based reliability

analysis method for nonnormal variables. Technological Sciences, 53(5), 1434–1441.

Zio, E. and Pedroni, N. (2009) Estimation of the functional failure probability of a thermal–hydraulic passive system

by Subset Simulation. Nuclear Engineering and Design, 239, 580–599.

Zio, E. and Pedroni, N. (2011) How to effectively compute the reliability of a thermal–hydraulic nuclear passive

system. Nuclear Engineering and Design, 241, 310–327.

Zio, E. and Pedroni, N. (2012) Monte Carlo simulation-based sensitivity analysis of the model of a thermal–hydraulic

passive system. Reliability Engineering and System Safety, 107, 90–106.

Zuev, K.M. and Katafygiotis, L.S. (2011) Modified Metropolis–Hastings algorithm with delayed rejection. Probabilistic Engineering Mechanics, 26, 405–412.