Univariate input models for stochastic simulation
ME Kuhl1, JS Ivy2, EK Lada3, NM Steiger4, MA Wagner5 and JR Wilson2*
1Rochester Institute of Technology, Rochester, NY, USA; 2North Carolina State University, Raleigh, NC, USA; 3SAS Institute Inc., Cary, NC, USA; 4University of Maine, Orono, ME, USA; 5SAIC, Vienna, VA, USA
Techniques are presented for modelling and then randomly sampling many of the continuous univariate probabilistic
input processes that drive discrete-event simulation experiments. Emphasis is given to the generalized beta distribution
family, the Johnson translation system of distributions, and the Bézier distribution family because of the flexibility of
these families to model a wide range of distributional shapes that arise in practical applications. Methods are described
for rapidly fitting these distributions to data or to subjective information (expert opinion) and for randomly sampling
from the fitted distributions. Also discussed are applications ranging from pharmaceutical manufacturing and medical
decision analysis to smart-materials research and health-care systems analysis.
Journal of Simulation (2010) 4, 81–97. doi:10.1057/jos.2009.31; published online 26 February 2010
Keywords: simulation; continuous univariate input models; generalized beta distributions; Johnson translation
system of distributions; Bézier distributions
1. Introduction
One of the main problems in the design and construction of
stochastic simulation experiments is the selection of valid
input models, that is, probability distributions that accurately
mimic the behaviour of the random input processes
driving the system under study. Often the following
interrelated difficulties arise in attempts to use standard
distribution families for simulation input modelling:
1. Standard distribution families cannot adequately represent the probabilistic behaviour of many real-world
input processes, especially in the tails of the underlying
distribution.
2. The parameters of the selected distribution family are
troublesome to estimate from either sample data or
subjective information (expert opinion).
3. Fine-tuning or editing the shape of the fitted distribution
is difficult because (i) there are a limited number of
parameters available to control the shape of the fitted
distribution, and (ii) there is no effective mechanism for
directly manipulating the shape of the fitted distribution
while simultaneously updating the corresponding parameter estimates.
In modelling a simulation input process, the practitioner
must identify an appropriate distribution family and then
estimate the corresponding distribution parameters; and the
problems enumerated above can hinder the progress of both
of these model-building activities.
The conventional approach to identification of a stochas-
tic simulation input model encompasses several procedures
for using sample data to accept, reject, or somehow rank
each of the distribution families in a list of well-known
alternatives. These procedures include (i) informal graphical
techniques based on probability plots, frequency distribu-
tions, or box plots; and (ii) statistical goodness-of-fit tests
such as the Kolmogorov–Smirnov, chi-squared, Anderson–Darling,
and Cramér–von Mises tests. For a detailed discussion of these procedures, see Sections 6.3–6.6 of Law
(2007) and Stephens (1974). Unfortunately, none of these
procedures is guaranteed to yield a definitive conclusion. For
example, identification of an input distribution can be based
on visual comparison of superimposed graphs of a
histogram of the available data set and the fitted probability
density function (p.d.f.) for each of several alternative
distribution families. In this situation, however, the final
conclusion depends largely on the number of class intervals
(also called bins or cells) in the histogram as well as the
class boundaries; and a different layout for the histogram
could lead the user to identify a different distribution family. Similar anomalies can occur in the use of statistical
goodness-of-fit tests. In small samples, these tests can
have very low power to detect lack of fit between the
empirical distribution and each alternative theoretical
distribution, resulting in an inability to reject any of the
alternative distributions. In large samples, moreover, practi-
cally insignificant discrepancies between the empirical
and theoretical distributions often appear to be statis-
tically significant, resulting in rejection of all the alternative
distributions.
*Correspondence: JR Wilson, Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, 111 Lampe Drive, Daniels Hall, Room 370, Campus Box 7906, Raleigh, North Carolina 27695-7906, USA. E-mail: [email protected]
Journal of Simulation (2010) 4, 81–97. © 2010 Operational Research Society Ltd. All rights reserved. 1747-7778/10
www.palgrave-journals.com/jos/
After somehow identifying an appropriate family of
distributions to model an input process, the simulation user
also faces problems in estimating the associated distribution
parameters. The user often attempts to match the mean
and standard deviation of the fitted distribution with the
sample mean and standard deviation of a data set, but shape
characteristics such as the sample skewness and kurtosis are
less frequently considered when estimating the parameters
of an input distribution. Some estimation methods, such as
maximum likelihood and percentile matching, may simply
fail to yield parameter estimates for some distribution
families. Even if several distribution families are readily
fitted to a set of sample data, the user generally lacks a
definitive basis for selecting the appropriate best-fitting
distribution; in particular, several commercial input-modelling
packages base their model-selection procedure on an
unspecified combination of some of the goodness-of-fit test
statistics mentioned above, and the details of the model-
selection procedure are actually concealed from the user on
the grounds that such information is proprietary. A notable exception to this is the automatic distribution-fitting
procedure of JMP 8 (SAS Institute Inc., 2008), which makes
transparent use of the Akaike information criterion (Akaike,
1974) as the basis for selecting the distribution that yields the
best fit to a given data set.
The task of building a simulation input model is further
complicated if sample data are not available. In this
situation, identification of an appropriate distribution family
is arbitrarily based on whatever information can be elicited
from knowledgeable individuals (experts); and the corre-
sponding distribution parameters are computed from sub-
jective estimates of simple numerical characteristics of the
underlying distribution such as the mode, selected percen-
tiles, or low-order moments. In summary, there is some
evidence that many simulation practitioners lack a clear-cut,
definitive procedure for identifying and estimating high-
fidelity stochastic input models (or even merely acceptable,
rough-cut input models); consequently, simulation output
analysis is often based on input processes of questionable
validity. The latter observation, coupled with the current
capabilities and limitations of typical off-the-shelf simulation
input-modelling software, has led to the research that is
surveyed in this article for handling some of the difficulties
outlined above.
This invited article is an expanded version of a series of introductory tutorials on simulation input modelling, which
we have been asked to present at the Winter Simulation
Conference for the past several years (Kuhl et al, 2006,
2008a,b). In this article techniques are presented for
modelling and then randomly sampling many of the
continuous univariate probabilistic input processes that
drive discrete-event simulation experiments, with the pri-
mary focus on methods designed to alleviate the difficulties
encountered in using conventional approaches to simulation
input modelling. Emphasis is given to the generalized beta
distribution family (Section 2), the Johnson translation
system of distributions (Section 3), and the Bézier distribution
family (Section 4) because in our experience these
families can be most readily and effectively used in a broad
diversity of simulation applications, especially in large-scale
applications for which reasonably accurate input models
must be delivered under severe time pressure, and the user
may not have immediate access to detailed knowledge of the
physics of all the input processes so that empirical input
models must be formulated and fitted quickly using readily
available sample data or subjective information. For each
distribution family, we describe methods for fitting distri-
butions to sample data or expert opinion and then for
randomly sampling the fitted distributions. Much of the
discussion concerns public-domain software and fitting
procedures that facilitate rapid univariate simulation input
modelling. To illustrate these procedures, we also discuss
applications ranging from pharmaceutical manufacturing
and medical decision analysis to smart-materials research
and health-care systems analysis. Finally, in Section 5 conclusions and recommendations are presented, including
a brief discussion of other discrete and continuous distri-
bution families, which can be used for simulation input
modelling. In a companion article (Kuhl et al, 2010), we
discuss some multivariate distributions that frequently arise
in probabilistic simulation input modelling; see also Sections
3–4 of Kuhl et al (2006).
2. Generalized beta distribution family
Suppose $X$ is a continuous random variable with lower limit $a$ and upper limit $b$ whose distribution is to be approximated and then randomly sampled in a simulation experiment. In such a situation, it is often possible to model the probabilistic behaviour of $X$ using a generalized beta distribution, whose p.d.f. has the form

$$f_X(x) = \frac{\Gamma(\alpha_1 + \alpha_2)\,(x - a)^{\alpha_1 - 1}(b - x)^{\alpha_2 - 1}}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)\,(b - a)^{\alpha_1 + \alpha_2 - 1}} \quad \text{for } a \le x \le b, \tag{1}$$

where $\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt$ (for $z > 0$) denotes the gamma function. For graphs illustrating the wide range of distributional shapes achievable with generalized beta distributions, see one of the following references: pp 92–93 of Hahn and Shapiro (1967); pp 291–293 of Law (2007); or pp 11–14 of Kuhl et al (2008b), which is available online.
If $X$ has the p.d.f. (1), then the cumulative distribution function (c.d.f.) of $X$, which is defined by $F_X(x) = \Pr\{X \le x\} = \int_{-\infty}^{x} f_X(w)\,dw$ for all real $x$, unfortunately has no convenient analytical expression; but the mean and variance of $X$ are respectively given by

$$\mu_X = E[X] = \frac{\alpha_1 b + \alpha_2 a}{\alpha_1 + \alpha_2} \tag{2}$$

and

$$\sigma_X^2 = E\big[(X - \mu_X)^2\big] = \frac{(b - a)^2\,\alpha_1 \alpha_2}{(\alpha_1 + \alpha_2)^2(\alpha_1 + \alpha_2 + 1)}. \tag{3}$$

Recall that for a continuous p.d.f. $f_X(\cdot)$, a mode $m$ is a local maximum of that function; and if there is a unique global maximum for $f_X(\cdot)$, then the p.d.f. is said to be unimodal, and $m$ is usually called the most likely value of the random variable $X$. If $\alpha_1, \alpha_2 \ge 1$ and either $\alpha_1 > 1$ or $\alpha_2 > 1$, then the beta p.d.f. (1) is unimodal; and the mode is given by

$$m = \frac{(\alpha_1 - 1)\,b + (\alpha_2 - 1)\,a}{\alpha_1 + \alpha_2 - 2} \quad \text{for } \alpha_1, \alpha_2 \ge 1 \text{ and } \alpha_1 \alpha_2 > 1. \tag{4}$$
Equations (2)–(4) reveal that key distributional characteristics
of the generalized beta distribution are simple functions
of the parameters $a$, $b$, $\alpha_1$, and $\alpha_2$; and this facilitates input
modelling, especially in pilot studies in which rapid model
development is critical.
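
To illustrate how directly Equations (1)–(4) translate into code, the following Python sketch evaluates the p.d.f., mean, variance, and mode of a generalized beta distribution. The function names are hypothetical (they do not come from any software package cited in this article), and the sketch assumes $\alpha_1, \alpha_2 \ge 1$ so that the density is finite at both end-points.

```python
import math


def gen_beta_pdf(x, a, b, alpha1, alpha2):
    """Density (1) on [a, b]; this sketch assumes alpha1, alpha2 >= 1."""
    if x < a or x > b:
        return 0.0
    # Use log-gamma so that large shape parameters do not overflow.
    log_const = (math.lgamma(alpha1 + alpha2) - math.lgamma(alpha1)
                 - math.lgamma(alpha2)
                 - (alpha1 + alpha2 - 1.0) * math.log(b - a))
    return (math.exp(log_const)
            * (x - a) ** (alpha1 - 1.0) * (b - x) ** (alpha2 - 1.0))


def gen_beta_mean(a, b, alpha1, alpha2):
    """Mean from Equation (2)."""
    return (alpha1 * b + alpha2 * a) / (alpha1 + alpha2)


def gen_beta_variance(a, b, alpha1, alpha2):
    """Variance from Equation (3)."""
    s = alpha1 + alpha2
    return (b - a) ** 2 * alpha1 * alpha2 / (s ** 2 * (s + 1.0))


def gen_beta_mode(a, b, alpha1, alpha2):
    """Mode from Equation (4); defined only in the unimodal case."""
    if alpha1 < 1.0 or alpha2 < 1.0 or alpha1 * alpha2 <= 1.0:
        raise ValueError("Equation (4) needs alpha1, alpha2 >= 1 and alpha1*alpha2 > 1")
    return ((alpha1 - 1.0) * b + (alpha2 - 1.0) * a) / (alpha1 + alpha2 - 2.0)
```

For instance, with $a = 2$, $b = 10$, $\alpha_1 = 2$, and $\alpha_2 = 3$, Equations (2)–(4) give a mean of 5.2, a variance of 2.56, and a mode of 14/3.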
2.1. Fitting beta distributions to data or subjective
information
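Given a random sample $\{X_i : i = 1, \ldots, n\}$ of size $n$ from the distribution to be estimated, let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the order statistics obtained by sorting the $\{X_i\}$ in ascending order so that $X_{(1)} = \min\{X_i : i = 1, \ldots, n\}$ and $X_{(n)} = \max\{X_i : i = 1, \ldots, n\}$. We can fit a generalized beta distribution to this data set using the following sample statistics:

$$\hat{a} = 2X_{(1)} - X_{(2)}, \qquad \hat{b} = 2X_{(n)} - X_{(n-1)}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2. \tag{5}$$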
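In particular the method of moment matching involves (i) setting the right-hand sides of (2) and (3) equal to the sample mean $\bar{X}$ and the sample variance $S^2$, respectively; and (ii) solving the resulting equations for the corresponding estimates $\hat{\alpha}_1$ and $\hat{\alpha}_2$ of the shape parameters. In terms of the auxiliary quantities

$$d_1 = \frac{\bar{X} - \hat{a}}{\hat{b} - \hat{a}} \quad \text{and} \quad d_2 = \frac{S}{\hat{b} - \hat{a}},$$

the moment-matching estimates $\hat{\alpha}_1$ and $\hat{\alpha}_2$ are given by

$$\hat{\alpha}_1 = \frac{d_1^2 (1 - d_1)}{d_2^2} - d_1, \qquad \hat{\alpha}_2 = \frac{d_1 (1 - d_1)^2}{d_2^2} - (1 - d_1). \tag{6}$$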
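
The moment-matching computation in Equations (5) and (6) can be sketched in a few lines of Python. This is only an illustration under our own naming (the function `fit_beta_moments` is hypothetical and is not part of BetaFit); it assumes at least three observations and that the two smallest and the two largest observations are distinct, so that $\hat{a} < X_{(1)}$ and $X_{(n)} < \hat{b}$.

```python
import statistics


def fit_beta_moments(data):
    """Moment-matching fit of a generalized beta distribution.

    Returns (a_hat, b_hat, alpha1_hat, alpha2_hat) using the end-point
    estimates of Equation (5) and the shape estimates of Equation (6).
    """
    xs = sorted(data)
    if len(xs) < 3:
        raise ValueError("need at least three observations")
    # End-point estimates from Equation (5).
    a_hat = 2.0 * xs[0] - xs[1]
    b_hat = 2.0 * xs[-1] - xs[-2]
    x_bar = statistics.mean(xs)
    s = statistics.stdev(xs)          # uses the n - 1 divisor, as in (5)
    # Auxiliary quantities: standardized mean and standard deviation.
    d1 = (x_bar - a_hat) / (b_hat - a_hat)
    d2 = s / (b_hat - a_hat)
    # Shape estimates from Equation (6).
    alpha1 = d1 ** 2 * (1.0 - d1) / d2 ** 2 - d1
    alpha2 = d1 * (1.0 - d1) ** 2 / d2 ** 2 - (1.0 - d1)
    return a_hat, b_hat, alpha1, alpha2
```

By construction, substituting the returned estimates into Equations (2) and (3) reproduces the sample mean and sample variance, which gives a convenient check on any implementation.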
AbouRizk et al (1994) discuss BetaFit, a Windows-based software package for fitting the generalized beta distribution to sample data by computing estimators $\hat{a}$, $\hat{b}$, $\hat{\alpha}_1$, and $\hat{\alpha}_2$ using the following estimation methods:
• moment matching with $\hat{a} = X_{(1)}$ and $\hat{b} = X_{(n)}$;
• feasibility-constrained moment matching, so that the feasibility conditions $\hat{a} < X_{(1)}$ and $X_{(n)} < \hat{b}$ are always satisfied;
• maximum likelihood (assuming $a$ and $b$ are known and thus are not estimated); and
• ordinary least squares (OLS) and diagonally weighted least squares (DWLS) estimation of the c.d.f.
Figure 1 demonstrates the application of BetaFit to a
sample of 9980 observations of end-to-end chain lengths
(in ångströms) of the ionic polymer Nafion based on the method of moment matching. In Section 3.5 below, we
provide further details on the origin of the Nafion data set
and its relevance to the problem of predicting the stiffness
properties of a certain class of smart materials. Like all
the software packages mentioned in this article, BetaFit is in
the public domain and is available on the Web site via
www.ise.ncsu.edu/jwilson/page3.
For rapid development of preliminary simulation models,
practitioners often base an initial input model for the
random variable $X$ on subjective estimates $\hat{a}$, $\hat{m}$, and $\hat{b}$ of
the minimum, mode, and maximum, respectively, of the
distribution of $X$. Although the triangular distribution is
often used in such circumstances, it can yield excessively
heavy tails, and hence grossly unrealistic simulation results,
when the distance $\hat{b} - \hat{m}$ between the estimates of the upper limit and mode is much larger than the distance $\hat{m} - \hat{a}$ between the estimates of the mode and lower limit, or vice
versa. The generalized beta distribution is usually a better
choice in such situations; but there is some difficulty in
selecting the shape parameters to yield the desired value $\hat{m}$ for the mode. For an elaboration of this point in the context
of project-management simulations, see Vanhoucke (2010).
In many project-management and quality-control applications, it is convenient to assume that the standard deviation of the random variable at hand is one-sixth of the corresponding range; and if we equate the right-hand sides of (3) and (4), respectively, with the subjective estimates $(\hat{b} - \hat{a})^2/36$ and $\hat{m}$ of the variance and mode of $X$, then we must solve a cubic equation to obtain the corresponding shape parameters of the beta p.d.f. (1). In terms of the auxiliary quantity

$$q = \frac{\hat{m} - \hat{a}}{\hat{b} - \hat{a}},$$

we see that in the special cases in which $q = 0$ or $q = 1$, the required shape parameters are exactly given by

$$\begin{cases} \hat{\alpha}_1 = 1 \text{ and } \hat{\alpha}_2 = 3.87227 & \text{if } q = 0, \\ \hat{\alpha}_1 = 3.87227 \text{ and } \hat{\alpha}_2 = 1 & \text{if } q = 1. \end{cases} \tag{7}$$

(For a detailed justification of (7), see the Appendix of this article, which contains exact computing formulas for the shape parameters of a beta distribution with user-specified values of the end-points, mode, and variance.)
For the more common case in which $0 < q < 1$, remarkably accurate, simple approximations to the shape parameters of the beta distribution with minimum $\hat{a}$, mode $\hat{m}$, maximum $\hat{b}$, and standard deviation $(\hat{b} - \hat{a})/6$ can be conveniently calculated from the asymmetry ratio

$$r = \frac{\hat{b} - \hat{m}}{\hat{m} - \hat{a}} = \frac{1 - q}{q},$$

so that the required shape parameters are given by

$$\hat{\alpha}_1 = \frac{r^2 + 3r + 4}{r^2 + 1} \quad \text{and} \quad \hat{\alpha}_2 = \frac{4r^2 + 3r + 1}{r^2 + 1}; \tag{8}$$

see pp 202–203 of Wilson et al (1982) and McBride and McClelland (1967). If $0.02 \le q \le 0.98$, then the error in the approximation (8) is less than 3%; and if $0.1 \le q \le 0.9$, then the error in this approximation is less than 1.2%. To handle situations in which the estimated mode $\hat{m}$ is very close to one of the estimated end-points $\hat{a}$ and $\hat{b}$ (that is, $q < 0.02$ or $q > 0.98$), see the Appendix. In the application of beta distributions to a problem in medical decision making that is detailed in Section 2.4 below, the error in using the approximation (8) was essentially zero (that is, less than $10^{-8}$) on each of 50 different beta distributions used in the associated simulation study.
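
The three-point fitting rule in Equations (7) and (8) is simple enough to sketch directly. The Python function below is a hypothetical helper of our own naming, not code from VIBES or BetaFit; it returns the shape parameters for given estimates of the minimum, mode, and maximum, using (7) when the mode coincides with an end-point and the approximation (8) otherwise. It does not implement the exact Appendix formulas recommended when $q < 0.02$ or $q > 0.98$.

```python
def beta_shapes_from_three_point(a_hat, m_hat, b_hat):
    """Shape parameters from (min, mode, max) via Equations (7) and (8).

    Targets a standard deviation of (b_hat - a_hat) / 6; the result is
    approximate for 0 < q < 1 (see the error bounds quoted in the text).
    """
    if not (a_hat < b_hat and a_hat <= m_hat <= b_hat):
        raise ValueError("require a_hat <= m_hat <= b_hat and a_hat < b_hat")
    q = (m_hat - a_hat) / (b_hat - a_hat)
    if q == 0.0:                      # mode at the lower end-point, Eq. (7)
        return 1.0, 3.87227
    if q == 1.0:                      # mode at the upper end-point, Eq. (7)
        return 3.87227, 1.0
    r = (1.0 - q) / q                 # asymmetry ratio
    alpha1 = (r * r + 3.0 * r + 4.0) / (r * r + 1.0)
    alpha2 = (4.0 * r * r + 3.0 * r + 1.0) / (r * r + 1.0)
    return alpha1, alpha2
```

A symmetric specification such as (0, 5, 10) gives $r = 1$ and hence $\hat{\alpha}_1 = \hat{\alpha}_2 = 4$, for which the standard deviation from Equation (3) is exactly one-sixth of the range. Substituting (8) into Equation (4) also shows that the fitted mode equals $\hat{m}$ exactly for every $r$, so the approximation error falls on the standard deviation rather than on the mode.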
AbouRizk et al (1991) discuss the Visual Interactive Beta Estimation System (VIBES), a Windows-based software package that enables graphically oriented fitting of generalized beta distributions to subjective estimates of: (i) the end-points $a$ and $b$; and (ii) any of the following combinations of distributional characteristics:
• the mean $\mu_X$ and the variance $\sigma_X^2$,
• the mean $\mu_X$ and the mode $m$,
• the mode $m$ and the variance $\sigma_X^2$,
• the mode $m$ and an arbitrary quantile $x_p = F_X^{-1}(p)$ for $p \in (0, 1)$, or
• two quantiles $x_p$ and $x_q$ for $p, q \in (0, 1)$.
Figure 1 Beta p.d.f. (top panel) and c.d.f. (bottom panel) fitted to 9980 Nafion chain lengths.
As a general-purpose tool for simulation input modelling,
the generalized beta distribution family has the following
advantages:
• It is sufficiently flexible to represent with reasonable accuracy a wide diversity of distributional shapes.
• Its parameters are easily estimated from either sample data or subjective information.
On the other hand, generating samples from the beta
distribution is relatively slow; and in some applications,
the time to generate beta random variables can be a
substantial fraction of the overall simulation run time
(Wilson et al, 1982).
2.2. Generating beta variates
Although most general-purpose simulation packages pro-
vide a generator of beta random variables, in our experience
some care is required to verify the performance of a beta variate generator in cases where any shape parameter is less than one or is very large (say, greater than 30). Note that Equations (7)–(8) always yield $1 \le \alpha_1, \alpha_2 \le 4$ while Equations (A1)–(A5) in the Appendix always yield $\alpha_1, \alpha_2 \ge 1$; and in these situations, we have obtained excellent results using two procedures available in Press et al (2007). To generate a generalized beta random variable $X$ with minimum $a$, maximum $b$, and shape parameters $\alpha_1$ and $\alpha_2$, the first method uses Gammadev of Press et al (2007) to generate $Y(\alpha_1, \alpha_2)$, a standard beta random variable on the unit interval [0, 1] with shape parameters $\alpha_1$ and $\alpha_2$; and then the desired random sample is given by

$$X = a + (b - a)\,Y(\alpha_1, \alpha_2). \tag{9}$$
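
A minimal Python sketch of this first method follows. It assumes that the standard beta variate is formed as the ratio $G_1/(G_1 + G_2)$ of independent gamma variates with shape parameters $\alpha_1$ and $\alpha_2$ (a standard construction); Python's `random.gammavariate` stands in for Gammadev, and the function name `gen_beta_variate` is hypothetical.

```python
import random


def gen_beta_variate(a, b, alpha1, alpha2, rng=random):
    """Generalized beta variate on [a, b] via two gamma variates and Eq. (9)."""
    g1 = rng.gammavariate(alpha1, 1.0)   # Gamma(alpha1, scale 1)
    g2 = rng.gammavariate(alpha2, 1.0)   # Gamma(alpha2, scale 1)
    y = g1 / (g1 + g2)                   # standard beta on [0, 1]
    return a + (b - a) * y               # rescale as in Equation (9)
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible, and comparing the sample mean of a long run with Equation (2) is a quick check on the generator.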
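In terms of the incomplete beta function

$$I_x(\alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)} \int_0^x t^{\alpha_1 - 1}(1 - t)^{\alpha_2 - 1}\,dt \quad \text{for } 0 \le x \le 1 \tag{10}$$

(which coincides with the c.d.f. $F_{Y(\alpha_1, \alpha_2)}(x) = \Pr\{Y(\alpha_1, \alpha_2) \le x\}$ of a standard beta random variable $Y(\alpha_1, \alpha_2)$ for $0 \le x \le 1$), the second method for generating $X$ is based on inversion of the c.d.f. of $X$,

$$X = F_X^{-1}(U) = a + (b - a)\,F_{Y(\alpha_1, \alpha_2)}^{-1}(U) = a + (b - a)\,I_U^{-1}(\alpha_1, \alpha_2), \tag{11}$$

where $U \sim \text{Uniform}[0, 1]$ is a random number and we use the procedure invbetai of Press et al (2007) to obtain a highly accurate approximation to $I_x^{-1}(\alpha_1, \alpha_2)$ for all $x$ in [0, 1].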
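
The inversion method of Equation (11) can be illustrated with the following Python sketch. It is not the invbetai procedure: as a simple stand-in, it evaluates the incomplete beta function (10) by composite Simpson integration and inverts it by bisection, which is far slower and less accurate than a production routine. All function names are hypothetical, and the sketch assumes $\alpha_1, \alpha_2 \ge 1$ so that the integrand is bounded.

```python
import math


def reg_inc_beta(x, alpha1, alpha2, panels=200):
    """I_x(alpha1, alpha2) of Eq. (10) by composite Simpson integration.

    Assumes alpha1, alpha2 >= 1; `panels` must be even.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_c = (math.lgamma(alpha1 + alpha2)
             - math.lgamma(alpha1) - math.lgamma(alpha2))

    def f(t):
        return t ** (alpha1 - 1.0) * (1.0 - t) ** (alpha2 - 1.0)

    h = x / panels
    total = f(0.0) + f(x)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return math.exp(log_c) * total * h / 3.0


def inv_reg_inc_beta(u, alpha1, alpha2, tol=1e-10):
    """Solve I_x(alpha1, alpha2) = u for x by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(mid, alpha1, alpha2) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gen_beta_by_inversion(u, a, b, alpha1, alpha2):
    """Equation (11): map a random number u in [0, 1] to a beta variate."""
    return a + (b - a) * inv_reg_inc_beta(u, alpha1, alpha2)
```

In use, `u` would be supplied by a uniform random-number generator such as `random.random()`. Because inversion is monotone in `u`, this method preserves the ordering of the underlying random numbers, whereas the gamma-based method consumes a variable number of random numbers per variate.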
Remark 1. In the companion paper on multivariate input
modelling (Kuhl et al, 2010), $I_x^{-1}(\alpha_1, \alpha_2)$ and the associated
approximation invbetai of Press et al (2007) are important
tools in our approach to building multivariate beta distri-
butions as well as stationary univariate time series whose
marginals are generalized beta distributions.
2.3. Application of beta distributions to pharmaceutical
manufacturing
Pearlswig (1995) provides a good example of a pharmaceu-
tical manufacturing simulation whose credibility depended
critically on the use of appropriate input models. In this
study of the estimated production capacity of a plant that
had been designed but not yet built, the usual three-time
estimates ($\hat{a}$, $\hat{m}$, and $\hat{b}$) were obtained from the process engineer for each of the operations in manufacturing
a certain type of effervescent tablet. Unfortunately very
conservative (ie, large) estimates were provided for the upper
limit $\hat{b}$ of each operation time; and when triangular
distributions were used to represent batch-to-batch variation
in actual processing times for each operation within each
step of production, the resulting bottlenecks resulted in very
low estimates of the probability of reaching a prespecified
annual production level.
As in many simulation applications in which subjective
estimates $\hat{a}$, $\hat{m}$, and $\hat{b}$ are elicited from experts, the estimate $\hat{m}$ of the modal (most likely) time to perform a given operation was substantially more reliable than the estimates $\hat{a}$ and $\hat{b}$ of the lower and upper limits on the same operation time. When all the triangular distributions in the simulation
were replaced by generalized beta distributions using (8) to
ensure conformance to the engineer's estimate of the most
likely processing time for each operation within each step, the resulting annual tablet production was in excellent
agreement with the production of similar plants already in
existence. This simple remedy restored the faith of manage-
ment in the validity of the overall simulation model, which
was subsequently used to finalize certain aspects of the
design and operation of the new plant.
2.4. Application of beta distributions to medical decision
analysis
In the following application of simulation input modelling
to medical decision analysis, we compare two alternative methods for estimating the parameters of a generalized beta
distribution from limited sample data or subjective informa-
tion about the minimum, mode, and maximum values of the
target random variable. The discussion is also intended to
illustrate the extent to which simulation-generated outputs
may depend on the end-points of the fitted beta distributions
used in the simulation. This example provides insight into
the issues surrounding the use of the generalized beta
distribution to represent a simulation input that is subject to
randomness or uncertainty when that distribution must be
fitted to subjective information or some combination of
limited sample data and subjective information.
Cost-effectiveness studies are frequently used in medical
decision making for comparing various treatment or
intervention alternatives. The Panel on Cost-Effectiveness
in Health and Medicine (Gold et al, 1996) defines cost-effectiveness
analysis (CEA) as "... a method designed to
assess the comparative impacts of expenditures on different
health interventions ... that ... involves estimating the net,
or incremental, costs and effects of an intervention – its costs
and health outcomes compared with some alternative."
Decision models for CEA involve a large number of input
parameters, each subject to substantial uncertainty. In
particular, these studies involve uncertainty and random
variability with respect to the following quantities:
(a) Probability of occurrence for each health-related out-
come of interest;
(b) Utility, that is, a number between 0 (death) and 1
(perfect health) that is assigned to each state of health or outcome relevant to item (a); and
(c) Cost in constant dollars for each disease state and
intervention.
There is variability between patients and parameter un-
certainty, each reflected in the standard errors associated
with simulation-based estimates of mean performance, for
example, the expected values of the costs, quality-adjusted
life years, and utilities resulting from alternative treatments.
Therefore an accurate assessment of cost effectiveness must
involve sensitivity analysis and must attempt to model
the inherent variability and uncertainty in these parameter
estimates. Probabilistic sensitivity analysis is one method for
performing a multiway sensitivity analysis in which all
parameters subject to uncertainty are varied simultaneously
by Monte Carlo sampling from the distributions postulated
for those parameters.
Xu et al (2010) develop a decision-tree model for
determining the cost effectiveness of cesarean delivery upon
maternal request (CDMR) for women having a single
childbirth without indications. Their model compares
CDMR with trial of labour (TOL) considering all possible
short- and long-term outcomes and the resulting consequences
for the mother and neonate. The model takes the form of a decision tree containing over 100 chance events.
For each parameter in their decision model, Xu et al use
either literature-based or expert-opinion-based estimates for
the mode, minimum, and maximum values. Typically there
is limited information available for parameter distribution
estimation; moreover, there is significant variability in the
parameter values because of substantial uncertainty regard-
ing mode of delivery with respect to utility measures, the
probabilities of outcomes, and outcome costs. Here we
explore two examples from Xu et al in which we fit beta
distributions for utility and probability parameter estimates
by two different approaches:
• Using the approximation based on Equations (7) and (8); and
• Using the version of the so-called Beta PERT distribution that is implemented in the @RISK software (Palisade Corporation, 2009), which is usually termed the RiskPert distribution and is detailed in Equations (12) and (13) below.
To illustrate each approach, we discuss in some detail how
we formulated probabilistic input models of the following
quantities:
(i) P(Vag), the probability of a vaginal delivery given that
the decision maker pursues a trial of labour; and
(ii) U(SpVag), the utility associated with a spontaneous
vaginal delivery given that the decision maker pursues a
trial of labour.
A trial of labour is a decision to attempt a vaginal
delivery; this will result in a vaginal delivery or an emergency
cesarean section. Given a vaginal delivery, there are two
possible outcomes: a spontaneous vaginal delivery or an
instrumental vaginal delivery. For the probability of a
vaginal delivery P(Vag), the most likely value of 0.9
was obtained from the published literature. Not only was
0.9 the most frequently cited value, it was also judged
to be the highest-quality estimate in terms of sample size
and its applicability to populations cited in the literature.
The values 0.844 and 0.97 were taken to be the lower and
upper bounds on P(Vag), respectively, because they
corresponded to the smallest and largest estimates found in
the literature. The associated estimates of the utility
U(SpVag) resulting from a spontaneous vaginal delivery
were obtained similarly; and the mode, minimum, and
maximum values found in the literature were 0.92, 0.69, and
1.0, respectively.
While the minimum and maximum values were the
smallest and largest values found in the available literature,
we recognized that the true lower bound might be less than
the estimated minimum and the true upper bound might be
greater than the estimated maximum in many cases. In
contrast to Xu et al, who assume that the minimum and maximum values from the literature correspond to the 0.025
and 0.975 quantiles, we explored the effect of assuming
that the true lower and upper bounds could be obtained
by taking an appropriate offset from the original estimated
minimum and maximum values, where the offset is
expressed as a fraction $c$ of the original estimate of the
range,

$$a' = \max\{0,\; a - c(b - a)\} \quad \text{and} \quad b' = \min\{b + c(b - a),\; 1\} \quad \text{for } c > 0.$$
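
The offset rule above amounts to widening the literature-based range by a fraction $c$ on each side and then clipping to [0, 1], since the quantities being modelled are probabilities and utilities. A minimal Python sketch, with a hypothetical function name of our own choosing:

```python
def widen_bounds(a, b, c):
    """Return (a', b'): the range [a, b] widened by c*(b - a) on each side
    and clipped to [0, 1]."""
    width = b - a
    a_prime = max(0.0, a - c * width)
    b_prime = min(b + c * width, 1.0)
    return a_prime, b_prime
```

For P(Vag), with $a = 0.844$ and $b = 0.97$, taking $c = 0.1$ widens the range to approximately [0.8314, 0.9826]. For U(SpVag), with $b = 1.0$, the upper bound is already at the clipping limit, so only the lower bound moves.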
Based on the original estimate of the mode $m$ as well as the new estimates $a'$ and $b'$ of the true minimum and maximum values, respectively, for each distribution used in the probabilistic sensitivity analysis, we fitted a beta distribution using the approximation for the associated shape parameters given by Equations (7) and (8). In addition, we fitted the RiskPert version of the beta distribution by assuming that the mean and variance of the random variable $X$ satisfy the following equations,

$$\mu_X = \frac{a' + 4m + b'}{6} \quad \text{and} \quad \sigma_X^2 = \frac{(b' - \mu_X)(\mu_X - a')}{7}, \tag{12}$$

so that the corresponding shape parameters are given by

$$\alpha_1 = 6\,\frac{\mu_X - a'}{b' - a'} \quad \text{and} \quad \alpha_2 = 6\,\frac{b' - \mu_X}{b' - a'} = 6 - \alpha_1. \tag{13}$$

(Note that whereas Equations (2) and (3) are always true for a beta random variable $X$, Equations (12) and (13) are only satisfied when $X$ has a RiskPert distribution, which is a special type of beta distribution.)
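
As a sketch of the RiskPert fit, the Python function below computes the mean, variance, and shape parameters from Equations (12) and (13). The function name is hypothetical and the code is our own illustration, not the @RISK implementation.

```python
def riskpert_parameters(a, m, b):
    """RiskPert fit from (min, mode, max) using Equations (12) and (13).

    Returns (mean, variance, alpha1, alpha2).
    """
    if not (a < b and a <= m <= b):
        raise ValueError("require a <= m <= b and a < b")
    mean = (a + 4.0 * m + b) / 6.0
    variance = (b - mean) * (mean - a) / 7.0
    alpha1 = 6.0 * (mean - a) / (b - a)
    alpha2 = 6.0 * (b - mean) / (b - a)    # equals 6 - alpha1
    return mean, variance, alpha1, alpha2
```

Substituting these shape parameters into Equation (4) returns the specified mode $m$, and substituting them into Equation (3) returns the variance in (12), so the two sets of equations agree. The variance is tied to the mean here, whereas the fit based on (7) and (8) targets a standard deviation of one-sixth of the range for every mode.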
The value for c was varied from 0 to 0.1. Varying c
yielded small changes in the shape parameters for the beta
distributions fitted by each method. However, we found that
the value of c had an effect on the cost-effectiveness
decision; and the effect varied depending on the type of
distribution used for all the probabilities and utilities in the
decision tree. For $c \in [0, 0.02)$, there was a significant
difference in the effectiveness of CDMR and TOL (ie, the
95% confidence interval for the mean difference in the utility
between CDMR and TOL did not include zero) when using
beta distributions fitted by each method. For $c \in [0.02, 0.07]$,
there was a significant difference in the effectiveness of
CDMR and TOL only when using beta distributions fitted
via Equations (7) and (8). And for $c > 0.07$, the difference in
effectiveness of CDMR and TOL was not significant for
either method of fitting beta distributions.
The difference in the effect of c as a function of the
distributional assumptions can be explained by the shapes of
the beta distributions fitted by each method. The p.d.f.s of
the fitted beta distributions for P(Vag) and U(SpVag) are
shown in Figure 2, subfigures 2(a)–2(f), for the cases in
which $c = 0$, 0.05, and 0.1. For all the other beta distributions used in this application, similar behaviour
was seen in the superimposed plots of the beta p.d.f. fitted via Equations (7) and (8) versus the beta p.d.f. fitted via
Equations (12) and (13). While each fitted distribution has
the desired mode in each case, the RiskPert distribution
based on (12) and (13) has fatter tails than those of the p.d.f.
based on (7) and (8); moreover, we see that for the RiskPert
distribution, the variance clearly depends on the mean. As
indicated above, the assumptions about the variance that
underlie Equations (7) and (8) differ substantially from the
assumptions about the mean and variance that underlie the
RiskPert distribution; and these differences lead to different
conclusions about the cost-effectiveness of CDMR compared
with TOL when $c \in [0.02, 0.07]$.
Remark 2. Several general conclusions emerged from the
foregoing applications to pharmaceutical manufacturing and
medical decision analysis. When input modelling is based on
estimates of the minimum, most likely, and maximum values
of a target random variable, there is often substantial
uncertainty in the estimates of the extreme values; and in
such situations the fitted distribution should generally have
most of its probability concentrated in the vicinity of the
estimated mode, which is much more accurate than the other
two estimates. The generalized beta distribution is usually a
good choice for rapid input modelling in these situations;
and often acceptable results can be obtained using either
Equations (7) and (8) or Equations (12) and (13). In our view
the primary disadvantage of Equations (12) and (13) is that
the variance of the fitted distribution is a function of its
mean. In general the analysis of a simulation-generated
response is complicated by dependence of the variance of the response on its mean; and numerous variance-stabilizing
transformations have been proposed to avoid such undesir-
able behaviour (Irizarry et al, 2003). In some types of
applications, it may be necessary to study systematically the
sensitivity of the simulation-generated results to changes in
the assumed values of the mode and variance of each input
random variable; and in this case the development given in
the Appendix can be used to investigate the impact of
independently varying the postulated values of the mode and
variance of the fitted beta distribution.
3. Johnson translation system of distributions
Starting from a continuous random variable X whose
distribution is unknown and is to be approximated and
subsequently sampled, Johnson (1949) proposes the idea of
inferring an appropriate distribution by identifying a suitable
translation (or transformation) of X to a standard normal
random variable Z with mean 0 and variance 1 so that
$Z \sim N(0, 1)$. The translations have the form

$$Z = \gamma + \delta\, g\!\left(\frac{X - \xi}{\lambda}\right) \tag{14}$$

where $\gamma$ and $\delta$ are shape parameters, $\lambda$ is a scale parameter,
$\xi$ is a location parameter, and $g(\cdot)$ is a function whose form
defines the four distribution families in the Johnson
translation system,

$$g(y) = \begin{cases}
\ln(y) & \text{for } S_L \text{ (lognormal) family} \\
\ln\!\left[y + \sqrt{y^2 + 1}\right] & \text{for } S_U \text{ (unbounded) family} \\
\ln[y/(1 - y)] & \text{for } S_B \text{ (bounded) family} \\
y & \text{for } S_N \text{ (normal) family}
\end{cases}$$

DeBrota et al (1989a) detail the advantages of the Johnson
translation system of distributions for simulation input
modelling, especially in comparison with the triangular,
beta, and normal distribution families.
3.1. Johnson distribution and density functions
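If (14) is an exact normalizing translation of X to a standard
normal random variable, then the c.d.f. of X is given by

$$F_X(x) = \Phi\!\left\{\gamma + \delta\, g\!\left(\frac{x - \xi}{\lambda}\right)\right\} \quad \text{for all } x \in H$$

where: (i) $\Phi(z) = (2\pi)^{-1/2} \int_{-\infty}^{z} \exp\!\left(-\tfrac{1}{2} w^2\right) dw$ denotes
the c.d.f. of the N(0, 1) distribution; and (ii) the space H
of X is

$$H = \begin{cases}
[\xi, +\infty) & \text{for } S_L \text{ (lognormal) family} \\
(-\infty, +\infty) & \text{for } S_U \text{ (unbounded) family} \\
[\xi, \xi + \lambda] & \text{for } S_B \text{ (bounded) family} \\
(-\infty, +\infty) & \text{for } S_N \text{ (normal) family}
\end{cases}$$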
Figure 2 Beta distributions fitted to P(Vag), the probability of vaginal delivery (subfigures 2(a)–2(c)) and to U(SpVag), the utility of spontaneous vaginal delivery (subfigures 2(d)–2(f)), where the solid line is the fit using Equations (7) and (8) and the dashed line is the RiskPert fit using (12) and (13). [Plots omitted; each quantity is shown in three panels labelled 0.0, 0.05, and 0.10.]
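The p.d.f. of X is given by

$$f_X(x) = \frac{\delta}{\lambda (2\pi)^{1/2}}\, g'\!\left(\frac{x - \xi}{\lambda}\right) \exp\!\left\{-\frac{1}{2}\left[\gamma + \delta\, g\!\left(\frac{x - \xi}{\lambda}\right)\right]^2\right\}$$

for all $x \in H$, where

$$g'(y) = \begin{cases}
1/y & \text{for } S_L \text{ (lognormal) family} \\
1/\sqrt{y^2 + 1} & \text{for } S_U \text{ (unbounded) family} \\
1/[y(1 - y)] & \text{for } S_B \text{ (bounded) family} \\
1 & \text{for } S_N \text{ (normal) family}
\end{cases}$$

For graphs illustrating the diversity of distributional shapes
that can be achieved with the Johnson system of univariate
distributions, see DeBrota et al (1989a) or pp 34–37 of Kuhl
et al (2008b).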
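As an illustration only, the Johnson c.d.f. and p.d.f. given above can be evaluated numerically along the following lines. This is a minimal sketch rather than code from the article: the function names (`g`, `g_prime`, `johnson_cdf`, `johnson_pdf`) and the family labels passed as strings are assumptions made for this example, and the standard normal c.d.f. is taken from Python's `statistics.NormalDist`. The argument x is assumed to lie strictly inside the space H of the chosen family.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1); its cdf plays the role of Phi


def g(y, family):
    """Translation function g(y) for the Johnson family named by `family`."""
    if family == "SL":
        return math.log(y)
    if family == "SU":
        return math.asinh(y)  # identical to ln[y + sqrt(y^2 + 1)]
    if family == "SB":
        return math.log(y / (1.0 - y))
    if family == "SN":
        return y
    raise ValueError(f"unknown Johnson family: {family!r}")


def g_prime(y, family):
    """Derivative g'(y) of the translation function."""
    if family == "SL":
        return 1.0 / y
    if family == "SU":
        return 1.0 / math.sqrt(y * y + 1.0)
    if family == "SB":
        return 1.0 / (y * (1.0 - y))
    if family == "SN":
        return 1.0
    raise ValueError(f"unknown Johnson family: {family!r}")


def johnson_cdf(x, family, gamma, delta, lam, xi):
    """F_X(x) = Phi{gamma + delta * g[(x - xi) / lam]} for x inside H."""
    y = (x - xi) / lam
    return STD_NORMAL.cdf(gamma + delta * g(y, family))


def johnson_pdf(x, family, gamma, delta, lam, xi):
    """Johnson density f_X(x) for x inside H."""
    y = (x - xi) / lam
    z = gamma + delta * g(y, family)
    scale = delta / (lam * math.sqrt(2.0 * math.pi))
    return scale * g_prime(y, family) * math.exp(-0.5 * z * z)
```

With $\gamma = 0$, $\delta = 1$, $\lambda = 1$, and $\xi = 0$, the SN case reduces to the standard normal distribution, which gives a quick sanity check on both functions.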
3.2. Fitting Johnson distributions to sample data
The process of fitting a Johnson distribution to sample data
involves first selecting an estimation method and the desired
translation function $g(\cdot)$ and then obtaining estimates of the
four parameters $\gamma$, $\delta$, $\lambda$, and $\xi$. The Johnson translation
system of distributions has the flexibility to match (i) any
feasible combination of values for the mean $\mu_X$, variance $\sigma_X^2$,
skewness

$$\mathrm{Sk}_X = E\!\left[(X - \mu_X)^3\right] / \sigma_X^3$$

and kurtosis

$$\mathrm{Ku}_X = E\!\left[(X - \mu_X)^4\right] / \sigma_X^4$$

or (ii) sample estimates of the moments $\mu_X$, $\sigma_X^2$, $\mathrm{Sk}_X$, and
$\mathrm{Ku}_X$. Moreover, in principle the skewness $\mathrm{Sk}_X$ and kurtosis
$\mathrm{Ku}_X$ uniquely identify the appropriate translation function
$g(\cdot)$. Although there are no closed-form expressions for the
parameter estimates based on the method of moment
matching, these quantities can be accurately approximated
using the iterative procedure of Hill et al (1976). Other
estimation methods may also be used to fit Johnson
distributions to sample data; for example, in the FITTR1
software package (Swain et al, 1988), the following methods
are available:
• OLS and DWLS estimation of the c.d.f.;
• minimum L1 and L∞ norm estimation of the c.d.f.;
• moment matching; and
• percentile matching.
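To make concrete how the skewness and kurtosis can identify a translation function, the sketch below applies the classical moment-ratio rule for the Johnson system: for a given skewness, a kurtosis above that of the lognormal distribution with the same skewness points to the SU family, and a kurtosis below it points to the SB family. This is only an illustration under stated assumptions and is not the algorithm of Hill et al (1976): the function names and the tolerance `tol` used to decide that a point lies on the lognormal line are hypothetical choices. It relies on the standard lognormal identities $\mathrm{Sk}^2 = (\omega - 1)(\omega + 2)^2$ and $\mathrm{Ku} = \omega^4 + 2\omega^3 + 3\omega^2 - 3$ with $\omega \ge 1$.

```python
def sample_moments(data):
    """Return (mean, variance, skewness, kurtosis) using divisor-n moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def lognormal_kurtosis(skewness):
    """Kurtosis of the lognormal distribution that has the given skewness."""
    target = skewness * skewness
    lo, hi = 1.0, 2.0
    while (hi - 1.0) * (hi + 2.0) ** 2 < target:  # bracket omega from above
        hi *= 2.0
    for _ in range(200):  # bisection on the increasing function of omega
        mid = 0.5 * (lo + hi)
        if (mid - 1.0) * (mid + 2.0) ** 2 < target:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    return w ** 4 + 2.0 * w ** 3 + 3.0 * w ** 2 - 3.0


def select_family(skewness, kurtosis, tol=1e-2):
    """Pick 'SN', 'SL', 'SU' or 'SB' from a (skewness, kurtosis) pair."""
    if kurtosis <= skewness * skewness + 1.0:
        raise ValueError("infeasible (skewness, kurtosis) combination")
    ku_lognormal = lognormal_kurtosis(skewness)
    if abs(kurtosis - ku_lognormal) <= tol:  # on the lognormal line
        return "SN" if abs(skewness) <= tol else "SL"
    return "SU" if kurtosis > ku_lognormal else "SB"
```

For example, a symmetric sample with kurtosis well below 3 (such as one drawn from a uniform distribution) would be assigned to the SB family, whereas a symmetric sample with kurtosis above 3 would be assigned to the SU family.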
3.3. Fitting SB distributions to subjective information
DeBrota et al (1989b) discuss VISIFIT, a public-domain
software package for fitting Johnson SB distributions to
subjective information, possibly combined with sample data.
The user must provide estimates of the end-points a and b
together with any two of the following characteristics:
• the mode m;
• the mean $\mu_X$;
• the median $x_{0.5}$;
• arbitrary quantile(s) $x_p$ or $x_q$ for $p, q \in (0, 1)$;
• the width of the central 95% of the distribution; or
• the standard deviation $\sigma_X$.
3.4. Generating Johnson variates by inversion
After a Johnson distribution has been fitted to a data set,
generating samples from the fitted distribution is straightforward.
First, a standard normal variate $Z \sim N(0, 1)$ is
generated. Then the corresponding realization of the
Johnson random variable X is found by applying to Z the
inverse translation

$$X = \xi + \lambda\, g^{-1}\!\left(\frac{Z - \gamma}{\delta}\right) \tag{15}$$

where for all real z we define the inverse translation function

$$g^{-1}(z) = \begin{cases}
e^{z} & \text{for } S_L \text{ (lognormal) family} \\
\left(e^{z} - e^{-z}\right)/2 & \text{for } S_U \text{ (unbounded) family} \\
1/\left(1 + e^{-z}\right) & \text{for } S_B \text{ (bounded) family} \\
z & \text{for } S_N \text{ (normal) family}
\end{cases} \tag{16}$$

Remark 3. Although most popular general-purpose
simulation packages provide an acceptable generator of
standard normal random variables, we are particularly
interested in generating Z by the method of inversion,
$Z = \Phi^{-1}(U)$, where $U \sim \text{Uniform}[0, 1]$ is a random number
and we use the approximation to $\Phi^{-1}(\cdot)$ that is available
via Normaldist of Press et al (2007). Also recommended is
the approximation to $\Phi^{-1}(\cdot)$ given in Section 26.2.22 of
Abramowitz and Stegun (1972). As documented in the
companion paper on multivariate input modelling (Kuhl
et al, 2010), an accurate approximation to $\Phi^{-1}(\cdot)$ will be
a key element in our approach to building multivariate
extensions of the Johnson translation system of distributions
as well as stationary univariate time series whose
marginals are Johnson distributions.
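The inversion scheme of Equations (15) and (16) can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are invented for the example, and $\Phi^{-1}$ is taken from `statistics.NormalDist.inv_cdf` in the Python standard library instead of the Normaldist routine of Press et al (2007) or the Abramowitz and Stegun approximation mentioned in Remark 3. The parameter values in the usage note are arbitrary.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # supplies Phi^{-1} through inv_cdf


def g_inverse(z, family):
    """Inverse translation function of Equation (16)."""
    if family == "SL":
        return math.exp(z)
    if family == "SU":
        return math.sinh(z)  # (e^z - e^-z) / 2
    if family == "SB":
        return 1.0 / (1.0 + math.exp(-z))
    if family == "SN":
        return z
    raise ValueError(f"unknown Johnson family: {family!r}")


def johnson_variate(u, family, gamma, delta, lam, xi):
    """Map a random number u in (0, 1) to a Johnson variate via Equation (15)."""
    z = STD_NORMAL.inv_cdf(u)  # Z = Phi^{-1}(U)
    return xi + lam * g_inverse((z - gamma) / delta, family)


def johnson_sample(n, family, gamma, delta, lam, xi, seed=None):
    """Generate n Johnson variates by inversion."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:  # inv_cdf requires 0 < u < 1
            sample.append(johnson_variate(u, family, gamma, delta, lam, xi))
    return sample
```

For instance, `johnson_sample(1000, "SB", 0.5, 0.8, 10.0, -2.0, seed=1)` would return 1000 variates confined to the interval from $\xi = -2$ to $\xi + \lambda = 8$. Because each variate is a monotone function of a single random number, the scheme remains compatible with variance-reduction techniques that rely on inversion.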
3.5. Application of Johnson distributions to
smart-materials research
Matthews et al (2006), Weiland et al (2005), and Gao and
Weiland (2008) present a multiscale modelling approach for
the prediction of material stiffness of a certain class of smart
materials called ionic polymers. The material stiffness
depends on multiple parameters, including the effective
length of the polymer chains composing the material. In a
case study of Nafion, a specific type of ionic polymer,
Matthews et al (2006) develop a simulation model of the
conformation of Nafion polymer chains on a nanoscopic
level, from which a large number of end-to-end chain lengths
are generated. The p.d.f. of end-to-end distances is then
estimated and used as an input to a macroscopic-level
mathematical model to quantify material stiffness.
Figure 3 shows the empirical distribution of 9980
simulation-generated observations of end-to-end Nafion
chain lengths (in ångströms). Superimposed on the empirical
distribution is the result of using the DWLS estimation
method to fit an unbounded Johnson (SU) distribution to the
chain length data. Figure 3 reveals a remarkably accurate fit
to the given data set. Furthermore, comparing the Johnson
fit in Figure 3 with the beta fits for the same data set in
Figure 1, we see that the Johnson distribution is able to
capture certain key aspects of the Nafion data set that the
beta distribution is unable to represent adequately.
Gao and Weiland (2008), Matthews et al (2006), and
Weiland et al (2005) conclude that the estimates of the
distribution of chain lengths obtained by fitting an appropriate
Johnson distribution to the data are more intuitive
than those using other density estimation techniques for the
following reasons. First, it is possible to write down an
explicit functional form for the Johnson p.d.f. $f_X(x)$ that is
simple to differentiate. This is a crucial property because the
second derivative $f_X''(x)$ of the p.d.f. will be used as an input
to a mathematical model to estimate material stiffness.
Second, there is a relatively simple relationship between the
Johnson parameters and the material stiffness. Weiland et al
(2005) summarize the results of a sensitivity analysis for the
Johnson parameters and the corresponding effect on
material stiffness. In general, Weiland et al find that
increasing the location parameter $\xi$ leads to an increase in
predicted stiffness. Similarly, either increasing the shape parameter
$\delta$ or decreasing the scale parameter $\lambda$ leads to marginally
higher predicted material stiffness. Establishing a consistent
relationship between these parameters and stiffness would
first serve to extend the current theory to stiffness predic-
tions, and may ultimately also serve as a step toward the
custom design of materials with specific stiffness properties.
3.6. Application of Johnson distributions to health-care
systems analysis
In a recent study of the arrival patterns of patients who have
scheduled appointments at a community health-care clinic,
Alexopoulos et al (2008) find that patient tardiness (ie, the
patient's deviation from the scheduled appointment time) is
most accurately modelled using an SU distribution. Specifically
they consider data on patient tardiness collected by the
Partnership of Immunization Providers, a collaborative
public-private project created by the University of California,
San Diego School of Medicine, Division of Community
Pediatrics, in association with community clinics and small,
private provider practices. Alexopoulos et al (2008) perform
an exhaustive analysis of 18 continuous distributions, and
they conclude that the SU distribution provides superior fits
to the available data.
4. Bézier distribution family
4.1. Definition of Bézier curves
In computer graphics, a Bézier curve is often used to
approximate a smooth (continuously differentiable) function
on a bounded interval by forcing the Bézier curve to pass
in the vicinity of selected control points $\{\mathbf{p}_i \equiv (x_i, z_i)^{\mathrm{T}} : i = 0, 1, \ldots, n\}$
in two-dimensional Euclidean space. (Throughout
this article, all vectors will be column vectors unless
otherwise stated; and the roman superscript T will denote the
transpose of a vector or matrix.) Formally, a Bézier curve of
degree n with control points $\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_n\}$ is given
parametrically by

$$\mathbf{P}(t) = \sum_{i=0}^{n} B_{n,i}(t)\, \mathbf{p}_i \quad \text{for } t \in [0, 1] \tag{17}$$

where the blending function $B_{n,i}(t)$ (for all $t \in [0, 1]$) is the
Bernstein polynomial

$$B_{n,i}(t) = \frac{n!}{i!\,(n - i)!}\, t^{i} (1 - t)^{n - i} \quad \text{for } i = 0, 1, \ldots, n \tag{18}$$
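Equations (17) and (18) translate almost directly into code. The following minimal sketch evaluates the Bernstein blending functions and a point on a Bézier curve; the function names and the representation of control points as `(x, z)` tuples are assumptions made for this illustration.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein polynomial B_{n,i}(t) of Equation (18)."""
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)


def bezier_point(control_points, t):
    """Evaluate P(t) of Equation (17) for control points given as (x, z) pairs."""
    n = len(control_points) - 1  # degree of the curve
    x = sum(bernstein(n, i, t) * px for i, (px, _) in enumerate(control_points))
    z = sum(bernstein(n, i, t) * pz for i, (_, pz) in enumerate(control_points))
    return x, z
```

Since the blending functions are nonnegative and sum to one for every t in [0, 1], each point on the curve is a weighted average of the control points, which is why the curve is pulled toward the control points without necessarily passing through the interior ones.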
Figure 3 Johnson SU c.d.f. (left panel) and p.d.f. (right panel) fitted to 9980 Nafion chain lengths. [Plots omitted.]
4.2. Bézier distribution and density functions
If X is a continuous random variable whose space is the
bounded interval [a, b] and if X has c.d.f. $F_X(\cdot)$ and p.d.f.
$f_X(\cdot)$, then in principle we can approximate $F_X(\cdot)$ arbitrarily
closely using a Bézier curve of the form (17) by taking a
sufficient number (n + 1) of control points with appropriate
values for the coordinates $(x_i, z_i)^{\mathrm{T}}$ of the ith control point
$\mathbf{p}_i$ for $i = 0, \ldots, n$. If X is a Bézier random variable, then the
c.d.f. of X is given parametrically by

$$\mathbf{P}(t) = \{x(t),\, F_X[x(t)]\}^{\mathrm{T}} \quad \text{for } t \in [0, 1] \tag{19}$$

where

$$x(t) = \sum_{i=0}^{n} B_{n,i}(t)\, x_i, \qquad F_X[x(t)] = \sum_{i=0}^{n} B_{n,i}(t)\, z_i \tag{20}$$

Equation (20) reveals that the control points $\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_n$
constitute the parameters regulating all the properties of
a Bézier distribution. Thus the control points must be
arranged so as to ensure the basic requirements of a c.d.f.: (i)
$F_X(x)$ is monotonically nondecreasing in the cutoff value x;
(ii) $F_X(a) = 0$; and (iii) $F_X(b) = 1$. By utilizing the Bézier
property that the curve described by (19)–(20) passes
through the control points $\mathbf{p}_0$ and $\mathbf{p}_n$ exactly, we can ensure
that $F_X(a) = 0$ if we take $\mathbf{p}_0 \equiv (a, 0)^{\mathrm{T}}$; and we can ensure that
$F_X(b) = 1$ if we take $\mathbf{p}_n \equiv (b, 1)^{\mathrm{T}}$. See Wagner and Wilson
(1996a) for a complete discussion of univariate Bézier
distributions and their use in simulation input modelling.
If X is a Bézier random variable with c.d.f. $F_X(\cdot)$ given
parametrically by (19), then it follows that the corresponding
p.d.f. $f_X(x)$ for all real x is given parametrically by the curve

$$\{x(t),\, f_X[x(t)]\}^{\mathrm{T}} \quad \text{for } t \in [0, 1]$$

where x(t) is given by (20) and

$$f_X[x(t)] = \frac{\sum_{i=0}^{n-1} B_{n-1,i}(t)\, \Delta z_i}{\sum_{i=0}^{n-1} B_{n-1,i}(t)\, \Delta x_i}$$

In the last equation, $\Delta x_i \equiv x_{i+1} - x_i$ and $\Delta z_i \equiv z_{i+1} - z_i$ (for
$i = 0, 1, \ldots, n - 1$) represent the corresponding first differences
of the x- and z-coordinates of the original control points
$\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_n\}$ in the parametric representation (19) of the
c.d.f.
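The parametric c.d.f. of Equations (19) and (20) and the associated p.d.f. can be evaluated as in the sketch below. The function names and the storage of the control-point coordinates in two parallel lists `xs` and `zs` are assumptions for illustration, and the three control points in the usage note are hypothetical values chosen so that the result can be checked by hand.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein polynomial B_{n,i}(t) of Equation (18)."""
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)


def bezier_cdf_point(xs, zs, t):
    """Return (x(t), F_X[x(t)]) from Equation (20)."""
    n = len(xs) - 1
    x = sum(bernstein(n, i, t) * xs[i] for i in range(n + 1))
    cdf = sum(bernstein(n, i, t) * zs[i] for i in range(n + 1))
    return x, cdf


def bezier_pdf_point(xs, zs, t):
    """Return (x(t), f_X[x(t)]) using first differences of the control points."""
    n = len(xs) - 1
    x = sum(bernstein(n, i, t) * xs[i] for i in range(n + 1))
    num = sum(bernstein(n - 1, i, t) * (zs[i + 1] - zs[i]) for i in range(n))
    den = sum(bernstein(n - 1, i, t) * (xs[i + 1] - xs[i]) for i in range(n))
    return x, num / den
```

With the control points (2, 0), (4, 0.5), and (6, 1), Equation (20) gives x(t) = 2 + 4t and $F_X[x(t)] = t$, so the sketch reproduces the uniform distribution on [2, 6], whose density is 0.25 throughout.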
4.3. Generating Bézier variates by inversion
The method of inversion can be used to generate a Bézier
random variable whose c.d.f. has the parametric representation
displayed in Equations (19) and (20). Given a random
number $U \sim \text{Uniform}[0, 1]$, we perform the following steps:
(i) find $t_U \in [0, 1]$ such that

$$\sum_{i=0}^{n} B_{n,i}(t_U)\, z_i = U \tag{21}$$

and (ii) deliver the variate

$$X = \sum_{i=0}^{n} B_{n,i}(t_U)\, x_i \tag{22}$$

The solution to (21) can be computed by any root-finding
algorithm such as Müller's method, Newton's method, or
the bisection method. Codes to implement this approach to
generating Bézier variates are available on Web site
www.ise.ncsu.edu/jwilson/page3.
Remark 4. As documented in the companion paper on
multivariate input modelling (Kuhl et al, 2010), the inversion
scheme specified in Equations (21) and (22) for generating
Bézier random variables will be a key element in our
approach to building multivariate extensions of the univariate
Bézier distributions as well as stationary univariate
time series whose marginals are Bézier distributions.
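Steps (i) and (ii) of the inversion scheme in Equations (21) and (22) can be sketched with the bisection method, one of the root-finding options named above. This is an illustrative sketch and not the code distributed by the authors: the function names, the tolerance `tol`, and the parallel-list layout of the control points are assumptions. The sketch also assumes that the z-coordinates are nondecreasing with $z_0 = 0$ and $z_n = 1$, which makes the left-hand side of (21) nondecreasing in t so that bisection is valid.

```python
import random
from math import comb


def bernstein(n, i, t):
    """Bernstein polynomial B_{n,i}(t) of Equation (18)."""
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)


def bezier_variate(xs, zs, u, tol=1e-12):
    """Return the Bezier variate corresponding to the random number u."""
    n = len(xs) - 1

    def cdf_of_t(t):
        return sum(bernstein(n, i, t) * zs[i] for i in range(n + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # step (i): solve Equation (21) for t_U by bisection
        mid = 0.5 * (lo + hi)
        if cdf_of_t(mid) < u:
            lo = mid
        else:
            hi = mid
    t_u = 0.5 * (lo + hi)
    # step (ii): deliver X from Equation (22)
    return sum(bernstein(n, i, t_u) * xs[i] for i in range(n + 1))


def bezier_sample(xs, zs, n_variates, seed=None):
    """Generate n_variates Bezier variates by inversion."""
    rng = random.Random(seed)
    return [bezier_variate(xs, zs, rng.random()) for _ in range(n_variates)]
```

Bisection is used here because it needs only evaluations of the c.d.f. and cannot leave the interval [0, 1]; Newton's method or Müller's method would typically converge in fewer iterations at the cost of extra safeguards.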
4.4. Using PRIME to model Bézier distributions
PRIME is a graphical, interactive software system that
incorporates the methodology detailed in this section to help
an analyst estimate the univariate input processes arising in
simulation studies. PRIME is written entirely in the C
programming language, and it has been developed to run
under Microsoft Windows. A public-domain version of the
software is available on the previously mentioned Web site.
PRIME is designed to be easy and intuitive to use. The
construction of a c.d.f. is performed through the actions of
the mouse, and several options are conveniently available
through menu selections. Control points are represented as
small black squares, and each control point is given a unique
label corresponding to its index i in Equation (17). Figure 4
shows a typical session in PRIME, where the c.d.f. and p.d.f.
windows are both displayed.
In the absence of data, PRIME can be used to model an
input process conceptualized from subjective information or
expertise. Section 5.1 of Wagner and Wilson (1996a)
contains a detailed example of the interactive use of PRIME
for subjective input modelling; here we merely provide an
overview of this approach to using PRIME. The representa-
tion of the conceptualized distribution is achieved by adding,
deleting, and moving the control points via the mouse. Each
control point acts like a magnet that pulls the curve in the
direction of the control point, where the blending functions
(ie, the Bernstein polynomials defined by Equation (18))
govern the strength of the magnetic attraction exerted on
the curve by each control point. Clicking (ie, selecting) and
dragging (ie, moving) a control point causes the displayed
c.d.f. to be updated (nearly) instantaneously. If they are
displayed, the corresponding p.d.f., the first four moments
(that is, the mean, variance, skewness, and kurtosis), and
selected percentile values of the Bézier distribution are
updated (nearly) simultaneously in adjacent windows so that
the user gets immediate feedback on the effects of moving
selected control points. Thus, the user has a variety of
readily available indicators and measures, as well as visually
appealing displays, to aid in the construction of the
conceptualized distribution.
As detailed in Wagner and Wilson (1996a, b), PRIME
includes several standard estimation procedures for fitting
distributions to sample data sets:
• OLS estimation of the c.d.f.;
• minimum L1 and L∞ norm estimation of the c.d.f.;
• maximum likelihood estimation (assuming a and b are known);
• moment matching; and
• percentile matching.
Figure 5 shows a Bézier distribution that was fitted to the
same data set consisting of Nafion polymer chain lengths as
shown in Figure 3. In this application of PRIME, we
obtained the fitted Bézier distribution automatically, where:
(i) the number of control points (n + 1) was determined by
the likelihood ratio test detailed in Wagner and Wilson
(1996b); and (ii) the components of the control points were
estimated by the method of OLS. Figure 5 shows that
a Bézier distribution yielded an excellent fit to the given
data set.
As another example that illustrates the capability of
PRIME and the Bézier distribution family to handle
multimodal data, we describe briefly an input-modelling
problem that arose in a manufacturing simulation study. For
more details on this application using an earlier version of
PRIME that did not incorporate automatic determination of
the number of control points to be used in the fitted
Bézier distribution, see Section 5.2 of Wagner and Wilson
(1996a). Surface mount capacitors were stored in lots of
varying sizes in a facility adjacent to the insulation resistance
(IR) testing area. To model the operation of the IR testing
area, we needed to estimate the distribution of capacitor lot
sizes in the storage facility.
Capacitor lot-size data were available for 2083 tested lots.
The left-hand panel of Figure 6 displays the empirical c.d.f.
for this data set and the final fitted Bézier c.d.f.; and the
right-hand panel displays a histogram and the final fitted
Bézier p.d.f., where all of the original observations were
divided by 1000 for simplicity. Notice that in the vicinity of
20 and 270 on the new scale (that is, lot sizes expressed in
1000s), there are pronounced peaks in the histogram. Usually
such a bimodal distribution indicates that the sample was
taken from two distinct distributions that must be fitted
separately so that the overall fitted distribution is a mixture
of the two component distributions; for an elaboration of this
point, see Remark 5 below. However in the current context,
the production engineers were unable to provide any addi-
tional information that would have enabled us to model
the lot-size distribution as a mixture of two simpler distri-
butions; and thus we were forced to exploit the capabilities
of PRIME for modelling multimodal distributions.
The fitted Bézier distribution displayed in Figure 6 was
obtained in two steps using the method of OLS. First we
simply used the default settings of PRIME to fit a Bézier
distribution with six control points; and the resulting fit was
unimodal and was judged to be unsatisfactory based on
visual inspection of the fitted p.d.f. and c.d.f. (As detailed in
Wagner and Wilson (1996a), several other widely used
commercial input-modelling packages also yielded unsatis-
factory fits to this data set precisely because they do not
include any distribution families that can adequately handle
multimodal data sets.) In the second step of using PRIME to
Figure 4 PRIME windows showing the Bézier c.d.f. (left panel) with its control points and the p.d.f. (right panel).
fit a Bézier distribution to the lot-size data set, we used the
option for automatic determination of the number of control
points starting from the current configuration. As shown in
Figure 6, the final fitted Bézier distribution had 13 control
points; and the fitted p.d.f. and c.d.f. closely approximated
the corresponding histogram and empirical c.d.f. for the lot-
size data set.
Remark 5. If a data set has two or more clearly
distinguishable sources each with its own distribution, then
an alternative approach to fitting a multimodal distribution
to the overall data set is to represent the corresponding
c.d.f. (or p.d.f.) as a mixture of the c.d.f.s (or p.d.f.s) for
the individual sources, where the mixing probabilities are
the associated long-run percentages of the overall data set
Figure 5 Bézier distribution fitted to 9980 Nafion chain lengths.
Figure 6 Bézier distribution fitted to capacitor lot-size data set of size 2083.
obtained from each source; see Section 8.2.2 of Law (2007).
In this situation it is natural to fit a distribution to the
subsample from each source separately; and then the
corresponding estimate of the mixing probability is simply
the fraction of the entire data set obtained from the relevant
source. This approach could not be used in the lot-size
application described above because separate sources of data
could not be identified.
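When the sources can be identified, the mixture approach of Remark 5 might be sketched as follows. The names `mixing_probabilities` and `sample_mixture`, the source labels, and the per-source samplers in the usage note are all hypothetical; in practice each sampler would draw from the distribution fitted to the corresponding subsample.

```python
import random


def mixing_probabilities(source_labels):
    """Estimate each mixing probability as the fraction of data from that source."""
    counts = {}
    for label in source_labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(source_labels)
    return {label: count / total for label, count in counts.items()}


def sample_mixture(probabilities, samplers, rng):
    """Pick a source with its mixing probability, then sample from that source."""
    u = rng.random()
    cumulative = 0.0
    for label, p in probabilities.items():
        cumulative += p
        if u < cumulative:
            return samplers[label](rng)
    return samplers[label](rng)  # guard against floating-point round-off
```

As a usage example, `mixing_probabilities(["A", "A", "B", "A"])` yields probabilities of 0.75 for source A and 0.25 for source B, and `sample_mixture` could then be called with a dictionary such as `{"A": lambda rng: rng.gauss(20.0, 5.0), "B": lambda rng: rng.gauss(270.0, 30.0)}`, where the two normal samplers are placeholders for the separately fitted component distributions.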
The Bézier distribution family, which is entirely specified
by its control points $\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_n\}$, has the following
advantages:
• It is extremely flexible and can represent a wide diversity
of distributional shapes. For instance, Figures 4 and 6
depict multimodal distributions that are easily constructed
using PRIME, yet impossible to achieve with other
distribution families.
• If data are available, then the likelihood ratio test of
Wagner and Wilson (1996b) can be used in conjunction
with any of the estimation methods enumerated above to
find automatically both the number and location of the
control points.
• In the absence of data, PRIME can be used to determine
the conceptualized distribution based on known quantitative
or qualitative information that the user perceives to
be pertinent.
• As the number (n + 1) of control points increases, so does
the flexibility in fitting Bézier distributions. The interpretation
and complexity of the control points, however,
do not change with the number of control points.
5. Conclusions and recommendations
The common thread running through this article is the focus
on robust input models that are computationally tractable
and sufficiently flexible to represent adequately many of the
probabilistic phenomena that arise in many applications of
discrete-event stochastic simulation. For another approach
to input modelling with no data, see Craney and White
(2004).
The emphasis in this article has been on the beta, Johnson,
and Bézier families because of their flexibility and because
we have found that in practice, they can be most effectively
applied to simulation projects in which a large number of
input models must be built under conditions in which the
user lacks either of the following: (i) detailed information
about the mechanism generating the target inputs; or (ii) the
time to gather the information specified in (i) and use that
information to derive the precise functional form of the
relevant distribution. For situations in which the user has
more information about the genesis of the continuous
univariate distribution to be modelled, we have found the
Pearson system of distributions can often be used effectively;
see Chapter 4 of Elderton and Johnson (1969) and Sections
6.2–6.13 of Stuart and Ord (1994). Johnson et al (1994, 2004)
provide a comprehensive discussion of continuous univariate
distributions; see also Kotz and van Dorp (2004). For a
similar treatment of discrete univariate distributions, see
Johnson et al (2005).
Notably missing from this article is a discussion of
Bayesian techniques for simulation input modelling, a topic
that we think will receive increasing attention from
practitioners and researchers alike in the future. In selecting
the input models for a simulation, we must account for three
main sources of uncertainty:
1. Stochastic uncertainty arises from dependence of the
simulation output on the random numbers generated and
used on each run; for example, the random number U
used to generate a generalized beta random variable X via
Equation (11).
2. Model uncertainty arises when the correct input model is
unknown, and we must choose between alternative input
models with different functional forms that adequately fit
available sample data or subjective information; for
example, the generalized beta, Johnson SU, and Bézier
distributions fitted to the Nafion data set as depicted in
Figures 1, 3 and 5, respectively.
3. Parameter uncertainty arises when the parameters of the
selected input model(s) are unknown and must be
estimated from sample data or subjective information.
Although stochastic uncertainty is much more widely
recognized by simulation practitioners than the other two
types of uncertainty, it is not always a major source of
variation in simulation output. Zouaoui and Wilson (2004)
demonstrate this using an M/G/1 queueing system
simulation in which stochastic uncertainty accounts for only
2% of the posterior variance of the average waiting time in
the queue, while model uncertainty regarding the exact
functional form of the service-time distribution accounts for
18% of the posterior variance; thus 80% of the
posterior variance is due to uncertainty regarding the exact
numerical values of the arrival rate and the parameters of the
service-time distribution. In such a situation, conventional
approaches to input modelling have the potential to yield a
grossly misleading picture of the inherent accuracy of
simulation-generated system performance measures such as
the average queue waiting time. For an introduction to
Bayesian input modelling, see Chick (1999, 2001) and
Zouaoui and Wilson (2003, 2004).
Another topic not discussed in this article is the use of
heavy-tailed distributions in simulation input modelling. If
the random variable X has a heavy-tailed distribution, then

$$1 - F_X(x) = \Pr\{X > x\} \sim c\, x^{-\alpha} \quad \text{as } x \to \infty \tag{23}$$
where $c > 0$ is a location parameter, $\alpha$ is a shape parameter
with $\alpha \in (1, 2)$, and $\sim$ means that the ratio of the left- and
right-hand sides of (23) tends to 1 as $x \to \infty$. Heavy-tailed
distributions frequently arise in simulations of computer and
communications systems (Crovella and Lipsky, 1997;
Greiner et al, 1999; Heyde and Kou, 2004). Fishman and
Adan (2005) discuss some situations in which the lognormal
distribution (a member of the Johnson translation system)
can provide a reasonable substitute for a heavy-tailed
distribution.
Additional material on techniques for simulation input
modelling will be posted to the Web site
http://www.ise.ncsu.edu/jwilson/more_info.
Acknowledgements
Partial support for some of the research described in this article was provided by National Science Foundation Grant DMI-9900164.
References
AbouRizk SM, Halpin DW and Wilson JR (1991). Visual interactive fitting of beta distributions. J Constr Eng Mngt 117: 589–605.
AbouRizk SM, Halpin DW and Wilson JR (1994). Fitting beta distributions based on sample data. J Constr Eng Mngt 120: 288–305.
Abramowitz M and Stegun IA (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover: New York.
Akaike H (1974). A new look at the statistical model identification. IEEE T Automat Contr AC-19: 716–723.
Alexopoulos C et al (2008). Modeling patient arrival times in community clinics. Omega 36: 33–43.
Chick SE (1999). Steps to implement Bayesian input distribution selection. In: Farrington PA, Nembhard HB, Sturrock DT and Evans GW (eds). Proceedings of the 1999 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 317–324, http://www.informs-sim.org/wsc99papers/044.PDF, accessed 28 March 2009.
Chick SE (2001). Input distribution selection for simulation experiments: Accounting for input uncertainty. Opns Res 49: 744–758.
Craney TA and White N (2004). Distribution selection with no data using VBA and Excel. Qual Eng 16: 643–656.
Crovella ME and Lipsky L (1997). Long-lasting transient conditions in simulations with heavy-tailed workloads. In: Andradottir S, Healy KJ, Withers DH and Nelson BL (eds). Proceedings of the 1997 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 1005–1012, http://www.informs-sim.org/wsc97papers/1005.PDF, accessed 8 July 2009.
DeBrota DJ et al (1989a). Modeling input processes with Johnson distributions. In: MacNair EA, Musselman KJ and Heidelberger P (eds). Proceedings of the 1989 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 308–318, http://www.ise.ncsu.edu/jwilson/files/debrota89wsc.pdf, accessed 28 March 2009.
DeBrota DJ, Dittus RS, Roberts SD and Wilson JR (1989b). Visual interactive fitting of bounded Johnson distributions. Simulation 52: 199–205.
Dickson LE (1939). New First Course in the Theory of Equations. Wiley: New York.
Elderton WP and Johnson NL (1969). Systems of Frequency Curves. Cambridge University Press: Cambridge.
Fishman GS and Adan IJB (2005). How heavy-tailed distributions affect simulation-generated time averages. ACM Trans Model Comput Simul 16: 152–173.
Gao F and Weiland LM (2008). A multiscale model applied to ionic polymer stiffness prediction. J Mater Res 23: 833–841.
Gold MR, Siegel JE, Russell LB and Weinstein MC (1996). Cost-effectiveness in Health and Medicine. Oxford University Press: New York.
Greiner M, Jobmann M and Lipsky L (1999). The importance of power-tail distributions for modeling queueing systems. Opns Res 47: 313–326.
Hahn GJ and Shapiro SS (1967). Statistical Models in Engineering. Wiley: New York.
Heyde CC and Kou SG (2004). On the controversy over tailweight distributions. Opns Res Lett 32: 399–408.
Hill ID, Hill R and Holder RL (1976). Algorithm AS99: Fitting Johnson curves by moments. Appl Stat 25: 180–189.
Irizarry MA et al (2003). Analyzing transformation-based simulation metamodels. IIE Trans 35: 271–283.
Johnson NL (1949). Systems of frequency curves generated by methods of translation. Biometrika 36: 149–176.
Johnson NL, Kemp AW and Kotz S (2005). Univariate Discrete Distributions, 3rd edn, Wiley-Interscience: New York.
Johnson NL, Kotz S and Balakrishnan N (1994). Continuous Univariate Distributions, Vol. 1, 2nd edn, Wiley-Interscience: New York.
Johnson NL, Kotz S and Balakrishnan N (2004). Continuous Univariate Distributions, Vol. 2, 2nd edn, Wiley-Interscience: New York.
Kotz S and van Dorp JR (2004). Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications. World Scientific: Singapore.
Kuhl ME et al (2006). Introduction to modeling and generating probabilistic input processes for simulation. In: Perrone LF, et al. (eds). Proceedings of the 2006 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 19–35, http://www.informs-sim.org/wsc06papers/003.pdf, accessed 28 March 2009.
Kuhl ME et al (2008a). Introduction to modeling and generating probabilistic input processes for simulation. In: Mason SJ, et al. (eds). Proceedings of the 2008 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 48–61, http://www.informs-sim.org/wsc08papers/008.pdf, accessed 28 March 2009.
Kuhl ME et al (2008b). Introduction to modeling and generating probabilistic input processes for simulation. Slides accompanying the oral presentation of Kuhl et al (2008a), http://www.ise.ncsu.edu/jwilson/files/wsc08imt.pdf, accessed 28 March 2009.
Kuhl ME et al (2010). Multivariate input models for stochastic simulation. J Simul (in preparation).
Law AM (2007). Simulation Modeling and Analysis, 4th edn, McGraw-Hill: New York.
Matthews JL et al (2006). Monte Carlo simulation of a solvated ionic polymer with cluster morphology. Smart Mater Struct 15: 187–199.
McBride WJ and McClelland CW (1967). PERT and the beta distribution. IEEE Trans Eng Mngt EM-14: 166–169.
Palisade Corp (2009). Getting started in @RISK. Palisade Corp.: Ithaca, NY, http://www.palisade.com/risk/5/tips/EN/gs/, accessed 5 July 2009.
Pearlswig DM (1995). Simulation modeling applied to the single pot
processing of effervescent tablets. Master's thesis, Integrated
Manufacturing Systems Engineering Institute, North Carolina
State University, Raleigh, NC,
http://www.ise.ncsu.edu/jwilson/files/pearlswig95.pdf, accessed 28 March 2009.
Press WH, Teukolsky SA, Vetterling WT and Flannery BP (2007).
Numerical Recipes: The Art of Scientific Computing, 3rd edn.
Cambridge University Press: Cambridge.
SAS Institute Inc (2008). JMP 8 Statistics and Graphics Guide.
http://www.jmp.com/support/downloads/pdf/jmp8/jmp_stat_graph_guide.pdf,
accessed 28 October 2009.
Stephens MA (1974). EDF statistics for goodness of fit and some
comparisons. J Am Stat Assoc 69: 730–737.
Stuart A and Ord K (1994). Kendall's Advanced Theory of Statistics,
Volume 1: Distribution Theory, 6th edn, Edward Arnold: London.
Swain JJ, Venkatraman S and Wilson JR (1988). Least-squares
estimation of distribution functions in Johnson's translation
system. J Stat Comput Simul 29: 271–297.
Vanhoucke M (2010). Using activity and sensitivity and network
topology information to monitor project time performance.
Omega (forthcoming).
Wagner MAF and Wilson JR (1996a). Using univariate Bézier
distributions to model simulation input processes. IIE Trans 28:
699–711.
Wagner MAF and Wilson JR (1996b). Recent developments in
input modeling with Bézier distributions. In: Charnes JM,
Morrice DJ, Brunner DT and Swain JJ (eds). Proceedings of the
1996 Winter Simulation Conference. Institute of Electrical
and Electronics Engineers: Piscataway, NJ, pp 1448–1456,
http://www.ise.ncsu.edu/jwilson/files/wagner96wsc.pdf, accessed
28 March 2009.
Weiland LM, Lada EK, Smith RC and Leo DJ (2005). Application
of rotational isomeric state theory to ionic polymer stiffness
predictions. J Mater Res 20: 2443–2455.
Wilson JR, Vaughan DK, Naylor E and Voss RG (1982). Analysis
of Space Shuttle ground operations. Simulation 38: 187–203.
Xu X et al (2010). Pelvic floor consequences of cesarean delivery
on maternal request in women with a single birth: A
cost-effectiveness analysis. J Womens Health 19: 147–160.
Zouaoui F and Wilson JR (2003). Accounting for parameter
uncertainty in simulation input modeling. IIE Trans 35: 781–792.
Zouaoui F and Wilson JR (2004). Accounting for input-model
and input-parameter uncertainties in simulation. IIE Trans 36:
1135–1151.
Appendix
Exact computation of shape parameters for beta
distribution fitted to user-specified mode and variance
To simplify the notation in this appendix, we let $a$, $m$, and $b$
denote the user-specified minimum, mode, and maximum of the target
distribution with $a < b$ and $m \in [a, b]$ as if these quantities
were known exactly; in practice of course it is often necessary to
use estimates $\hat{a}$, $\hat{m}$, and $\hat{b}$ of these quantities
in the following development. In this appendix, we provide exact
computing formulas for the shape parameters $\alpha_1$ and $\alpha_2$
of the generalized beta distribution (1) on the interval $[a, b]$
that has the user-specified mode $m$ and the user-specified variance
$\sigma_X^2 = (b - a)^2/\omega$.
If $\omega > 12$ (so that the desired beta distribution has a
smaller variance than that of the uniform distribution on the
interval $[a, b]$), then for any value of $m \in [a, b]$, there is a
unique generalized beta distribution on $[a, b]$ that has the
specified variance and a unique mode at $m$. (If $\omega = 12$, then
it can be shown that we must have $\alpha_1 = \alpha_2 = 1$ so that
the beta distribution with the given mode and variance coincides
with the uniform distribution on $[a, b]$. Since the mode is assumed
to be unique, this uninteresting case is eliminated from further
consideration.)
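If we set the right-hand side of (4) equal to $m$ and the
right-hand side of (3) equal to $(b - a)^2/\omega$, then we obtain
the following equivalent system of equations in terms of the
asymmetry ratio $r = (b - m)/(m - a)$, provided $m > a$ so that
$r < \infty$:

$$
\left.
\begin{aligned}
\alpha_1^3 + B\alpha_1^2 + C\alpha_1 + D &= 0 \\
\alpha_2 &= r\alpha_1 + 1 - r
\end{aligned}
\right\} \tag{A1}
$$

where

$$
\left.
\begin{aligned}
B &= \frac{-3r^3 - 2r^2 + (5 - \omega)r + 4}{(1 + r)^3} \\
C &= \frac{3r^3 - 5r^2 + (\omega - 3)r + 5 - \omega}{(1 + r)^3} \\
D &= \frac{-r^3 + 4r^2 - 5r + 2}{(1 + r)^3}
\end{aligned}
\right\} \tag{A2}
$$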
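The step from the mode and variance conditions to the cubic equation can be sketched as follows. This is only an outline, and it assumes that Equations (3) and (4) are the usual expressions for the variance and mode of a generalized beta distribution on $[a, b]$ with density proportional to $(x-a)^{\alpha_1-1}(b-x)^{\alpha_2-1}$; those equations appear earlier in the paper and are restated here as an assumption.

```latex
% Assumed forms of Equations (3) and (4):
\sigma_X^2 = \frac{(b-a)^2\,\alpha_1\alpha_2}
                  {(\alpha_1+\alpha_2)^2(\alpha_1+\alpha_2+1)},
\qquad
m = \frac{(\alpha_1-1)\,b + (\alpha_2-1)\,a}{\alpha_1+\alpha_2-2}.

% Mode condition: the distances from m to the endpoints are
b - m = \frac{(\alpha_2-1)(b-a)}{\alpha_1+\alpha_2-2},
\qquad
m - a = \frac{(\alpha_1-1)(b-a)}{\alpha_1+\alpha_2-2},
% so that
r = \frac{b-m}{m-a} = \frac{\alpha_2-1}{\alpha_1-1}
\;\Longrightarrow\;
\alpha_2 = r\alpha_1 + 1 - r.

% Variance condition with \sigma_X^2 = (b-a)^2/\omega and
% \alpha_1+\alpha_2 = (1+r)\alpha_1 + 1 - r:
\omega\,\alpha_1\,(r\alpha_1 + 1 - r)
  = \bigl[(1+r)\alpha_1 + 1 - r\bigr]^2\,
    \bigl[(1+r)\alpha_1 + 2 - r\bigr].

% Expanding, collecting powers of \alpha_1, and dividing by (1+r)^3
% gives \alpha_1^3 + B\alpha_1^2 + C\alpha_1 + D = 0 with B, C, D as in (A2);
% for instance, the constant term is D = (1-r)^2(2-r)/(1+r)^3.
```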
Remark 6. In the case that $m = a$ so that $r = \infty$, we solve
the mirror image problem for which $m = b$ and $r = 0$; and
then we interchange the resulting shape parameters to obtain
a generalized beta distribution whose mode coincides with its
minimum. See also Remark 7 below.
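It can be proved that if $\omega > 12$, then for all $r \in [0, \infty]$
the cubic equation in $\alpha_1$ defined by (A1)–(A2) has a nonnegative
discriminant

$$
\Delta = 18BCD - 4B^3D + B^2C^2 - 4C^3 - 27D^2
$$

so that the cubic equation has three real roots $\{z_j : j = 1, 2, 3\}$
such that:

$$
\left.
\begin{aligned}
z_1 &> 1 \\
z_2, z_3 &< 1
\end{aligned}
\right\} \tag{A3}
$$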
As possible values of $\alpha_1$, the roots $z_2$ and $z_3$ are
unacceptable for the following reasons:
(i) The assignment $\alpha_1 \in (0, 1)$ yields a generalized beta
distribution with an asymptote at its lower limit $a$,
which seems intuitively problematic and is clearly
unacceptable when the user-specified mode $m$ exceeds
the lower limit.
(ii) The assignment $\alpha_1 \le 0$ does not define a legitimate
generalized beta distribution.
We are therefore left with the unique assignment $\alpha_1 = z_1$;
and a computing formula for $\alpha_1$ can be derived from the
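explicit solution to a cubic equation as follows (see Sections
33–38 of Dickson, 1939). In terms of the auxiliary quantities

$$
\left.
\begin{aligned}
P &= C - \tfrac{1}{3}B^2 \\
Q &= D - \tfrac{1}{3}BC + \tfrac{2}{27}B^3
\end{aligned}
\right\} \tag{A4}
$$

we have

$$
\alpha_1 = z_1 =
\begin{cases}
\left(-\tfrac{4}{3}P\right)^{1/2}
\cos\left\{ \tfrac{1}{3}\cos^{-1}\left[ -\tfrac{1}{2}Q
\left(-\tfrac{3}{P}\right)^{3/2} \right] \right\} - \tfrac{1}{3}B,
& \text{if } \Delta > 0, \\
-B, & \text{if } \Delta = 0.
\end{cases} \tag{A5}
$$

Finally we take $\alpha_2 = r\alpha_1 + 1 - r$ to complete the
specification of the generalized beta distribution.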
Remark 7. In general to avoid numerical difficulties that
can occur with large values of $r$ (that is, when $r \gg 1$),
we recommend the following approach to the use of
Equations (A1)–(A5). If $(b - m)/(m - a) > 1$, then we solve
the mirror image problem for which $r = (m - a)/(b - m) < 1$;
and finally we interchange the resulting shape parameters to
obtain a generalized beta distribution with the user-specified
mode $m$.
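The computation in Equations (A1)–(A5), together with the mirror-image device of Remarks 6 and 7, might be sketched in Python as below. This is an illustrative sketch rather than the authors' implementation: the function name `beta_shapes_from_mode_variance` and its argument names are hypothetical, and the code applies the trigonometric branch of (A5) in both cases on the assumption that it reduces to $-B$ when $\Delta = 0$ (the arccosine argument is clamped to $[-1, 1]$ to absorb rounding error).

```python
import math


def beta_shapes_from_mode_variance(a, m, b, omega):
    """Return (alpha1, alpha2) for a generalized beta distribution on [a, b]
    with mode m and variance (b - a)**2 / omega, following (A1)-(A5).

    Hypothetical helper name; a sketch under the assumptions stated above.
    """
    if not (a < b and a <= m <= b):
        raise ValueError("require a < b and a <= m <= b")
    if omega <= 12.0:
        raise ValueError("require omega > 12 (variance below the uniform's)")

    # Remarks 6 and 7: if (b - m)/(m - a) > 1 (including m == a), solve the
    # mirror-image problem with r = (m - a)/(b - m) < 1 and swap at the end.
    mirrored = (b - m) > (m - a)
    r = (m - a) / (b - m) if mirrored else (b - m) / (m - a)

    # Coefficients of the cubic in alpha1, Equation (A2).
    den = (1.0 + r) ** 3
    B = (-3.0 * r**3 - 2.0 * r**2 + (5.0 - omega) * r + 4.0) / den
    C = (3.0 * r**3 - 5.0 * r**2 + (omega - 3.0) * r + 5.0 - omega) / den
    D = (-(r**3) + 4.0 * r**2 - 5.0 * r + 2.0) / den

    # Auxiliary quantities, Equation (A4).
    P = C - B * B / 3.0
    Q = D - B * C / 3.0 + 2.0 * B**3 / 27.0
    if P >= 0.0:
        raise ArithmeticError("expected P < 0 for three real roots")

    # Largest root z1 of the cubic, Equation (A5), trigonometric form.
    arg = -0.5 * Q * (-3.0 / P) ** 1.5
    arg = max(-1.0, min(1.0, arg))  # guard against rounding outside [-1, 1]
    alpha1 = math.sqrt(-4.0 * P / 3.0) * math.cos(math.acos(arg) / 3.0) - B / 3.0

    # Second relation in (A1).
    alpha2 = r * alpha1 + 1.0 - r

    # Undo the mirror image by interchanging the shape parameters.
    return (alpha2, alpha1) if mirrored else (alpha1, alpha2)


if __name__ == "__main__":
    # Symmetric case on [0, 1] with variance 1/20: both shapes are close to 2.
    print(beta_shapes_from_mode_variance(0.0, 0.5, 1.0, 20.0))
    # Skewed case: mode near the lower limit of [2, 10], variance 64/30.
    print(beta_shapes_from_mode_variance(2.0, 3.0, 10.0, 30.0))
```

A quick sanity check on any result is to substitute the returned shape parameters back into the mode and variance expressions and confirm that they reproduce $m$ and $(b - a)^2/\omega$.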
Received 13 July 2009;
accepted 9 November 2009 after one revision