View
223
Download
2
Category
Preview:
Citation preview
Introduction to Modeling and GeneratingProbabilistic Input Processes for Simulation
Michael E. Kuhl Emily K. LadaRIT SAS Institute Inc.
Natalie M. Steiger Mary Ann WagnerUniversity of Maine SAIC
James R. WilsonNC State University
<www.ise.ncsu.edu/jwilson>
December 11, 2007
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
OVERVIEW
I. Introduction
II. Univariate Input Models
A. Generalized Beta Distribution Family
B. Johnson Translation System of Distributions
C. Bézier Distribution Family
III. Time-Dependent Arrival Processes
IV. Conclusions and Recommendations
2
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
I. Introduction
• Stochastic simulations require valid input models—e.g., probabilitydistributions that accurately mimic the random input processes drivingthe target system.
3
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Problems in using many conventional probability models:
1. They cannot adequately represent real-world behavior, e.g. in the tailsof the underlying distribution.
2. Parameter estimation based on sample data or subjective information(expert opinion) is often troublesome.
3. Fine-tuning the fitted model is difficult; e.g., many conventionalprobability distributions have the following drawbacks—
(a) A limited number of parameters available to control the fitteddistribution, and
(b) No effective mechanism for directly manipulating the fitteddistribution while simultaneously updating its parameter estimates.
4
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Conventional approach to identifying an input model uses sample datato select from a list of well-known alternatives based on
1. informal graphical techniques such as probability plots, Q–Q plots,histograms, empirical frequency distributions, or box-plots; and
2. statistical goodness-of-fit tests such as the Kolmogorov-Smirnov,chi-squared, and Anderson-Darling tests.
5
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Drawbacks of conventional input modeling
1. Visual comparison of a histogram to a fitted probability densityfunction (p.d.f.) depends on the (arbitrary) layout of the histogram.
2. Problems with statistical goodness-of-fit tests include:
(a) In small samples, low power to detect lack of fit results in aninability to reject any alternatives.
(b) In large samples, practically insignificant fit discrepancies result inrejection of all alternatives.
6
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Problems in estimating the parameters of the selected input modelfrom sample data:
– Matching the mean and standard deviation of the fitted distributionwith that of the sample often fails to capture relevant shapecharacteristics.
– Some estimation methods, such as maximum likelihood andpercentile matching, may simply fail to estimate some parameters.
– Users lack a comprehensive basis for selecting the “best-fitting”model.
7
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Problems with parameter estimation based on subjective information(expert opinion):
– Subjective estimates of moments such as the mean and standarddeviation can be unreliable and depend critically on the units ofmeasurement.
– Subjective estimates of extreme quantiles (e.g., lower and upperlimits of the fitted distribution) are unreliable.
• Practitioners lack definitive procedures for identifying and estimatingvalid input models; thus, output analysis is often based on incorrectinput processes.
• We focus on methods for input modeling that alleviate many of theseproblems.
8
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
II. Univariate Input Models
A. Generalized Beta Distribution Family
If X is a continuous random variable with lower limit a and upper limit b
whose distribution is to be approximated and subsequently sampled ina simulation, then often we can model the behavior of X using ageneralized beta distribution.
• Generalized beta p.d.f.
fX(x) = (1)
�(θ1 + θ2)
�(θ1)�(θ2)(b − a)θ1+θ2−1(x − a)θ1−1(b − x)θ2−1 for a ≤ x ≤ b,
where �(z) = ∫∞0 tz−1e−t dt (for z > 0) denotes the gamma function.
9
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• The beta p.d.f. can accommodate a wide variety of shapes,including
� symmetric and positively or negatively skewed unimodal p.d.f.’s;� J - and U -shaped p.d.f.’s;� left- and right-triangular p.d.f.’s; and� uniform p.d.f.’s.
• Some examples illustrating the range of distributional shapesachievable with the beta p.d.f. follow.
10
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Positively and Negatively Skewed Unimodal Beta Densities
11
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
U -shaped Beta Densities
12
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
J -shaped and Left-triangular Beta Densities
13
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Symmetric and Uniform Beta Densities
14
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Cumulative distribution function (c.d.f.) of beta variate X,
FX(x) = Pr{X ≤ x} =∫ x
−∞fX(w) dw for all real x,
has no convenient analytical expression.
15
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Mean and variance of X are given by
µX = E[X] = θ1b + θ2a
θ1 + θ2,
σ 2X = E
[(X − µX
)2] = (b − a)2θ1θ2
(θ1 + θ2)2(θ1 + θ2 + 1)
.
⎫⎪⎪⎪⎪⎪⎬⎪⎪⎪⎪⎪⎭(2)
• Provided θ1, θ2 > 1 so that the p.d.f. (1) is unimodal, the mode isgiven by
m = (θ1 − 1)b + (θ2 − 1)a
θ1 + θ2 − 2. (3)
• The key distributional characteristics (2) and (3) are simplefunctions of a, b, θ1, and θ2; and this facilitates rapid input modeling.
16
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Fitting Beta Distributions to Data or Subjective Information
Given the data set {Xi : i = 1, . . . , n} of size n, we let
X(1) ≤ X(2) ≤ · · · ≤ X(n)
denote the order statistics; and we compute the sample statistics
a = 2X(1) − X(2), b = 2X(n) − X(n−1),
X = n−1n∑
i=1
Xi, S2 = (n − 1)−1n∑
i=1
(Xi − X
)2.
⎫⎪⎪⎪⎬⎪⎪⎪⎭
17
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Moment-matching estimates of θ1, θ2 are computed from
θ1 = d21 (1 − d1)
d22
− d1, θ2 = d1(1 − d1)2
d22
− (1 − d1),
where
d1 = X − a
b − aand d2 = S
b − a.
18
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• BetaFit (AbouRizk, Halpin, and Wilson 1994) is a Windows-basedpackage for fitting the beta distribution to sample data by computing a,b, θ1, and θ2 using the following estimation methods:
– moment matching;
– feasibility-constrained moment matching (so that the feasibilityconditions a < X(1) and X(n) < b are always satisfied);
– maximum likelihood (assuming a and b are known and thus are notestimated); and
– ordinary least squares (OLS) and diagonally weighted least squares(DWLS) estimation of the c.d.f.
• BetaFit is in the public domain and is available on the Web via
<www.ise.ncsu.edu/jwilson/page3>.
19
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Application of BetaFit to a Sample of n = 9,980 Observations ofEnd-to-End Chain Lengths (in Angströms) of Nafion, an Ionic PolymerUsed As a “Smart Material,” Based on the Method of Moment Matching
20
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
21
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Result of Applying BetaFit to Nafion Data Set Using Maximum LikelihoodEstimation
22
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
23
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Result of Applying BetaFit to Nafion Data Set Using Ordinary LeastSquares Estimation of the C.d.f.
24
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
25
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Rapid input modeling with subjective estimates a, m, and b of theminimum, mode, and maximum, respectively, of the target distribution:
θ1 = d2 + 3d + 4
d2 + 1and θ2 = 4d2 + 3d + 1
d2 + 1, (4)
where
d = b − m
m − a.
The mode of the fitted beta distribution will differ from m by at most4.4%; in practice the error is usually at most 1%.
26
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• VIBES (AbouRizk, Halpin, and Wilson 1991) is a Windows-basedpackage for fitting the beta distribution to subjective estimates of:
1. the endpoints a and b; and
2. any of the following combinations of distributional characteristics—
� the mean µX and the variance σ 2X,
� the mean µX and the mode m,� the mode m and the variance σ 2
X,� the mode m and an arbitrary quantile xp = F−1
X (p) for p ∈ (0, 1),or
� two quantiles xp and xq for p, q ∈ (0, 1).
27
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Advantages of the beta distribution as an input-modeling tool:
– sufficient flexibility to represent with reasonable accuracy a widediversity of distributional shapes; and
– convenient estimation of parameters from sample data or subjectiveinformation.
• Disadvantages of the beta distribution as an input-modeling tool:
– difficult to explain; and
– difficult to sample—some popular beta variate generators breakdown when θ1 > 10 or θ2 > 10.
28
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Application of Beta Distributions to Pharmaceutical Manufacturing
Pearlswig (1995) developed a simulation of a proposed facility formanufacturing effervescent tablets.
• For each operation, he obtained three time estimates(a, m, and b
)from the process engineers.
• Extremely conservative estimates given for upper limits (so that b � m).
• With triangular distributions to model processing times, bottlenecksresulted in excessively low simulation estimates of annual production.
• Using (4), Pearlswig fitted beta distributions to all operation times; andthen the simulation results conformed to production levels of similarplants elsewhere.
29
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
B. Johnson Translation System of Distributions
To fit a distribution to the continuous random variable X, Johnson(1949a) proposed finding a “translation” of X to a standard normalrandom variable Z with mean 0 and variance 1 so that Z ∼ N(0, 1).
For a detailed discussion of the Johnson translation system, see
DeBrota, D. J., R. S. Dittus, S. D. Roberts, J. R. Wilson, J. J. Swain, andS. Venkatraman. 1989a. Modeling input processes with Johnson dis-tributions. In Proceedings of the 1989 Winter Simulation Conference,pp. 308–318. Available online via
<www.ise.ncsu.edu/jwilson/files/wsc89jnsn.pdf>.
30
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
The proposed normalizing translations have the general form
Z = γ + δ · g
(X − ξ
λ
), (5)
where γ and δ are shape parameters, λ is a scale parameter, ξ is a locationparameter, and the function g(·) defines the four distribution families in theJohnson translation system,
g(y) =
⎧⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎩
ln(y), for SL (lognormal) family,
ln(
y +√
y2 + 1)
, for SU (unbounded) family,
ln[y/(1 − y)] , for SB (bounded) family,
y, for SN (normal) family.
31
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Johnson c.d.f.
If (5) is an exact normalizing translation of X to a standard normalrandom variable, then the c.d.f. of X is given by
FX(x) = �
[γ + δ · g
(x − ξ
λ
)]for all x ∈ H,
where: �(z) = (2π)−1/2∫ z
−∞ exp( − 1
2w2)
dw is the standard normalc.d.f.; and the space of X is
H =
⎧⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎩[ξ, +∞), for SL (lognormal) family,
(−∞, +∞), for SU (unbounded) family,
[ξ, ξ + λ], for SB (bounded) family,
(−∞, +∞), for SN (normal) family.
32
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Johnson p.d.f. is
fX(x) = δ
λ(2π)1/2g′(
x − ξ
λ
)exp
⎧⎨⎩−1
2
[γ + δ · g
(x − ξ
λ
)]2⎫⎬⎭
for all x ∈ H, where
g′(y) =
⎧⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎩
1/y, for SL (lognormal) family,
1/√
y2 + 1, for SU (unbounded) family,
1/[y/(1 − y)], for SB (bounded) family,
1, for SN (normal) family.
Following are examples illustrating all the distributional shapes in theJohnson system.
33
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Symmetric Bimodal and Nearly Uniform Johnson SB Densities
34
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Nearly J -shaped and Symmetric Unimodal Johnson SB Densities
35
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Positively Skewed and Symmetric Unimodal Johnson SB Densities
36
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Symmetric and Negatively Skewed Johnson SU Densities
37
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Fitting Johnson Distributions to Sample Data
We select an estimation method and the desired translation function g(·)and then obtain estimates of γ , δ, λ, and ξ .
The Johnson system has the flexibility to match—
(a) any feasible combination of values for the mean µX, variance σ 2X,
skewness
αX = E[(
X − µX
)3/σ 3
X
] (often denoted by
√β1),
and kurtosis
βX = E[(
X − µX
)4/σ 4
X
](often denoted by β2);
or
(b) sample estimates of the moments µX, σ 2X, αX, and βX.
38
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• FITTR1 (Swain, Venkatraman, and Wilson 1988) is a software pack-age for fitting Johnson distributions to sample data using the followingestimation methods:
� OLS and DWLS estimation of the c.d.f.;
� minimum L1 and L∞ norm estimation of the c.d.f.;
� moment matching; and
� percentile matching.
39
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• VISIFIT (DeBrota et al. 1989b) is a Windows-based software packagefor fitting Johnson SB distributions to subjective information, possiblycombined with sample data. The user must provide estimates of a, b,and any two of the following characteristics:
� the mode m;
� the mean µX;
� the median x0.5;
� arbitrary quantile(s) xp or xq for p, q ∈ (0, 1);
� the width of the central 95% of the distribution; or
� the standard deviation σX.
Venkatraman, Swain and Wilson (1988), DeBrota et al. (1989b),FITTR1, and VISIFIT are available on the Web via
<www.ise.ncsu.edu/jwilson/more_info>.
40
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Generating Johnson Variates by Inversion
[1] Generate Z ∼ N(0, 1).
[2] Apply to Z the inverse translation
X = ξ + λ · g−1(
Z − γ
δ
), (6)
where for all real z we define the inverse translation function
g−1(z) =
⎧⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎩
ez, for SL (lognormal) family,(ez − e−z
)/2, for SU (unbounded) family,
1/(
1 + e−z), for SB (bounded) family,
z, for SN (normal) family.
(7)
41
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Application of Johnson Distributions to Smart Materials Research
• Matthews et al. (2006) and Weiland et al. (2005) use a multiscale mod-eling approach to predict material stiffness of a certain class of smartmaterials called ionic polymers.
• Material stiffness depends on effective length of the polymer chainscomprising the material.
• In a case study of the ionic polymer Nafion, Matthews et al. (2006) de-velop a simulation of polymer-chain conformation on a nanoscopic levelso as to generate a large number of end-to-end chain lengths.
• The chain-length p.d.f. is estimated and used as input to a macroscopic-level mathematical model to predict material stiffness.
42
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Johnson SU C.d.f. Fitted to n = 9,980 Nafion Chain Lengths Using DWLSEstimation
−10 0 10 20 30 40 50 60 70 80 900
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
43
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Johnson SU P.d.f Fitted to n = 9,980 Nafion Chain Lengths Using DWLSEstimation
−10 0 10 20 30 40 50 60 70 80 900
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
44
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Matthews et al. (2006) and Weiland et al. (2005) obtain more accurateand intuitively appealing fits to Nafion chain-length data with Johnsonp.d.f.’s than with other distributions.
– Material stiffness is computed from the second derivative f ′′X(x) of
the fitted p.d.f.
– There is a relatively simple relationship between the Johnsonparameters and material stiffness.
45
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Application of Johnson Distributions to Healthcare
• To model arrival patterns of patients who have scheduled appointmentsat community healthcare clinics in San Diego, Alexopoulos et al. (2006)estimate the distribution of patient tardiness—that is, deviation from thescheduled appointment time.
• Alexopoulos et al. (2006) perform an exhaustive analysis of 18 continu-ous distributions, concluding that the SU distribution provided superiorfits to the available data.
46
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
C. Bézier Distribution Family
Definition of Bézier Curves
• A Bézier curve is often used to approximate a smooth function on abounded interval by forcing the Bézier curve to pass in the vicinity ofselected control points{
pi ≡ (xi, zi)T : i = 0, 1, . . . , n
}in two-dimensional Euclidean space.
47
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• A Bézier curve of degree n with control points {p0, p1, . . . ,pn} is givenparametrically by
P(t) = [Px(t; n, x), Pz(t; n, z)
]T
=n∑
i=0
Bn,i(t) pi for t ∈ [0, 1], (8)
where x ≡ (x0, x1, . . . , xn)T and z ≡ (z0, z1, . . . , zn)
T, and where theblending function,
Bn,i(t) ≡ n!i! (n − i)! t i (1 − t)n−i for t ∈ [0, 1], (9)
is the ith Bernstein polynomial for i = 0, 1, . . . , n.
48
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Bézier Distribution and Density Functions
• If X is a continuous random variable on [a, b] with c.d.f. FX(·) and p.d.f.fX(·), then we can approximate FX(·) arbitrarily closely using a Béziercurve of the form (8) by taking a sufficient number (n + 1) of controlpoints with appropriate coordinates
pi = (xi, zi)T
for the ith control point, where i = 0, . . . , n.
49
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• If X is Bézier, then the c.d.f. of X is given parametrically by
P(t) = [Px(t; n, x), Pz(t; n, z)
]T
= {x(t), FX[x(t)]}T for t ∈ [0, 1], (10)
where
x(t) = Px(t; n, x) =n∑
i=0
Bn,i(t)xi,
FX[x(t)] = Pz(t; n, z) =n∑
i=0
Bn,i(t)zi
⎫⎪⎪⎪⎪⎬⎪⎪⎪⎪⎭ for t ∈ [0, 1]. (11)
For a detailed discussion of Bézier distributions, see
Wagner, M. A. F., and J. R. Wilson. 1996a. Using univariate Bézier distri-butions to model simulation input processes. IIE Transactions 28 (9):699–711. Available online via
<www.ise.ncsu.edu/jwilson/files/wagner96iie.pdf>
50
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• If X is Bézier with c.d.f. FX(·) given by (10), then the p.d.f. fX(x) is
P∗(t) = [P ∗
x (t; n, x), P ∗z (t; n, x, z)
]T
= {x(t), fX[x(t)]}T for t ∈ [0, 1],
where x(t) = P ∗x (t; n, x) = Px(t; n, x) as in (11) and
fX[x(t)] = P ∗z (t; n, x, z)
= Pz(t; n − 1, �z)
Px(t; n − 1, �x)=
∑n−1i=0 Bn−1,i (t)�zi∑n−1i=0 Bn−1,i (t)�xi
,
where
�x ≡ (�x0, . . . , �xn−1)T and �z ≡ (�z0, . . . , �zn−1)
T,
with
�xi ≡ xi+1 − xi and �zi ≡ zi+1 − zi for i = 0, 1, . . . , n − 1.
51
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Generating Bézier Variates by Inversion
[1] Generate a random number U ∼ Uniform[0, 1].[2] Find tU ∈ [0, 1] such that
FX[x(tU )] =n∑
i=0
Bn,i(tU )zi = U. (12)
[3] Deliver the variate
X = x(tU ) =n∑
i=0
Bn,i(tU )xi .
Codes to implement this approach are available on the Web via
<www.ise.ncsu.edu/jwilson/page3>.
52
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Using PRIME to Model Bézier Distributions
• PRIME (Wagner and Wilson 1996a) is a Windows-based system forfitting Bézier distributions to data or subjective information.
• PRIME is available on the previously mentioned Web site.
• Control points appear as indexed black squares that can be manipulatedwith the mouse and keyboard.
– Each control point exerts on the c.d.f. a “magnetic” attraction whosestrength is given by the associated Bernstein polynomial (9).
– Moving a control point causes the displayed c.d.f. to be updated(nearly) instantaneously.
53
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
PRIME Windows Showing the Bézier C.d.f. (Left Panel) with Its ControlPoints and the P.d.f. (Right Panel)
54
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
PRIME includes the following methods for fitting Bézier distributions tosample data:
� OLS estimation of the c.d.f.;
� minimum L1 and L∞ norm estimation of the c.d.f.;
� maximum likelihood estimation (assuming a and b are known);
� moment matching; and
� percentile matching.
For automatic estimation of the number of control points, see
Wagner, M. A. F., and J. R. Wilson. 1996b. Recent developments in inputmodeling with Bézier distributions. In Proceedings of the 1996 WinterSimulation Conference, pp. 1448–1456. Available online as
<www.ise.ncsu.edu/jwilson/files/bezwsc96.pdf>.
55
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Bézier Distribution Fitted to n = 9,980 Nafion Chain Lengths Using OLSEstimation of the C.d.f.
56
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Advantages of the Bézier distribution family:
• It is extremely flexible and can represent a wide diversity of distributionalshapes, including multiple modes and mixed distributions.
• If data are available, then the likelihood ratio test of Wagner and Wilson(1996b) can be used with any of the available estimation methods tofind automatically both the number and location of the control points.
• In the absence of data, PRIME can be used to determine the con-ceptualized distribution based on known quantitative or qualitativeinformation.
• As the number (n + 1) of control points increases, so does the flexibilityin fitting Bézier distributions.
57
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
III. Time-Dependent Arrival Processes
• Many simulation applications require high-fidelity input models ofarrival processes with arrival rates that depend strongly on time.
• Nonhomogeneous Poisson processes (NHPPs) have been usedsuccessfully to model complex time-dependent arrival processes inmany applications.
• An NHPP {N(t) : t ≥ 0} is a counting process such that
� N(t) is the number of arrivals in the time interval (0, t];� λ(t) is the instantaneous arrival rate at time t , and λ(t) satisfies the
Poisson postulates; and
� the (cumulative) mean-value function is given by
µ(t) ≡ E[N(t)] =∫ t
0λ(z) dz for all t ≥ 0. (13)
58
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• We discuss the nonparametric approach of Leemis (1991, 2000, 2004)for modeling and simulation of NHPPs; see
Leemis, L. M. 1991. Nonparametric estimation of the cumulative intensityfunction for a nonhomogeneous Poisson process. Management Science37 (7): 886–900.
Leemis, L. M. 2000. Nonparametric estimation of the cumulative intensityfunction for a nonhomogeneous Poisson process from overlappingrealizations. Management Science 46 (7): 989–998.
Leemis, L. M. 2004. Nonparametric estimation and variate generation for anonhomogeneous Poisson process from event count data. IIETransactions 36:1155–1160.
59
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• The context is a recent application to modeling and simulatingunscheduled patient arrivals to a community healthcare clinic(Alexopoulos et al. 2005)
• Suppose we have a time interval (0, S] over which we observe severalindependent replications (realizations) of a stream of unscheduledpatient arrivals constituting an NHPP with arrival rate λ(t) for t ∈ (0, S].
For example, (0, S] might represent the time period on each weekdayduring which unscheduled patients may walk into a clinic—say,between 9 A.M. and 5 P.M. so that S = 480 minutes.
60
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Suppose k realizations of the arrival stream over (0, S] have beenrecorded so that we have
� ni patient arrivals in the ith realization for i = 1, 2, . . . , k; and
� n =k∑
i=1
ni patient arrivals accumulated over all realizations.
61
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• Let{t(i) : i = 1, . . . , n
}denote the overall set of arrival times for all
unscheduled patients expressed as an offset from the beginning of(0, S] and then sorted in increasing order.
For example, if we observed n = 250 patient arrivals over k = 5 days, eachwith an observation interval of length S = 480 minutes, then
� t(1) = 2.5 minutes means that over all 5 days, the earliest arrivaloccurred 2.5 minutes after the clinic opened on one of those days; and
� t(2) = 4.73 minutes means that the second-earliest arrival occurred4.73 minutes after the clinic opened on one of those days.
� t(n) = 478.5 minutes means that the latest arrival occurred 478.5minutes after the clinic opened on one of those days.
62
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• We estimate the mean-value function µ(t) as follows.
� We take t(0) ≡ 0 and t(n+1) ≡ S.
� For t(i) < t ≤ t(i+1) and i = 0, 1, . . . , n, we take
µ(t) = in
(n + 1)k+{
n[t − t(i)
](n + 1)k
[t(i+1) − t(i)
]}. (14)
• Equation (14) provides a basis for modeling and simulatingunscheduled patient-arrival streams when the arrival rate exhibits astrong dependence on time.
63
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
�
�
(0,0) t(1) t(2) t(3) t(n) t(n+1) ≡ S
n(n+1)k
2n(n+1)k
3n(n+1)k
n2
(n+1)k
nk
����������
��
��������
��������...
��������
...
µ(t)
Nonparametric Estimator of Mean Value Function
64
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Goodness-of-fit Testing for the Fitted Mean-Value Function
• In addition to the realizations of the target arrival process that wereused to compute the estimated mean-value function µ(t), suppose weobserve one additional realization{
A′i : i = 1, 2, . . . , n′}
independently of the previously observed realizations, with the ithpatient arriving at time A′
i for i = 1, . . . , n′.
• If the target arrival stream is an NHPP with mean-value function µ(t)
for t ∈ (0, S], then the transformed arrival times{B ′
i = µ(A′
i
) : i = 1, 2, . . . , n′}constitute a homogeneous Poisson process with an arrival rate of 1.
65
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
• If the target arrival stream is an NHPP with mean-value function µ(t)
for t ∈ (0, S], then the corresponding transformed interarrival times{X′
i = B ′i − B ′
i−1 : i = 1, 2, . . . , n′}(with B ′
0 ≡ 0) constitute a random sample from an exponentialdistribution with a mean of 1.
• To test the adequacy of the fitted mean-value function µ(t) as anapproximation to µ(t), apply the Kolmogorov-Smirnov test to the dataset {
X′′i = µ
(A′
i
)− µ(A′
i−1
) : i = 1, 2, . . . n′}(with A′
0 ≡ 0), where the hypothesized c.d.f. in the goodness-of-fit test is
FX′′i(x) = 1 − e−x for all x ≥ 0.
66
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Generating Realizations of the Fitted NHPP
[1] Set i ← 1 and N ← 0.[2] Generate Ui ∼ Uniform(0, 1).[3] Set Bi ← − ln(1 − Ui).[4] While Bi < n/k do
Begin
Set m ←⌊
(n + 1)kBi
n
⌋;
Set Ai ← t(m) + {t(m+1) − t(m)
}{ (n + 1)kBi
n− m
};
Set N ← N + 1; Set i ← i + 1;Generate Ui ∼ Uniform(0, 1);Set Bi ← Bi−1 − ln(1 − Ui).
End
NHPP Simulation Procedure of Leemis (1991)
67
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Advantages of Leemis’s Nonparametric Approach toModeling and Simulation of NHPPs
• It does not require the assumption of any particular form for arrival rateλ(t) as a function of t .
• It provides a strongly consistent estimator of mean-value function—thatis,
limk→∞ µ(t) = µ(t) for all t ∈ (0, S] with probability 1.
• The simulation algorithm given above, which is based on inversion ofµ(t) so that
Ai = µ−1(Bi) for i = 1, . . . , N,
is also asymptotically valid as k → ∞.
68
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Application to Organ Transplantation Policy Analysis
• The United Network for Organ Sharing (UNOS) applied a simplifiedvariant of this approach in the development and use of the UNOS LiverAllocation Model (ULAM) for analyzing the cadaveric liver-allocationsystem in the U.S. (see Harper et al. 2000).
• ULAM incorporated models of
(a) the streams of liver-transplant patients arriving at 115 transplantcenters, and
(b) the streams of donated organs arriving at 61 organ procurementorganizations in the United States.
Virtually all these arrival streams exhibited long-term trends as well asstrong dependencies on the time of day, the day of the week, and thegeographic location of the arrival stream.
69
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Handling Arrival Processes Having Trends and Cyclic Effects
• Kuhl, Sumant, and Wilson (2006) develop a “semiparametric” methodfor modeling and simulating arrival processes that may exhibit along-term trend or nested periodic phenomena (such as daily andweekly cycles), where the latter effects might not necessarily possessthe symmetry of sinusoidal oscillations.
• See
Kuhl, M. E., S. G. Sumant, and J. R. Wilson. 2006. An automatedmultiresolution procedure for modeling complex arrival processes.INFORMS Journal on Computing 18 (1): 3–18.Available online via
<www.ise.ncsu.edu/jwilson/files/kuhl06joc.pdf>
70
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Fitted Rate Function over 100 Replications of a Test Processwith One Cyclic Rate Component and Long-term Trend
0
100
200
300
400
500
600
700
0 1 2 3 4 5 6 7 8 9 10Time t
Arr
ival
Rat
e
71
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Fitted Mean-Value Function over 100 Replications of a Test Processwith One Cyclic Rate Component and Long-term Trend
0
600
1200
0 1 2 3 4 5 6 7 8 9 10Time t
Cum
ulat
ive
Mea
n A
rriv
als
72
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
Web-based Input Modeling Software
<www.rit.edu/˜kuhl1/simulation>
73
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
IV. Conclusions and Recommendations
• The common thread running through this tutorial is the focus on robustinput models that are
� computationally tractable and
� sufficiently flexible to represent adequately many of the probabilisticphenomena that arise in many applications of discrete-eventstochastic simulation.
• Notably absent is a discussion of Bayesian input-modelingtechniques—a topic that will receive increasing attention in the future.
• Additional material on input modeling is available via
<www.ise.ncsu.edu/jwilson/more_info>.
74
Recommended