Dominated rejection algorithms for generating random variates


Focus Article

Timothy Hall∗

This focus article presents a practical modified version of the von Neumann dominated rejection method for generating univariate random variates using a distribution density function, a uniform variate over a finite interval, and an independent uniform variate over the unit interval. An example generated variate from the normal distribution family is included for demonstration purposes. © 2012 Wiley Periodicals, Inc.

How to cite this article: WIREs Comput Stat 2012. doi: 10.1002/wics.1230

Keywords: dominated rejection algorithm; generated variate data; embedded systems code

INTRODUCTION

The purpose of this focus article is to document the theory of a dominated rejection algorithm (DRA) that may be used to generate arbitrary random variates according to a density function alone. These methods were inspired by a version first proposed by von Neumann (Ref 1), who referred to the algorithm as ‘rejection sampling’, as related by Knuth in Ref 2. Additional conditions on the density function f described in the following theorem are imposed in the Policies section; they are needed to implement the DRA in a low-level programming language, such as MMIX, for use in embedded systems.

MMIX is the successor assembly-level programming language to MIX, both of which were invented by Knuth in support of his The Art of Computer Programming (TAOCP) series and related publications. MMIX programming is documented in Professor Knuth’s definitive reference work (Ref 3) and in Fascicle 1 of the updated TAOCP Volume 1. It provides all the functionality necessary to implement the DRA in a low-level programming environment for use in embedded systems.

MAIN THEOREM

von Neumann proposed an equivalent version of the following theorem as a means to access all random variate distributions (both discrete and continuous) through the strategic use of a uniform distribution. As long as discrete or continuous uniform variate values may be efficiently generated, those values may be used to generate variate values for any distribution (under suitable regularity conditions; see the Policies section). For a proof of the theorem, see Refs 2 and 4.

∗Correspondence to: info@pqic.com
PQI Consulting, P. O. Box 425616, Cambridge, MA, USA

Theorem 1 Let $f : \mathbb{R} \to \mathbb{R}^+$ and $g : \mathbb{R} \to \mathbb{R}^+$ be two random variable density functions such that

$$0 < f(x) \le k\,g(x)$$

for some constant $k > 0$, for all $x \in \mathbb{R}$, and let U be a uniformly distributed random variate on [0, 1]. Generate random variate $X = x_0$ according to the distribution given by g, and generate random variate $U = u_0$, independently of X. If $X = x_0$ is accepted when

$$u_0 < \frac{f(x_0)}{k\,g(x_0)}$$

and rejected otherwise, then the random variate generated by the accepted X values has a distribution given by f.

A DOMINATED REJECTION ALGORITHM

A DRA applies Theorem 1 to the special case where g is also the density of a uniform random variable, defined on a sufficiently large connected, compact subset V of the support of f. Those values of X that are necessarily excluded by being outside of V are considered so unlikely to occur in the observed values under f that their exclusion does not compromise the application in which the algorithm is implemented. The values of X under this form of g are generated independently of U but also in the same manner as U.

Focus Article wires.wiley.com/compstats

The compact subset V should include those values of X that are most likely to occur under f, which are (usually) the values of most interest in any implementation.

Background

Let T be a random variable defined on $\mathbb{R}$, with strictly positive mass or density function $f : \mathbb{R} \to \mathbb{R}^+$, and suppose there is an interval of interest $V = [-a, b]$, where $a + b > 0$. In the application of Theorem 1, let $g : \mathbb{R} \to \mathbb{R}^+$ be the (discrete or continuous) uniform distribution on V, so that

$$g(x) = \frac{1}{a + b}, \quad -a \le x \le b.$$

Furthermore, let

$$M = \max\{f(x) : x \in V\} > 0$$

which exists as a real number since f is strictly positive and either discrete or continuous on $\mathbb{R}$. Then, for all $x \in V$,

$$0 < f(x) \le M = \frac{a + b}{a + b}\,M = (a + b)\,M\,g(x) = k\,g(x)$$

where $k = (a + b)M > 0$.
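As a rough illustration of the bound above, the following Python sketch estimates M on a grid over V and forms k = (a + b)M. The helper name `dominating_constant`, the grid size, and the grid search itself are assumptions made for illustration and do not come from the article; a grid maximum can underestimate M when f is sharply peaked between grid points.

```python
import math

def dominating_constant(f, a, b, grid_points=10001):
    """Estimate M = max f on V = [-a, b] and return (M, k) with k = (a + b) * M.

    Hypothetical helper: the grid search is an illustrative assumption,
    not a method prescribed by the article.
    """
    if a + b <= 0:
        raise ValueError("the interval V = [-a, b] requires a + b > 0")
    step = (a + b) / (grid_points - 1)
    M = max(f(-a + i * step) for i in range(grid_points))
    return M, (a + b) * M

def standard_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Example: V = [-6, 6], so a = b = 6.
M, k = dominating_constant(standard_normal_pdf, 6.0, 6.0)
```

When M is known analytically, as in the standard normal example, the closed-form value would normally be preferred over a grid estimate.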

The Algorithm

By Theorem 1, the following steps generate a variate with distribution given by f, which is therefore a random variate for T.

1. Generate a uniform variate value $x_0$ on V.

2. Generate a uniform variate value $u_0$ on [0, 1] independently of Step 1.

3. If $u_0 M < f(x_0)$, then accept $x_0$ as a variate value for T; otherwise, reject the value of $x_0$.

4. Repeat from Step 1 until complete.
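The four steps above can be sketched in Python as follows. This is a minimal illustration, not the article's MMIX code: the function name `dominated_rejection`, the use of Python's `random.Random` as the uniform generator, and the decision to draw both uniforms from one generator instance are all assumptions.

```python
import math
import random

def dominated_rejection(f, a, b, M, count, rng=None):
    """Return (accepted, passes): `count` accepted values on [-a, b] and the passes used.

    Assumes f(x) <= M for all x in [-a, b]; the name and signature are hypothetical.
    """
    rng = rng or random.Random()
    accepted = []
    passes = 0
    while len(accepted) < count:      # Step 4: repeat until complete
        passes += 1
        x0 = rng.uniform(-a, b)       # Step 1: uniform variate on V
        u0 = rng.random()             # Step 2: uniform variate on the unit interval
        if u0 * M < f(x0):            # Step 3: accept or reject x0
            accepted.append(x0)
    return accepted, passes

def standard_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

samples, passes = dominated_rejection(
    standard_normal_pdf, 6.0, 6.0, standard_normal_pdf(0.0), 1000, random.Random(12345)
)
```

The comparison in Step 3 is written as `u0 * M < f(x0)` rather than as a quotient, which avoids a division on each pass.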

Step 3 follows from the fact that, in this application,

$$\frac{f(x_0)}{k\,g(x_0)} = \frac{f(x_0)}{(a + b)\,M\left(\frac{1}{a + b}\right)} = \frac{1}{M}\,f(x_0) \le 1$$

and $M > 0$.

Since a uniform variate value $t_0$ on V may be generated from a uniform variate value $w_0$ on [0, 1] by the transformation

$$t_0 = (a + b)\,w_0 - a,$$

Steps 1 and 2 may be accomplished using the same uniform variate generator with different initialization values to ensure independence.

Note that no value outside of the interval of interest may be generated or accepted in this manner, since $0 \le w_0 \le 1$ means $-a \le t_0 \le b$.
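The transformation and the use of one generator with two initialization values might look as follows in Python. The helper name `uniform_on_interval` and the seed values are arbitrary assumptions. Whether two differently seeded streams are adequately independent depends on the generator; here that is taken as given, following the text.

```python
import random

def uniform_on_interval(w0, a, b):
    """Map w0 in [0, 1] to t0 = (a + b) * w0 - a, which lies in [-a, b]."""
    return (a + b) * w0 - a

# Two instances of the same generator with different (arbitrary) seeds:
# one supplies the x0 values for Step 1, the other the u0 values for Step 2.
x_stream = random.Random(2011)
u_stream = random.Random(2012)

a, b = 6.0, 6.0
pairs = [
    (uniform_on_interval(x_stream.random(), a, b), u_stream.random())
    for _ in range(5)
]
```

The endpoints behave as the text states: `w0 = 0` maps to `-a` and `w0 = 1` maps to `b`.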

ITERATION COUNTS

Given the uniform variate value $x_0$ in the DRA, the probability that the independently chosen uniform variate value $u_0$ is less than $\frac{f(x_0)}{M}$ is itself equal to $\frac{f(x_0)}{M}$. Since the probability that the value $x_0$ is chosen is $\frac{1}{a+b}$, the probability that the joint variate $(x_0, u_0)$ generates an accepted value for $x_0$ is $\frac{f(x_0)}{(a+b)M}$. The joint variate may be viewed as a binomial process whose ‘probability of success’ is $\frac{f(x_0)}{(a+b)M}$.

The expected value of choosing $x_0$ is

$$\int_{-a}^{b} \frac{1}{a + b}\,x\,dx = \frac{1}{2(a + b)}\left(b^2 - a^2\right) = \frac{1}{2}(b - a)$$

so that the ‘expected value’ of the ‘probability of success’ is

$$\frac{f\left(\frac{1}{2}(b - a)\right)}{(a + b)M}$$

which, in turn, means the expected number of accepted values for $x_0$ in N passes through the algorithm is

$$\lambda = \frac{f\left(\frac{1}{2}(b - a)\right)}{(a + b)M}\,N.$$

The choices of a and b that maximize $f\left(\frac{1}{2}(b - a)\right)$ while minimizing $a + b$ drive the maximum value of λ for a given N.
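For concreteness, the expression for λ can be evaluated directly. The sketch below transcribes the article's formula into Python. The function name `expected_accepted` is hypothetical, and the standard normal density with a = b = 6 is used only as sample input.

```python
import math

def expected_accepted(f, a, b, M, N):
    """Evaluate lambda = f((b - a) / 2) / ((a + b) * M) * N, the article's estimate."""
    return f(0.5 * (b - a)) / ((a + b) * M) * N

def standard_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Sample input: V = [-6, 6] and N = 3,600,000 passes.
M = standard_normal_pdf(0.0)
lam = expected_accepted(standard_normal_pdf, 6.0, 6.0, M, 3_600_000)
```

With a = b the numerator is f(0) = M, so the estimate reduces to N / (2b), which is 300,000 for this input up to floating-point rounding.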

POLICIES

The DRA may be applied to any random variable X on $\mathbb{R}$ under the following analytical conditions (which facilitate the use of low-level programming language implementation code).

1. The distribution of X must have a well-defined, finite, piecewise discrete or continuous density function f defined on its support.


WIREs Computational Statistics Dominated rejection algorithms

FIGURE 1 | Standard normal generated variate values. [Histogram titled ‘Standard normal variate’; horizontal axis: Value, from −6.00 to 5.76 in steps of 0.28; vertical axis: Frequency, from 0 to 1400.]

2. The function f must have a finite second central moment (the variance $\sigma_X^2$), which means it also has a finite first moment (the mean $\mu_X$).

3. All intervals of interest are based only on $\mu_X$ and $\sigma_X$.

4. The analytical methods used to generate uniform variate values conform to the predefined policies, specifications, and requirements that are independently established for the DRA.

5. The required number of accepted values $X = x_0$ is determined and fixed before the algorithm is applied.

6. Exceptions to these policies are allowed as required by the particular circumstances of the implementation.

EXAMPLE: THE STANDARD NORMAL RANDOM VARIATE

Consistent with the prototype MMIX implementation code, consider the standard normal random variable X with density function

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}.$$

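We have

$$\mu_X = 0 < \infty \quad \text{and} \quad \sigma_X = 1 < \infty$$

and

$$\frac{d}{dx} f(x) = \frac{1}{\sqrt{2\pi}}\,(-x)\,e^{-\frac{1}{2}x^2} = 0$$

means that $x = 0$ is the only critical value. Then

$$\left.\frac{d^2}{dx^2} f(x)\right|_{x=0} = \left.\left(\frac{1}{\sqrt{2\pi}}\,x^2 e^{-\frac{1}{2}x^2} - \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\right)\right|_{x=0} = -\frac{1}{\sqrt{2\pi}} < 0$$

means $x = 0$ is a global maximum for f on $\mathbb{R}$. This means $M = f(0) = \frac{1}{\sqrt{2\pi}}$.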

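Therefore,

$$\frac{f\left(\frac{1}{2}(b - a)\right)}{a + b} = \frac{1}{\sqrt{2\pi}\,(a + b)}\,e^{-\frac{1}{8}(b - a)^2}$$

so that

$$\frac{\partial}{\partial a}\,\frac{f\left(\frac{1}{2}(b - a)\right)}{a + b} = \frac{1}{4}(b - a)\,\frac{1}{\sqrt{2\pi}\,(a + b)}\,e^{-\frac{1}{8}(b - a)^2} - \frac{1}{\sqrt{2\pi}\,(a + b)^2}\,e^{-\frac{1}{8}(b - a)^2} = 0$$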


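$$\frac{\partial}{\partial b}\,\frac{f\left(\frac{1}{2}(b - a)\right)}{a + b} = -\frac{1}{4}(b - a)\,\frac{1}{\sqrt{2\pi}\,(a + b)}\,e^{-\frac{1}{8}(b - a)^2} - \frac{1}{\sqrt{2\pi}\,(a + b)^2}\,e^{-\frac{1}{8}(b - a)^2} = 0$$

which means, on subtracting the second equation from the first,

$$\frac{1}{2}(b - a)\,\frac{1}{\sqrt{2\pi}\,(a + b)}\,e^{-\frac{1}{8}(b - a)^2} = 0$$

or

$$a = b.$$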

Hence, all intervals of interest V should be symmetric about the mean 0, i.e., of the form $V = [-b, b]$, and this gives

$$\lambda = \frac{f\left(\frac{1}{2}(b - a)\right)}{(a + b)M}\,N = \left(\frac{f(0)}{M}\right)\frac{N}{2b} = \frac{N}{2b}.$$

The smaller $b > 0$ becomes, the larger λ becomes, and the larger b becomes, the smaller λ becomes. This trade-off then determines the value of b (and a). If K accepted values are needed, the value of b should be chosen so that the interval $[-b, b]$ minimally covers all values of interest and that $2bK$ is not unacceptably high.

Figure 1 depicts a histogram of K = 300,000 accepted values for b = 6. In this case, approximately N = 2bK = 2(6)(300,000) = 3,600,000 passes through the algorithm were required. Even though this may appear to be an excessive number of iterations, such calculations may be effected in a low-level programming language environment and consume only a few processing cycles per iteration.
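Pass counts of this kind can be checked empirically with a short simulation. The Python sketch below is an illustration under stated assumptions, not the MMIX prototype: it uses Python's `random` module, a hypothetical function name `passes_for_normal`, and a reduced K for speed. It counts the passes needed for K accepted standard normal values with b = 6 and prints the observed ratio N/K next to the estimate 2b. For comparison, the standard analysis of rejection sampling gives an overall acceptance probability of about 1/k = 1/((a + b)M) when nearly all of the mass of f lies in V. For b = 6 that is roughly 4.8 passes per accepted value, so the observed ratio may fall well below 2b, and 2bK can be read as a conservative estimate in this example.

```python
import math
import random

def passes_for_normal(K, b, seed=0):
    """Count DRA passes needed to accept K standard normal values on [-b, b]."""
    rng = random.Random(seed)
    M = 1.0 / math.sqrt(2.0 * math.pi)        # M = f(0) for the standard normal
    accepted = 0
    passes = 0
    while accepted < K:
        passes += 1
        x0 = 2.0 * b * rng.random() - b       # t0 = (a + b) * w0 - a with a = b
        u0 = rng.random()
        if u0 * M < M * math.exp(-0.5 * x0 * x0):   # accept when u0 * M < f(x0)
            accepted += 1
    return passes

K, b = 20_000, 6.0                            # reduced K (assumption) to keep the run short
N = passes_for_normal(K, b)
print(f"observed N/K = {N / K:.2f}, estimate 2b = {2 * b:.0f}")
```

Changing `seed` or `K` gives a sense of how much the pass count varies from run to run.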

CONCLUSION

This focus article presents the theory, calculation methods, and implementation details for a DRA that provides for the calculation of an arbitrary discrete or continuous distribution random variate through the use of a discrete or continuous uniform random variate. Iteration count estimates were provided, and policies were proposed for implementing the algorithm in a low-level programming language environment for use in embedded systems. See Ref 4 for further details on the algorithm and its implementation.

REFERENCES

1. von Neumann J. Various techniques used in connection with random digits. Applied Mathematics Series: Monte Carlo Methods, vol. 12. Washington, DC: National Bureau of Standards; 1951:36–38.

2. Knuth DE. The Art of Computer Programming, Seminumerical Algorithms, vol. 2, 3rd ed. Reading, MA: Addison-Wesley; 1998.

3. Knuth DE. MMIXware: A RISC Computer for the Third Millennium. Heidelberg, Germany: Springer-Verlag; 1999.

4. Hall T. A dominated rejection algorithm for generating random variates. Proceedings of the 2011 Joint Statistical Meetings, Miami Beach, FL, 2011.
