39
(Pseudo) Random Number Generation Properties of pseudo-random numbers Continuous numbers between 0 and 1 Probability of selecting a number in interval (a,b) ~ (b-a) i.e. Uniformly distributed Numbers are statistically independent Can’t really generate random numbers ∞ information – finite algorithm or table Example: XL spreadsheet function =RAND() Also, want fast and repeatable...

Teknik Simulasi

Embed Size (px)

Citation preview

Page 1: Teknik Simulasi

(Pseudo) Random Number

Generation

Properties of pseudo-random numbers

• Continuous numbers between 0 and 1

• Probability of selecting a number in interval

(a,b) ~ (b-a) – i.e. Uniformly distributed

• Numbers are statistically independent

• Can’t really generate random numbers

∞ information – finite algorithm or table

Example: XL spreadsheet function =RAND()

• Also, want fast and repeatable...

Page 2: Teknik Simulasi

Random Number Generation

How to generate random numbers

– Table look-up

– Computer generation: these values cannot be

truly random and a computer cannot express a

number to an infinite number of decimal places

Pseudorandom numbers

Page 3: Teknik Simulasi

Desirable Properties of Random

Number Generators• Fast

• Should not require much memory

• Long cycle or period

• Numbers should come close to:

– Uniformity (or known distribution)

– Independence

Page 4: Teknik Simulasi

Historical Generator

Midsquare method:

1. Start with an initial seed (e.g. a 4-digit integer).

2. Square the number.

3. Take the middle 4 digits.

4. This value becomes the new seed. Divide the number by 10,000. This becomes the random number. Go to 2.

Page 5: Teknik Simulasi

Midsquare Method, example

x0 = 5497

x1: 54972 = 30217009 x1 = 2170, R1 = 0.2170

x2: 21702 = 04708900 x2 = 7089, R2 = 0.7089

x3: 70892 = 50253921 x3 = 2539, R3 = 0.2539

Drawback: Hard to state conditions for picking initial seed that will generate a “good” sequence.

Page 6: Teknik Simulasi

Midsquare Generator, examples

“Bad” sequences:

• x0 = 5197x1: 51972 = 27008809 x1 = 0088, R1 = 0.0088x2: 00882 = 00007744 x2 = 0077, R2 = 0.0077x3: 00772 = 00005929 x3 = 0059, R3 = 0.0059

• xi = 6500xi+1: 65002=42250000 xi+1=2500, Ri+1= 0.0088xi+2: 25002=06250000 xi+2=2500, Ri+1= 0.0088

Page 7: Teknik Simulasi

Linear Congruential Generator

(LCG) GeneratorStart with random seed

Z0 < m = largest possible integer on machine

Recursively generate integers between 0 and M

Zi = (a Zi-1 + c) mod m

Use U = Z/m for pseudo-random number get

(avoid 0 and 1)

When c = 0 Called Multiplicative Congruential Generator

When c > 0 Mixed LCG

Page 8: Teknik Simulasi

Linear Congruential Generator (LCG) (Lehmer 1951)

Let Zi be the ith number (integer) in the sequence

Zi = (aZi-1+c)mod(m) Zi{0,1,2,…,m-1}

where Z0 = seed

a = multiplier

c = increment

m = modulus

Define Ui = Zi /m (to obtain U(0,1) value)

Page 9: Teknik Simulasi

LCG, example

16-bit machine

a = 1217 c = 0 Z0 = 23 m = 215-1 = 32767

Z1 = (1217*23) mod 32767 = 27991

U1 = 27991/32767 = 0.85424

Z2 = (1217*27991) mod 32767 = 20134

U2 = 20134/32767 = 0.61446

Page 10: Teknik Simulasi

An LCG can be expressed as a function of the seed Z0

THEOREM:

Zi = [aiZ0+c(ai-1)/(a-1)] mod(m)

Proof: By induction on i

i=0 Z0 = [a0Z0+c(a0-1)/(a-1)] mod(m)

Assume for i. Show that expression holds for i+1

Zi+1 = [aZi+c] mod(m)

= [a {[aiZ0+c(ai-1)/(a-1)] mod(m)} +c] mod(m)

= [ai+1Z0+ac(ai-1)/(a-1) +c] mod(m)

= [ai+1Z0+c(ai+1-1)/(a-1) ] mod(m)

Page 11: Teknik Simulasi

Examples:

Zi = (69069Zi-1+1) mod 232 Ui = Zi /232

Zi = (65539Zi-1+76) mod 231 Ui = Zi/231

Zi = (630360016Zi-1) mod (231-1) Ui = Zi/231

Zi = 1313Zi-1 mod 259 Ui = Zi/259

What makes one LCG better than another?

Page 12: Teknik Simulasi

A full period (full cycle) LCG generates all m values

before it cycles.

Consider Zi = (3Zi-1+2) mod(9) with Z0 =7

Then Z1 = 5 Z2 = 8 Z3 = 8 Zj = 8 j = 3,4,5,6,…

On the other hand Zi = (4Zi-1+2) mod(9) has full period.

Why?

Page 13: Teknik Simulasi

Random Number Generation

Mixed congruential generator is full period if

1. m = 2B (B is often # bits in word) fast

2. c and m relatively prime (g.c.d. = 1)

3. If 4 divides m, then 4 divides a – 1

(e.g., a = 1, 5, 9, 13,…)

Page 14: Teknik Simulasi

The period of an LCG is m (full period or full cycle) if and only if

— If q is a prime that divides m, then q divides a-1

— The only positive integer that divides both m and c is 1

— If 4 divides m, then 4 divides a-1.

Examples

Zi+1 = (16807Zi+3) mod (451605),

where 16807 =75, 16806 =(2)(3)(2801), 451605 =(3)(5)(7)(11)(17)(23)

This LCG does not satisfy the first two conditions.

Zi+1 = (16807Zi+5) mod (635493681)

where 16807 =75, 16806 = (2)(3)(2801), 635493681 = (34)(28012)

This LCG satisfies all three conditions.

Page 15: Teknik Simulasi

- m = 2B where B = # bits in the machine is often a good choice

to maximize the period.

- If c = 0, we have a power residue or multiplicative generator.

Note that Zn = (aZn-1) mod(m) Zn = (anZ0) mod(m).

If m = 2B, where B = # bits in the machine, the longest period is m/4

(best one can do) if and only if

— Z0 is odd

— a = 8k+ 3, kZ+ (5,11,13,19,21,27,…)

Page 16: Teknik Simulasi

Random Number Generation

Other kinds of generators

• Quadratic Congruential Generator

– Snew = (a1 Sold2 + a2 Sold

2 + b) mod L

• Combination of Generators

– Shuffling – L’Ecuyer – Wichman/Hill

• Tausworthe Generator

– Generates sequence of random bits

Page 17: Teknik Simulasi

Feedback Shift Generators

• Tausworthe, Math of Computing 1965

• If {ak} is a sequence of binary digits (0 or 1)

defined by

ak = (c1ak-1 + c2ak-2 + … + cpak-p)mod 2

and the c’s are relatively prime, then {ak}

has period 2p-1

Page 18: Teknik Simulasi

IBM - Randu

If c = 0 power residue generator

(multiplicative generator)

un = anu0 mod m

un = a un-1 mod m (homework)

Page 19: Teknik Simulasi

NOTES

— Never “invent” your own LCG. It will probably not be “good.”

— All simulation languages and many software packages have

their own PRN generator. Most use some variation of a linear

congruential generator.

— Power residue generators are the most common.

Page 20: Teknik Simulasi

Tests of RNG, cont’d

• Theoretical tests

– Prove sample moments over entire cycle are

correct

– Lattice structure of LCGs

• “random numbers fall mainly in the planes”

(Marsaglia)

• Spacing hyperplanes: the smaller, the better

Page 21: Teknik Simulasi

Tests of Random Number

Generators• Empirical tests

– Uniformity

• Compute sample moments

• Goodness of fit

– Independence

• Gap Test

• Runs Test

• Poker Test

• Spectral Test

• Autocorrelation Test

Page 22: Teknik Simulasi

Testing Random Number Generators

Desirable Properties:

• Mean and Variance

Theorem: E 1/2 and V 1/12 as m+.

Proof:

For a full period LCG, every integer value from 0 to m-1 is

represented. Therefore

E = (0+1+…+(m-1))/m2 = ((m-1)(m)/2)/m2 = (1/2)-(1/2m)

V = ((02+12+22+…+(m-1)2)/m3) - E2

= [(m)(m-1)(2m-1)/6]/m3 - [(1/2) - (1/2m)]2

= [(1/12) - (1/12m2)]

Page 23: Teknik Simulasi

• Uniformity

2 Goodness of Fit Test

— Divide n observations into k (equal) intervals

— Do a frequency count fi, i=1,2,…,k

— Compute X2 = i (fi -n/k)2 / (n/k)

= i (fi -npi)2 / npi,

where pi = 1/k, i=1,2,…,k.

Page 24: Teknik Simulasi

• • •

f1 f2 fk-1 fk

01

k

2

k

k 2

k

k 1

k

e1 e2 ek-1 ek

ei = expected number of observations in interval i

= n pi = n / k, i = 1, 2, …, k

Data Classification

1

Page 25: Teknik Simulasi

NOTE

— (fi -npi)/(npi)1/2 is the N(0,1) approximation of

a multinomial distribution for pi small, where

E[fi] = npi and Var [fi] = npi (1-pi)).

— For n large, X2 is distributed 2 with k-1 degrees of freedom

—Reject randomness assumption X2 > 2

1k

NOTE: if X2 is too close to zero, it may be because the

numbers have been “fudged.”

BE WARY OF PRN WHICH LOOK TOO RANDOM

Page 26: Teknik Simulasi

Do Not

Reject HOReject HO

k1,2

2 Goodness of Fit Test

- Repeat test m times with independent samples of size n

- If H0 is true, test will reject H0 m times (on average)

Page 27: Teknik Simulasi

Trouble Spots

— Choosing the intervals evenly

— Choosing the intervals such that you would expect

each class to contain at least 5 or 10 observations

— pi should (ideally) be small (<.05)

Page 28: Teknik Simulasi

Example

n = 1000

[0, .1) fi = 87 [.1, .2) fi = 93 [.2, .3) fi = 113

[.3, .4) fi = 106 [.4, .5) fi = 108 [.5, .6) fi = 99

[.6, .7) fi = 91 [.7, .8) fi = 95 [.8, .9) fi = 103

[.9, 1.] fi = 105

X2 = 628/100 = 6.28 Do not reject H0 : U(0,1).

9,.052

16.919

Page 29: Teknik Simulasi

NOTE

— The 2 goodness of fit test is also used to fit distributions to

data, where

X2 = i (fi -ei)2 / ei

ei = expected number of observations in interval i.

Page 30: Teknik Simulasi

Kolmogorov-Smirnov Goodness-of-fit Test

— Order n U[0,1] variates {x[i]}

— Construct an empirical CDF for the n variates {x[i]}

(i.e., F(x[i]) = i/n i = 1,2,…,n)

— Construct a hypothesized CDF for n uniform variates

(i.e., = x, 0x1)

— Compute D = max {D+, D-}, where

D+ = Max1<i<n [(i/n)-

D- = Max1<i<n [ -((i-1)/n)].

— Check tables

• Reject if D is too large, with a risk , which means that

we reject (uniformity) falsely with probability .

(x)F~

˜ F (x[i])]

)(xF~

[i]

Page 31: Teknik Simulasi

1.00

.75

.50

.25

0 .1 .2 .3 .9 1.0

D+ = max {.15, .30, .45, .10}=.45

D- = max {.10, -.05, -.20, .15}=.15

D+

D-

Page 32: Teknik Simulasi

Examples

— If {Ui} = {.1, .2, .3, .9}, then D = .45.

— If {Ui} = {.2, .6, .8, .9}, then D = .35.

— If {Ui} = {.25, .5, .75, 1.}, then D = .25.

NOTE: The minimum value that D can take on is 1/2n.

(How?)

Page 33: Teknik Simulasi

Independence

— Sign Test

* Test Statistic: S = runs of numbers above or below median)

* For large N, S is distributed N( = 1+(N/2), 2 = N/2)

Example

N = 15, S = 7, distributed N( = 8.5, 2 = 15/2)

Maximum value for S: N (negative dependency)

Minimum value for S: 1 (positive dependency)

.87 .15 .23 .45 .69 .32 .30 .19 .24 .18 .65 .82 .93 .22 .81

+ - - - + - - - - - + + + - +

Page 34: Teknik Simulasi

Normal Curve Rejection Regions

REJECT

(+ve)

REJECT

(-ve)Do

Not

REJECT

Reject H0 in favor of HA if

Z = (S - (1+(N/2))) / (N/2)1/2 Z/2 or Z Z/2

Z/2-Z/2

H0 : Independence

HA : Dependence

Page 35: Teknik Simulasi

— Runs Up and Down Test

(runs of increasing and decreasing numbers)

• Assign + if xi <xi+1, assign - if xi>xi+1

• Test Statistic: S = number of runs up AND down

(sequence of + and -)

• E(S) = (2N-1)/3, V(S) = (16N-29)/90

• Use Normal approximation for N>30.

Example:

N = 15, S = 8, distributed N(µ = 29/3, 2 = 211/90)

Maximum value for S: N-1 (negative dependency)

Minimum value for S: 1 ?

.87 .15 .23 .45 .69 .32 .30 .19 .24 .18 .65 .82 .93 .22 .81

- + + + - - - + - + + + - +

Page 36: Teknik Simulasi

Normal Curve Rejection Regions

REJECT (-ve)REJECT

Do Not

REJECT

H0 : Independence

HA : Dependence

Z/2-Z/2

Reject H0 in favor of HA if

Z = (S - (2N-1)/3) / (16N-29/90)1/2 Z/2 or Z Z/2

Page 37: Teknik Simulasi

Test of Cycling

Floyd’s Test for Cycling

• Assume ui = G(ui-1)

• x0 = y0 = seeds

• xi = G(xi-1) yi = G(G(yi-1)), i.e. skip every

other one so y will go twice as fast as x.

Then check to see if there is some value of

n for which xn = yn.

• If xn = yn, cycling occurred.

Page 38: Teknik Simulasi

Marsaglia’s Theorem

All N-tuples generated by a congruential generator

will fall in fewer than (N!m)1/N hyperplanes.

(Proc. Nat. Acad. Sci. 61, 1968 pp.25-28)

e.g. all 10-tuples fall in fewer than 13 9-dimensional

planes for m = 216. Randu in ONLY 15

PLANES in 3D cube.

(Solution: Make m bigger – limited by computer

word size.)

Page 39: Teknik Simulasi

Plot of RNDi+1 vs RNDi using LCG in SIGMA