CDM (Pseudo) Randomness · sutner/CDM/pdf/40-randomness-6up.pdf

CDM

(Pseudo) Randomness

Klaus Sutner

Carnegie Mellon University

40-randomness 2017/12/15 23:17

1 Randomness and Physics

Formalizing Randomness

Generating PRNs

Where Are We? 3

We have seen that Kolmogorov-Chaitin complexity can be used to give a definition of randomness for finite bit strings: in essence, a string is random if it cannot be compressed.

In the actual theory of randomness one is more interested in infinite bit strings, elements of $2^\omega$. Actually, we have already seen one example of a random infinite string: Chaitin’s Ω.

Here is some more background on randomness.

Random Numbers 4

Random numbers (or random bits) are a crucial ingredient in many algorithms.

For example, all industrial strength primality testing algorithms rely on the availability of random bits. Modern cryptography is unimaginable without randomness.

Since computers are deterministic devices (more or less), it is actually not all that easy to produce random bits using a computer: whatever program we run will produce the same bits if we run it again.

To make matters worse, it is rather difficult to even say exactly what is meant by a sequence of random bits. Of course, intuitively we all know what randomness means, right?

Predictions 5

Randomness is closely associated with the inability to predict what is going to happen. You roll a die, all 6 outcomes are possible.

And yet, rolling a die 6000 times, one would expect the number of 6’s to be somewhere around 1000, say, in the interval [950, 1050]. Some could even calculate that if we expand the interval to [913, 1087] then the likelihood of hitting it is 0.997.
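The likelihood quoted above can be checked directly. The following is a minimal sketch, assuming a fair die so that the number of 6's in n rolls is binomial with success probability 1/6; the function name `prob_count_in_range` is made up for this illustration. It uses exact integer arithmetic to avoid floating-point underflow.

```python
from math import comb


def prob_count_in_range(n: int, lo: int, hi: int) -> float:
    """P(lo <= X <= hi) where X counts the 6's in n rolls of a fair die."""
    # P(X = k) = C(n, k) * 5^(n-k) / 6^n; sum the numerators as exact integers.
    favourable = sum(comb(n, k) * 5 ** (n - k) for k in range(lo, hi + 1))
    # Python divides arbitrarily large integers accurately into a float.
    return favourable / 6 ** n


if __name__ == "__main__":
    print("P(950 <= X <= 1050):", prob_count_in_range(6000, 950, 1050))
    print("P(913 <= X <= 1087):", prob_count_in_range(6000, 913, 1087))
```

The wider interval corresponds to three standard deviations around the mean (the standard deviation is about 28.9), which is where the familiar 0.997 comes from.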

Flipping Coins 6

Another often used random bit generator: flipping coins.

How Random Is It? 7

Is the randomness in a coin-toss real or is it actually confined to just the initial conditions?

Persi Diaconis, a Stanford mathematician and highly accomplished professional magician, supposedly can consistently produce ten consecutive heads flipping a coin – by carefully controlling the initial conditions.

Lava Lamps 8

Krypton-85 9

Radioactivity is another great source of randomness – except that no one likes to keep a lump of radioactive material and a Geiger-Müller counter on their desk. Solution: keep the radioactive stuff someplace else and get the random bits over the web.

True random bits from www.fourmilab.ch.

Huge Difference 10

The last system (and also the lava lamps, see below) is very different from the others: if our current understanding of physics is halfway correct, there is no way to predict certain events in quantum physics, like radioactive decay. It is fundamentally impossible (even if we could establish initial conditions correctly, which we cannot thanks to Herr Heisenberg).

In the other, purely mechanical systems such as dice and coins, we encounter deterministic chaos: given sufficiently precise descriptions of the initial conditions, and sufficient compute power, one could in principle compute the outcomes (if we think of them as classical systems).

In principle only, not in practice.

Lorenz Attractor 11

Here is a famous example discovered by Lorenz in 1963, in an attempt to study a hugely simplified model of heat convection in the atmosphere.

$$
\begin{aligned}
x' &= \sigma (y - x) \\
y' &= r x - y - x z \\
z' &= x y - b z
\end{aligned}
$$

These are not spatial coordinates: x stands for the amplitude of convective motion, y for the temperature difference between rising and falling air currents, and z for the deviation of the temperature in the model from a simple linear approximation.

For certain values of the parameters we get the following behavior.
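To get a feeling for this behavior, one can integrate the system numerically. The sketch below assumes the classical parameter values σ = 10, r = 28, b = 8/3 (the slides do not state which values are used) and a hand-written fourth-order Runge-Kutta step; all function names are illustrative. It follows two trajectories whose starting points differ by one part in a billion.

```python
def lorenz(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz system (parameter values are an assumption)."""
    x, y, z = state
    return (sigma * (y - x), r * x - y - x * z, x * y - b * z)


def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def nudge(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz(state)
    k2 = lorenz(nudge(state, k1, dt / 2))
    k3 = lorenz(nudge(state, k2, dt / 2))
    k4 = lorenz(nudge(state, k3, dt))
    return tuple(
        s + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
        for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
    )


def max_separation(p, q, steps=5000, dt=0.01):
    """Largest coordinate-wise distance between two trajectories over the run."""
    worst = 0.0
    for _ in range(steps):
        p, q = rk4_step(p, dt), rk4_step(q, dt)
        worst = max(worst, max(abs(a - c) for a, c in zip(p, q)))
    return worst


if __name__ == "__main__":
    print(max_separation((1.0, 1.0, 1.0), (1.0, 1.0, 1.0 + 1e-9)))
```

The tiny initial difference grows until the two trajectories are as far apart as the attractor allows, which is the practical meaning of "in principle only, not in practice".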

Pre-History 13

In the olden days, the RAND Corporation used a kind of electronic roulette wheel to generate a million random digits (rate: one per second).

In 1955 the data were published under the title:

A Million Random Digits With 100,000 Normal Deviates

“Normal deviates” simply means that the distribution of the random numbers is bell-shaped rather than uniform. But the New York Public Library shelved the book in the psychology section.

The RAND guys were surprised to find that their original sequence had several defects and required quite a bit of post-processing before it could pass muster as a random sequence. This took years to do.

Available at http://www.rand.org/publications/classics/randomdigits.

Fiat Lux 14

Incidentally, Noll and Cooper at Silicon Graphics discovered one day that the pretty lava lamps were completely irrelevant: they could get even better random bits with the lens cap on (there is enough noise in the circuits to get good randomness).

Another way to use light, very much unlike the original lava lamp system, is to exploit an elementary quantum optical process: a photon hitting a semi-transparent mirror either passes or is reflected.

The Quantis system was developed at the University of Geneva; the first practical model was released in 1998.

Note that quantum physics is the only part of physics that claims that the outcome of certain processes is fundamentally random (which is why Einstein was never very fond of quantum physics).

http://www.idquantique.com/

The Magic Device 15

http://www.idquantique.com/random-number-generation/quantis-random-number-generator/

Quantis TRNG 16

Features

True quantum randomness

High bit rate of 4Mbits/sec (up to 16Mbits/sec for PCI card)

Low-cost device (1000+ Euros)

Compact and reliable

USB or PCI, drivers for Windows and Linux

Applications

Numerical Simulations

Statistical Research

Lotteries and gambling

Cryptography

Randomness and Physics

2 Formalizing Randomness

Generating PRNs

Hilbert’s 6th Problem 18

Mathematical Treatment of the Axioms of Physics.

The investigations on the foundations of geometry suggest the problem: To treat in the same manner, by means of axioms, those physical sciences in which already today mathematics plays an important part; in the first rank are the theory of probabilities and mechanics.

Unpredictability 19

So far, all the examples we have seen are based on the notion of unpredictability in physics: the behavior of some physical systems is so complicated that the only way to determine the state at time t is to “run” the system until time t.

Now suppose you use random bits in some algorithm and you want to prove your algorithm to be correct. Without a formal, mathematical definition you cannot even start with the proof.

And try to convince a proof checker that it’s correct when the first line reads “flip a coin 10000 times.”

Defining Randomness 20

All we have so far are various references to common sense and physics. Physics would be fine if we had a coherent, precise theory of the field. We don’t. Hilbert’s problem number 6 is still open.

Of course, there are excellent physical theories available, in particular relativity theory and quantum theory, but things get dicey when one tries to prove strict mathematical results. In essence, we can only argue relative to some axiomatization.

Still, we can home in on a few central properties of randomness – properties that we feel intuitively to be associated with randomness rather than being able to derive them from first principles. But one has to be careful; intuition is not always reliable.

Obstructions to Randomness 21

It is somewhat easier to consider infinite binary sequences $\alpha \in 2^\omega$ rather than finite ones. Infinity is often an excellent approximation for finiteness.

Let’s turn around for the moment and look at properties that would disqualify a sequence from being random in any intuitive sense of the word. Here are two obvious potential problems.

Bias (or skew): the probability of a 0 is not 1/2.

Correlation: the (i+1)st bit is not independent of the i-th bit.

Fortunately, we don’t need to achieve perfection in either category: there are algorithms that can turn a slightly biased and/or correlated sequence into an unbiased and uncorrelated one.

Here is a simple, though not entirely satisfactory, method due to von Neumann to eliminate bias. Dealing with correlated bits is harder; we won’t get involved.

Removing Bias 22

Suppose we have an imperfect source of random bits (real-world sources typically fall into this category) whose bits are already independent, but where the probability of a 0 is 1/2 + ε. To eliminate this bias, John von Neumann suggested the following algorithm:

Read the bits, two at a time.

Skip 00 and 11.

For 01 and 10 output the first bit.

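The probabilities of all 2-blocks are easily computed since we assume independence:

00: $(1/2 + \varepsilon)^2 = 1/4 + \varepsilon + \varepsilon^2$

01: $(1/2 + \varepsilon)(1/2 - \varepsilon) = 1/4 - \varepsilon^2$

10: $(1/2 - \varepsilon)(1/2 + \varepsilon) = 1/4 - \varepsilon^2$

11: $(1/2 - \varepsilon)^2 = 1/4 - \varepsilon + \varepsilon^2$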

Since 01 and 10 are equally likely, the resulting sequence has no bias, as needed. Of course, it’s a bit shorter.
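The three steps of the algorithm translate almost literally into code. This is a minimal sketch; the helper `biased_bits`, which simulates an independent but skewed source with Python's built-in generator, is an assumption made for the demonstration and not part of von Neumann's method.

```python
import random


def von_neumann_extract(bits):
    """Read bits two at a time, skip 00 and 11, output the first bit of 01 and 10."""
    out = []
    it = iter(bits)
    for first, second in zip(it, it):  # zip over one iterator yields disjoint pairs
        if first != second:
            out.append(first)
    return out


def biased_bits(n, p_zero, seed=0):
    """Simulated source (hypothetical helper): independent bits with P(0) = p_zero."""
    rng = random.Random(seed)
    return [0 if rng.random() < p_zero else 1 for _ in range(n)]


if __name__ == "__main__":
    raw = biased_bits(200_000, p_zero=0.7)
    clean = von_neumann_extract(raw)
    print("fraction of 1s before:", sum(raw) / len(raw))
    print("fraction of 1s after: ", sum(clean) / len(clean))
    print("bits kept:", len(clean), "of", len(raw))
```

The price is throughput: each pair produces an output bit with probability 1/2 − 2ε², so on average only 1/4 − ε² output bits are obtained per input bit.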

Density 23

It is easy to define the density of a finite binary word x of length n:

$$D(x) = \frac{1}{n} \sum_i x_i$$

But how about infinite sequences?

Definition (Density)

Let $\alpha \in 2^\omega$ and define the density of α up to n to be $D(\alpha, n) = \frac{1}{n} \sum_{i<n} \alpha_i$.

The limiting density of α is

$$D(\alpha) = \lim_n D(\alpha, n).$$

Note that there is a huge problem with this definition: limits are precisely defined in analysis, and there is not much reason to assume that this particular limit should exist.
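To see that the limit can genuinely fail to exist, consider a sequence built from blocks of doubling length: one 0, two 1s, four 0s, eight 1s, and so on. This example is not from the slides; it is a small illustration, with made-up function names, showing that D(α, n) keeps oscillating between roughly 1/3 and exactly 2/3.

```python
from fractions import Fraction


def block_sequence(num_blocks):
    """Blocks of length 1, 2, 4, ...; block j consists of the bit j mod 2."""
    bits = []
    for j in range(num_blocks):
        bits.extend([j % 2] * 2 ** j)
    return bits


def density(bits, n):
    """D(alpha, n): the fraction of 1s among the first n bits, as an exact rational."""
    return Fraction(sum(bits[:n]), n)


if __name__ == "__main__":
    alpha = block_sequence(16)
    for j in range(8, 16):
        n = 2 ** (j + 1) - 1  # position right after block j
        print(n, float(density(alpha, n)))
```

Right after a block of 1s the density is exactly 2/3, and right after a block of 0s it is just below 1/3, so no limit exists.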

The Law of Large Numbers 24

The LoLN says that if we repeat an experiment often, the observed average does in fact converge to the expected value; almost certainly.

For example, for an unbiased coin we should approach the limiting density of 1/2. Don’t ask what an unbiased coin is.

Also note that we should not expect the averages to be exactly equal to the expectation.

For example, performing a one-dimensional random walk with steps ±1 we should expect to be up to $O(\sqrt{n})$ from the origin after n steps.
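The square-root behavior is easy to observe experimentally. The sketch below uses Python's built-in pseudo-random generator as a stand-in for a fair coin (an assumption, of course); the function names are illustrative. It compares the average distance from the origin with sqrt(2n/π), the known asymptotic value of the expected distance.

```python
import math
import random


def random_walk(n, rng):
    """Positions of a walk with n steps of +1 or -1, starting at 0."""
    position = 0
    path = [0]
    for _ in range(n):
        position += 1 if rng.getrandbits(1) else -1
        path.append(position)
    return path


def mean_final_distance(n, trials, seed=0):
    """Average of |position after n steps| over several independent walks."""
    rng = random.Random(seed)
    return sum(abs(random_walk(n, rng)[-1]) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, mean_final_distance(n, 500), math.sqrt(2 * n / math.pi))
```

Quadrupling the number of steps roughly doubles the typical distance, while the density of +1 steps still drifts toward 1/2.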

A Random Walk 25

[Figure: plot of a random walk; horizontal axis from 0 to 1000 steps, vertical axis (position) from about -40 to 20.]

Counting Blocks 26

We can look at density not just for single bits but for arbitrary finite blocks $w \in 2^m$: the number of occurrences of a particular block w in $x_1 x_2 \ldots x_n$ should approach $n/2^m$.

A good way of visualizing this is to have α trace a path in a de Bruijn graph. Recall that the de Bruijn graph of order k is defined as

$$B_k = \langle 2^k, E \rangle$$

$$E = \{ (au, ub) \mid a, b \in 2,\ u \in 2^{k-1} \}$$

If we slide a window of width k over α we obtain a sequence of nodes in $B_k$, and the sequence is in fact a path in the graph.

De Bruijn Graph of Order 3 27

[Figure: the de Bruijn graph of order 3, with nodes 000, 001, 010, 011, 100, 101, 110, 111.]

De Bruijn graphs are Hamiltonian, and the Hamiltonian cycles are called de Bruijn sequences: they contain every block of length k exactly once and have length $2^k + k - 1$.

Any finite or infinite sequence of bits (of length at least k) traces a path in the graph. If the sequence is unbiased the path hits each node equally often in the limit.
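The definition of the graph and the sliding-window path can be sketched in a few lines. Words are represented as Python strings over "0" and "1"; the function names are illustrative, and the bit string 0001011100 is one de Bruijn sequence of order 3 chosen for the demonstration.

```python
from collections import Counter
from itertools import product


def words(k):
    """All binary words of length k, as strings."""
    return ["".join(w) for w in product("01", repeat=k)]


def de_bruijn_graph(k):
    """Nodes 2^k and edges (au, ub) for bits a, b and u of length k - 1."""
    nodes = words(k)
    edges = {(a + u, u + b) for u in words(k - 1) for a in "01" for b in "01"}
    return nodes, edges


def window_path(bits, k):
    """The sequence of nodes visited when a window of width k slides over bits."""
    return [bits[i:i + k] for i in range(len(bits) - k + 1)]


if __name__ == "__main__":
    nodes, edges = de_bruijn_graph(3)
    path = window_path("0001011100", 3)  # length 2^3 + 3 - 1 = 10
    print(path)
    print(Counter(path))
```

For this particular string every node of the order-3 graph is visited exactly once, and consecutive windows are always joined by an edge.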

Example 28

The first 30 bits of the binary expansion of $\sqrt{5}$ produce the following path:

[Figure: the de Bruijn graph of order 3 (nodes 000 through 111) with the edges traversed so far.]

After 59 bits all edges lie on the path.

In practice, counting blocks up to a certain size is a very good test for randomness ($\chi^2$ test).
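A block-counting test of this kind might look as follows. This is a sketch under stated assumptions: it counts non-overlapping blocks (so that the counts are independent and the usual χ² statistic applies), and it takes its input bits from the binary expansion of the square root of 5, computed exactly with integer square roots. The function names are made up for the example.

```python
from collections import Counter
from math import isqrt


def sqrt_bits(m, n):
    """Binary digits of floor(sqrt(m) * 2^n): integer part, then n fractional bits."""
    return bin(isqrt(m << (2 * n)))[2:]


def chi_square_blocks(bits, k):
    """Chi-square statistic for the counts of non-overlapping blocks of length k."""
    blocks = [bits[i:i + k] for i in range(0, len(bits) - k + 1, k)]
    expected = len(blocks) / 2 ** k  # each block should occur equally often
    counts = Counter(blocks)
    missing = 2 ** k - len(counts)  # blocks that never occur
    seen = sum((c - expected) ** 2 / expected for c in counts.values())
    return seen + missing * expected


if __name__ == "__main__":
    bits = sqrt_bits(5, 3000)
    for k in (1, 2, 3):
        print(k, chi_square_blocks(bits, k))
```

The statistic is then compared against a χ² distribution with 2^k − 1 degrees of freedom; small values are consistent with uniformly distributed blocks, large values are not.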

Decimation 29

How about using Roman military traditions to define randomness?

In 1919 Richard von Mises suggested a notion of randomness based on the limiting density of the sequence itself and certain derived subsequences.

The idea is that “reasonable” subsequences of the given sequence should also have limiting density 1/2.

Definition

An infinite sequence $\alpha \in 2^\omega$ is Mises random if the limiting density of any subsequence $(x_{i_j})$ is 1/2, where the subsequence is selected by an Auswahlregel.

Auswahlregeln 30

So what on earth is an Auswahlregel, a selection rule?

Intuitively, the following all should have limiting density 1/2:

$x_0, x_1, x_2, \ldots, x_n, \ldots$

$x_0, x_2, x_4, \ldots, x_{2n}, \ldots$

$x_1, x_4, x_7, \ldots, x_{3n+1}, \ldots$

$x_0, x_1, x_4, \ldots, x_{n^2}, \ldots$

In fact, we might want for any reasonable strictly monotonic function $f : \mathbb{N} \to \mathbb{N}$ that

$x_{f(0)}, x_{f(1)}, x_{f(2)}, \ldots, x_{f(n)}, \ldots$

has limiting density 1/2.
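A tiny experiment shows why subsequences matter. The alternating sequence 0101... has density exactly 1/2, yet it is clearly not random, and the selection rule "take every other bit" exposes this immediately. The sketch works on finite prefixes only, and the function names are illustrative.

```python
def density(bits):
    """Fraction of 1s in a finite list of bits."""
    return sum(bits) / len(bits)


def select(bits, f):
    """Subsequence x_f(0), x_f(1), ... for a strictly increasing f, while in range."""
    out = []
    n = 0
    while f(n) < len(bits):
        out.append(bits[f(n)])
        n += 1
    return out


if __name__ == "__main__":
    alternating = [i % 2 for i in range(10_000)]
    print(density(alternating))                               # the whole sequence
    print(density(select(alternating, lambda n: 2 * n)))      # even positions
    print(density(select(alternating, lambda n: 2 * n + 1)))  # odd positions
```

The full sequence looks perfectly balanced, while the two selected subsequences have densities 0 and 1.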

Mises’ Definition 31

The actual definition used by Mises is a bit more technical. He considers functions

$$\Phi : 2^* \to \mathbb{B}$$

and selects $\alpha_n$ to be in the subsequence $\alpha_\Phi$ if $\Phi(\alpha[n])$ holds.

However, there is one big caveat: the function Φ must be defined without any knowledge of α: otherwise we can simply pick a subsequence of all 0’s.

Now suppose we have a countable system of Auswahlregeln and our sequence passes all these tests. In other words, for all Φ we have

$$D(\alpha_\Phi) = 1/2.$$

Then α is Mises-random. One can show that for any countable collection of Auswahlregeln there are always uncountably many sequences that are random in this sense.

This all sounds eminently reasonable.

Ville’s Counterexample 32

Unfortunately, in 1939 J. Ville showed that for any countable system of Auswahlregeln there is always a sequence α that passes all the tests (i.e., the limiting density is 1/2 for all these subsequences) but that is nonetheless biased towards 1.

More precisely, it was known that a random sequence should have

$$\limsup_n \sqrt{\frac{2n}{\log\log n}} \, \bigl( D(\alpha, n) - 1/2 \bigr) = 1$$

$$\liminf_n \sqrt{\frac{2n}{\log\log n}} \, \bigl( D(\alpha, n) - 1/2 \bigr) = -1$$

and Ville’s example violated the second condition.

Kolmogorov-Randomness 33

Kolmogorov suggested using incompressibility as a measure of randomness.

Definition

An infinite sequence $\alpha \in 2^\omega$ is Kolmogorov-random if for some constant c: $K(\alpha[n]) \ge n - c$ for all n.

So the prefixes α[n] are algorithmically incompressible with the same constant c; α[n] is the shortest description of itself. Chaitin’s Ω is an example of a sequence that is random in this sense.

Again, incompressibility is very similar in spirit to the notion of randomness: there is no rhyme nor reason, one has to have a full record to reconstruct the sequence.
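K itself is not computable, but an off-the-shelf compressor gives a crude, computable upper bound on description length and conveys the flavor. The sketch below uses `zlib` from the Python standard library purely as such a stand-in; it says nothing rigorous about K.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data: an upper-bound proxy, not K."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    n = 10_000
    regular = bytes(n)  # n zero bytes: plenty of rhyme and reason
    rng = random.Random(0)
    noisy = bytes(rng.getrandbits(8) for _ in range(n))
    print("zeros:", compressed_length(regular), "of", n)
    print("noise:", compressed_length(noisy), "of", n)
```

There is an instructive catch: the "noise" here comes from a pseudo-random generator, so in the Kolmogorov sense it is highly compressible (a short program plus the seed reproduces it). The compressor simply fails to find that description.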

The Importance of Prefix Complexity 34

It is important that the notion of incompressibility is based on Chaitin prefix complexity, not the more intuitive Kolmogorov complexity.

Theorem (Martin-Löf)

Let f be computable such that $\sum_i 2^{-f(i)}$ diverges. Then for any $\alpha \in 2^\omega$ there are infinitely many n such that $C(\alpha[n]) < n - f(n)$.

The proof is quite involved. Note that f(n) = log n satisfies the hypothesis, so we can infinitely often shorten the description in plain Kolmogorov-Chaitin complexity by a logarithmic amount.

Martin-Löf Randomness 35

Here is another approach to randomness: construct a collection of tests that are supposed to detect non-randomness; a sequence is random if it survives all the tests.

As we will see, computability plays a major role in describing the tests; without it, our tests would rule out all sequences.

It is helpful to think of an infinite sequence $\alpha \in 2^\omega$ as an infinite branch in the full infinite binary tree $2^*$.

The initial segments are ordinary finite binary words and we can try to express conditions on α by placing conditions on prefixes α[n].

Cylinders and Measure 36

We will need to measure the size of sets $S \subseteq 2^\omega$.

To this end let x be a finite word and define the cylinder generated by x and its measure by

$$\mathrm{cyl}(x) = \{ \alpha \in 2^\omega \mid x < \alpha \}$$

$$\mu(\mathrm{cyl}(x)) = 2^{-|x|}$$

Here x < α means that x is a finite prefix of α. These cylinders form basic open sets in $2^\omega$ and any open set S can be written as the disjoint union of countably many of them. Hence we can measure S:

$$S = \bigcup_i \mathrm{cyl}(x_i)$$

$$\mu(S) = \sum_i 2^{-|x_i|}$$

Intuition 37

Think of $2^\omega$ as the real interval $[0, 1] \subseteq \mathbb{R}$.

Given a binary word $x = x_1 x_2 \ldots x_n$, think of cyl(x) as a real interval comprising all numbers with binary expansion

$$0.x_1 x_2 \ldots x_n z_0 z_1 z_2 \ldots$$

which is an interval of length $2^{-n}$.

We will try to cover a given set of reals by a collection of intervals of shrinking size (a set of constructive measure zero).
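The interval picture can be made concrete. The following sketch represents finite words as strings and computes the interval belonging to a cylinder as well as the measure of a finite disjoint union of cylinders; disjointness is enforced by requiring the generating words to be prefix-free. The function names are illustrative only.

```python
from fractions import Fraction


def cylinder_interval(x: str):
    """Endpoints of the real interval corresponding to cyl(x)."""
    left = Fraction(int(x, 2), 2 ** len(x)) if x else Fraction(0)
    return left, left + Fraction(1, 2 ** len(x))


def is_prefix_free(word_set):
    """True if no word in the set is a proper prefix of another."""
    return not any(u != v and v.startswith(u) for u in word_set for v in word_set)


def measure(word_list):
    """Measure of the union of cyl(x) over a prefix-free set of words."""
    word_set = set(word_list)
    if not is_prefix_free(word_set):
        raise ValueError("cylinders overlap: the words are not prefix-free")
    return sum(Fraction(1, 2 ** len(x)) for x in word_set)


if __name__ == "__main__":
    print(cylinder_interval("101"))
    print(measure(["0", "10"]))
```

For instance, cyl(101) corresponds to the interval from 5/8 to 3/4, of length 2^{-3}.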

Example: Density 38

For example, to weed out sequences with limiting density at least 2/3 we can use the following test sets:

$$K_n = \bigcup \{ \mathrm{cyl}(x) \mid |x| \ge n \wedge D(x) \ge 2/3 \}$$

Then any sequence in $K = \bigcap_n K_n$ has limiting density at least 2/3 and should be eliminated from our pool of potential random sequences.

Of course, we need to perform countably many such tests to make sure that the limiting density is not larger than 1/2 + ε for any ε > 0 (actually, not; see below).

General Case 39

More generally, we consider tests analogous to the density test: we want a descending chain of open sets

$$K_0 \supseteq K_1 \supseteq K_2 \supseteq \ldots \supseteq K_n \supseteq \ldots$$

where $\mu(K_n) \le 2^{-n}$ so that these sets are becoming “small” as n increases and their intersection has constructive measure zero.

We eliminate all sequences in this set $K = \bigcap_n K_n$.

For the math guys: K is a $G_\delta$ null set.

Note that it is not so easy to show that the $K_n$’s from the example above are sufficiently small.

The Catch 40

We want to declare a sequence α to be random if it survives this kind of test for various choices of $(K_n)$. For example, we would want to apply all possible density tests, for blocks of arbitrary lengths.

Unfortunately, we cannot simply allow arbitrary tests $(K_n)$: if we do, then all sequences are eliminated.

To see why, let $\alpha \in 2^\omega$ be arbitrary and define a special test $K^\alpha_n = \mathrm{cyl}(\alpha[n])$.

Then $K^\alpha = \bigcap_n K^\alpha_n = \{\alpha\}$.

So, if we want to get anything useful out of this, we need to limit the permissible tests $(K_n)$.

Computability to the Rescue 41

The key in designing a usable randomness test lies in imposing a computability constraint.

Definition

A sequential test has the additional property that

$$K = \{ (n, x) \mid x < \alpha \in K_n \} \subseteq 2^*$$

is semidecidable.

As a consequence, there are only countably many sequential tests.

And, we certainly can no longer design a test that eliminates an arbitrary sequence α (unless α is computable).

Applying a Sequential Test 42

By definition, we can effectively enumerate all the pairs (n, x) in $K = \{ (n, x) \mid x < \alpha \in K_n \}$.

What does it mean for α to fail this test, that is, to be eliminated?

We need $\alpha \in \bigcap_n K_n$ where

$$K_n = \bigcup_{(n,x) \in K} \mathrm{cyl}(x)$$

So for every n, we have to find an x such that (n, x) ∈ K and x < α.

Of course this is not effective, even assuming that we have α as an oracle. But it’s not terribly far off.

Universal Tests 43

Taking our definitions at face value, we would have to check all possible sequential tests to check for randomness.

Here is an amazing result that shows that in essence we only need to deal with a single test. The proof uses the existence of a universal Turing machine and is not particularly difficult.

Theorem (Universal Test)

There is a universal sequential test U such that for any sequential test K we have $K_{n+c} \subseteq U_n$ for some constant c.

Martin-Löf Randomness 44

Definition

An infinite sequence α is Martin-Löf random if it passes a universal sequential test.

This definition is very strongly supported by the empirical fact that any practical test of randomness in ordinary probability theory can be translated into a sequential test. So, we are just dealing with all of these tests at once (plus all conceivable others).

As it turns out, our last two approaches produce the same notion of randomness.

Theorem

A sequence is Martin-Löf random iff it is Kolmogorov-random.

How Bad is Randomness? 45

The definition may be elegant and right, but, sadly, it does not yield any methods to construct a random sequence. Au contraire:

Theorem

Any Martin-Löf random sequence fails to be computable.

Chaitin’s Ω is random and turns out to be $\Delta_2$ in the arithmetical hierarchy. Close, but not computable.

Exercises 46

Exercise

Give a detailed proof that limiting density can be checked by a suitable test set. In particular, make sure that semi-decidability holds.

Exercise

Show how to check for the frequencies of blocks of arbitrary finite length in a sequential test.

Exercise

Show that every Martin-Löf random sequence fails to be computable. Assume a random sequence is computable and show how to construct a test that rejects it.

Exercise

Show that there are uncountably many Martin-Löf random sequences (in fact, they form a set of measure 1).

Randomness and Physics

Formalizing Randomness

3 Generating PRNs

Pseudo-Randomness 48

Anyone attempting to produce random numbers by purely arithmetic means is, of course, in a state of sin.

John von Neumann

In the real world, one often makes do with a pseudo-random number generator (PRNG) based on iteration: the pseudo-random sequence is the orbit of some initial element under some function.

$x_0$ = chosen somehow, at random :-)

$x_{n+1} = f(x_n)$

where f is easily computable, typically using arithmetic and some bit-plumbing.

The Lasso 49

Of course, we are taking a huge step away from real randomness here: this sequence would perish miserably when exposed to a Martin-Löf test. The function f typically operates on some finite domain such as 64-bit words. Every orbit necessarily looks like a lasso: a (possibly empty) transient part that leads into a cycle.

So we can only hope to make f fast and guarantee long periods.
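For a concrete f on a small domain, the shape of the lasso (length of the transient and length of the cycle) can be determined without storing the orbit. The sketch below uses Brent's cycle-detection algorithm, a standard technique that is not mentioned in the slides; the two example maps are chosen purely for illustration.

```python
def lasso(f, x0):
    """Return (transient_length, period) of the orbit x0, f(x0), f(f(x0)), ..."""
    # Phase 1 (Brent): find the period by letting the hare run ahead in
    # windows whose length doubles each time the tortoise is moved.
    power = period = 1
    tortoise, hare = x0, f(x0)
    while tortoise != hare:
        if power == period:  # start a new, longer window
            tortoise = hare
            power *= 2
            period = 0
        hare = f(hare)
        period += 1
    # Phase 2: two pointers exactly `period` apart meet at the cycle entrance.
    tortoise = hare = x0
    for _ in range(period):
        hare = f(hare)
    transient = 0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        transient += 1
    return transient, period


if __name__ == "__main__":
    print(lasso(lambda x: (x * x + 1) % 255, 3))  # orbit 3, 10, 101, 2, 5, 26, 167, 95, 101, ...
    print(lasso(lambda x: (5 * x + 3) % 16, 0))   # a small affine map modulo 16
```

For a PRNG one wants a short or empty transient and a period close to the size of the state space; for realistic generators the period is of course far too long to find by running the orbit.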

Still, there are many applications where this type of pseudo-randomness is sufficient.

One has to be very careful with cryptography, though!

The Seed 50

Needless to say, running a PRNG twice with the same seed $x_0$ is going to produce exactly the same “random” sequence.

This can be a huge advantage, because it makes computations that are based on the random numbers reproducible (important for debugging and verification).

More generally, if we are willing to pay for a truly random seed, we would hope that the iterative PRNG would amplify the randomness: we provide m truly random bits and get back n high-quality pseudo-random bits where $n \gg m$.

Hence PRNGs reduce the need for truly random bits but do not entirely eliminate it.

The Real Challenge 51

As von Neumann kindly pointed out, we’re hosed, no matter what we do.

So the real question is this: how simple and easy-to-compute can we make our sinful function f and still get pseudo-random numbers that are sufficiently good to drive certain algorithms?

In the worst case, we could resort to actual quantum randomness, but that is expensive in many ways.

As it turns out, often we can get away with murder.

Linear Congruential Generator 52

A typical example: a simple affine map modulo m.

$$x_{n+1} = a x_n + b \bmod m$$

The trick here is to choose the proper values for the coefficients. Suitable values can be found in papers and on the web.

A choice that works reasonably well is

$a = 1664525$, $b = 1013904223$, $m = 2^{32}$

Note that a modulus of $2^{32}$ amounts to unsigned integer arithmetic on a 32-bit architecture, so this is implementation-friendly.
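With these constants the generator is a one-liner. A minimal sketch as a Python generator follows; the name `lcg` and the choice to yield the new state directly as output are assumptions of this illustration.

```python
from itertools import islice


def lcg(seed, a=1664525, b=1013904223, m=2 ** 32):
    """Yield x_1, x_2, ... where x_{n+1} = (a * x_n + b) mod m and x_0 = seed."""
    x = seed % m
    while True:
        x = (a * x + b) % m
        yield x


if __name__ == "__main__":
    values = list(islice(lcg(0), 6))
    print(values)
    print([v & 1 for v in values])  # the lowest bit of each output
```

One well-known weakness shows up immediately: with a power-of-two modulus and odd a and b, the lowest bit simply alternates, so the low-order bits are much less "random" than the high-order ones.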

Multiplicative Congruential Generator 53

Omit the additive offset and use multiplicative constants only.

If need be, use a higher order recurrence.

$$x_n = a_1 x_{n-1} + a_2 x_{n-2} + \ldots + a_k x_{n-k} \bmod m$$

For prime moduli one can achieve period length $m^k - 1$.

This is almost as fast and easy to implement as LCG (though there is of course more work involved in calculating modulo a prime).

Note, though, that there is more state: we need to store all of $x_{n-1}, x_{n-2}, \ldots, x_{n-k}$.
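A toy instance makes the period claim tangible. The sketch below takes k = 2 with coefficients a_1 = a_2 = 1, i.e. the Fibonacci recurrence, and tiny prime moduli; these parameters are chosen only for illustration and are far too small for any practical use. The period is measured on the full state (the last k values).

```python
def step(state, coeffs, m):
    """One step of x_n = a_1 x_{n-1} + ... + a_k x_{n-k} mod m; state is oldest first."""
    nxt = sum(a * x for a, x in zip(coeffs, reversed(state))) % m
    return state[1:] + (nxt,)


def state_period(coeffs, initial, m):
    """Number of steps until the state returns to `initial` (None if it never does)."""
    start = tuple(initial)
    state = start
    for steps in range(1, m ** len(start) + 1):  # there are only m^k states
        state = step(state, coeffs, m)
        if state == start:
            return steps
    return None


if __name__ == "__main__":
    print(state_period((1, 1), (0, 1), 3))  # Fibonacci numbers modulo 3
    print(state_period((1, 1), (0, 1), 2))  # Fibonacci numbers modulo 2
```

Modulo 3 the state cycles through all 3^2 − 1 = 8 nonzero states, so the bound m^k − 1 is attained in this small case. The all-zero state is a fixed point and can never be part of a longer cycle, which is why the bound is m^k − 1 rather than m^k.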

Inverse Congruential Generator 54

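Choose the modulus m to be a prime number and define the patched inverse $\overline{x}$ of x to be

$$\overline{x} = \begin{cases} 0 & \text{if } x = 0, \\ x^{-1} & \text{otherwise.} \end{cases}$$

Then we can define a pseudo-random sequence by

$$x_{n+1} = a \, \overline{x_n} + b \bmod m$$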

Computing the inverse can be handled by the extended Euclidean algorithm.

Again, it is crucial to choose the proper values for the coefficients.
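In Python the modular inverse is available directly, since the built-in `pow` accepts an exponent of −1 together with a modulus (Python 3.8 or later), so the extended Euclidean algorithm does not have to be written out. The parameters in the demonstration (m = 7, a = 3, b = 2) are toy values picked for illustration, not recommended coefficients.

```python
from itertools import islice


def patched_inverse(x, m):
    """0 if x is 0 modulo m, otherwise the multiplicative inverse of x modulo m."""
    x %= m
    return 0 if x == 0 else pow(x, -1, m)


def icg(seed, a, b, m):
    """Yield x_1, x_2, ... where x_{n+1} = (a * patched_inverse(x_n) + b) mod m."""
    x = seed % m
    while True:
        x = (a * patched_inverse(x, m) + b) % m
        yield x


if __name__ == "__main__":
    print(list(islice(icg(1, a=3, b=2, m=7), 6)))
```

With these toy values the orbit of 1 is 5, 4, 1, 5, 4, 1, ..., a cycle of length 3, which illustrates why the choice of coefficients matters.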

Mersenne Twister 55

Fairly recent (1998) method by Matsumoto and Nishimura; it seems to be the tool of choice at this point.

Has a huge period of $2^{19937} - 1$, a Mersenne prime.

Is statistically random in all the bits of its output (after a bit of post-processing).

Has negligible serial correlation between successive values.

Only statistically unsound generators are much faster.

The method is very clever and not exactly obvious.

The algorithm works on bit-vectors of length w (typically 32 or 64).

Let k be the degree of the recursion, and choose 1 ≤ m < k and 0 ≤ r < w.

The MT Recurrence 56

So we are trying to generate a sequence of bit-vectors $x_i \in 2^w$.

Define the join join(x, y) of $x, y \in 2^w$ to be the first w − r bits of x followed by the last r bits of y.

$$\mathrm{join}(x, y) = (x_1, x_2, \ldots, x_{w-r}, y_{w-r+1}, \ldots, y_w)$$

We can use the join operation to define the following recurrence:

$$x_{n+k} = x_{n+m} + \mathrm{join}(x_n, x_{n+1}) \cdot A$$

Here A is a sparse companion-type matrix that makes it easy to perform the vector-matrix multiplication.

Companion Matrix 57

The w × w matrix A has the following form:

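$$
A = \begin{pmatrix}
0 & 1 & 0 & \ldots & 0 \\
0 & 0 & 1 & \ldots & 0 \\
 & & \ldots & & \\
0 & 0 & 0 & \ldots & 1 \\
a_{w-1} & a_{w-2} & a_{w-3} & \ldots & a_0
\end{pmatrix}
$$

Note that z · A is not really a vector-matrix operation and can be handled in O(w) steps.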

This is just a convenient way to describe the necessary manipulations.

Good Parameters 58

Here is one excellent choice for the parameters:

w = 32 k = 624 m = 397 r = 31

and the A matrix is given by

a = 0x9908B0DF

which hex-number represents the entries in the last row of A.

Recall that the recurrence determines $x_{n+k}$ in terms of $x_{n+m}$, $x_n$ and $x_{n+1}$, so we need to store a bit of state: $x_n, \ldots, x_{n+k-1} = x_{n+623}$.

So we need to store 624 words, not too bad.

A Miracle 59

This particular choice of parameters achieves the theoretical upper bound for the period:

$$2^{wk-r} - 1 = 2^{19937} - 1 \approx 4.315 \times 10^{6001}.$$

After a little bit of post-processing of the sequence $(x_i)_{i \ge 0}$, this method produces very high quality pseudo-random numbers, and is not overly costly.
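Putting the pieces together, here is a compact sketch of the generator with the parameters above. The recurrence, the join and the constant a come from the slides; the seeding rule (multiplier 1812433253, default seed 5489) and the tempering masks used for post-processing are not given there and are taken, as an assumption, from the widely used 32-bit reference version known as MT19937. Addition of bit-vectors is bitwise XOR, and z · A becomes a right shift followed by a conditional XOR with a.

```python
class MT19937:
    """Sketch of the 32-bit Mersenne Twister: w = 32, k = 624, m = 397, r = 31."""

    K, M = 624, 397
    A = 0x9908B0DF          # last row of the companion matrix
    UPPER = 0x80000000      # the first w - r = 1 bit
    LOWER = 0x7FFFFFFF      # the last r = 31 bits

    def __init__(self, seed=5489):
        # Seeding rule of the reference implementation (assumed, not in the slides).
        self.state = [seed & 0xFFFFFFFF]
        for i in range(1, self.K):
            prev = self.state[-1]
            self.state.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
        self.index = self.K

    def _twist(self):
        """Replace all 624 words: x_{n+k} = x_{n+m} + join(x_n, x_{n+1}) * A."""
        s = self.state
        for i in range(self.K):
            joined = (s[i] & self.UPPER) | (s[(i + 1) % self.K] & self.LOWER)
            product = joined >> 1
            if joined & 1:               # lowest bit set: add the last row of A
                product ^= self.A
            s[i] = s[(i + self.M) % self.K] ^ product
        self.index = 0

    def next_u32(self):
        """Next 32-bit output, after the tempering (post-processing) step."""
        if self.index >= self.K:
            self._twist()
        y = self.state[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF


if __name__ == "__main__":
    gen = MT19937()
    print([gen.next_u32() for _ in range(3)])
```

The state is exactly the 624 words mentioned above, and each call does only a handful of shifts and XORs, which is why the method is not overly costly despite its enormous period.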
