
7.0 Statistical Graphics and RNG

• Answer Questions

• Statistical Graphics

• Random Number Generators


7.1 Statistical Graphics

John Snow helped to end the 1854 cholera outbreak through use of a statistical graphic based on a city map of London. The map shows the pattern of the disease outbreak, and illustrates the importance of exception analysis.

Snow also served as an anesthetist to Queen Victoria and was a contemporary of Florence Nightingale.

He also found a smart way to estimate the literacy rate. Can you guess how he did it?


The second graphic shows the age-adjusted incidence of stomach cancer for white males, for cases between 1970 and 1994. We can compare that with a similar map for 1950 to 1969.

• Is there a gender difference?

• What is going on in Nevada?

• What is going on in New Mexico?

• What is going on in Wisconsin, Minnesota, and North Dakota?

• What about Pittsburgh?

• What about Maine?

How do we interpret single-county hotspots?


The third graphic shows the pedestrian fatality rates by state. Florida is the worst, and contains the five worst cities in the country. What might explain this? (Consider also New Mexico and Arizona.)

The fourth graphic is by Charles-Joseph Minard; Edward Tufte hails it as the best statistical graphic ever. It shows the size of Napoleon’s army in 1812-1813, as he attacks Czar Alexander I in Moscow and then retreats.

The graphic includes information on:

• location (two dimensions)

• time

• temperature

• size of the army


7.2 Random Numbers

In order to generate “random” numbers, it is sufficient to generate random binary strings.

Toss a fair coin an infinite number of times, with heads being 0 and tails being 1, to get a sequence $X_1, X_2, \ldots$. This can be converted into a random number $U$ that is uniformly distributed on $[0, 1]$ by

$$U = \sum_{i=1}^{\infty} X_i 2^{-i}.$$
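A minimal sketch of this conversion, truncated to finitely many tosses (an assumption: a real computation cannot use an infinite sequence, and 53 bits already fills a double-precision float). The function name `bits_to_uniform` is illustrative and does not come from the lecture.

```python
import random

def bits_to_uniform(bits):
    """Return sum_i bits[i-1] * 2**(-i): the binary fraction 0.b1 b2 b3 ..."""
    return sum(x * 2.0 ** -i for i, x in enumerate(bits, start=1))

# Hand-checkable case: binary 0.101 = 1/2 + 0/4 + 1/8.
print(bits_to_uniform([1, 0, 1]))  # 0.625

# Simulated coin tosses (pseudo-random stand-ins for a physical coin).
rng = random.Random(0)
tosses = [rng.getrandbits(1) for _ in range(53)]
u = bits_to_uniform(tosses)
print(0.0 <= u < 1.0)  # True
```

Each extra toss halves the width of the interval that is known to contain U, so the first few tosses fix most of its value.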

If you have a random number $U$ that is uniform on $[0, 1]$, then the random number $X = F^{-1}(U)$ is a random draw from the distribution $F(x)$. So all you need for any kind of random number is a set of random coin tosses.
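As an illustration of the inverse-CDF idea, here is a sketch for one distribution chosen for convenience (an assumption, not an example from the lecture): the exponential with rate λ, where F(x) = 1 − exp(−λx) and so F⁻¹(u) = −ln(1 − u)/λ. The function names are hypothetical.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), valid for 0 <= u < 1."""
    return -math.log1p(-u) / rate

def sample_exponential(n, rate, seed):
    """Draw n exponential variates by applying the inverse CDF to uniform draws."""
    rng = random.Random(seed)  # rng.random() returns values in [0, 1)
    return [exponential_inverse_cdf(rng.random(), rate) for _ in range(n)]

draws = sample_exponential(100_000, rate=2.0, seed=1)
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should be close to the theoretical mean 1 / rate = 0.5
```

The same pattern works for any distribution whose inverse CDF can be computed; only `exponential_inverse_cdf` would change.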


Real coins are neither random enough nor practical for the two main applications:

• computer simulations

• cryptosecurity.

Good Random Number Generators (RNGs) are fast, repeatable (i.e., have a seed), have a cycle far longer than any sequence one will actually use, have sensitive dependence on the seed, and pass statistical tests for randomness.
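Repeatability can be seen with any seeded generator. The sketch below uses Python's standard `random` module as a stand-in (an assumption: the lecture does not name a particular generator), and the helper `draw` is illustrative.

```python
import random

def draw(seed, n=5):
    """Return the first n uniform draws from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# The same seed replays the same sequence; a nearby seed gives a different one.
print(draw(2024) == draw(2024))  # True
print(draw(2024) == draw(2025))  # False
```

Archiving the seed alongside simulation results is what makes a computer experiment replicable.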

In practice, there are three strategies for building RNGs:

• Amplify physical (quantum) noise.

• Use problems believed to be computationally hard (trapdoor codes), such as factoring large numbers that are products of two primes.

• Use linear congruential generators.


The raw output of the first method has had trouble passing statistical tests for randomness. The sequences tend to show patterns introduced by the amplification mechanism.

The second method is widely used in cryptography, but there are issues. It is not repeatable, in the sense needed for replicating a computer experiment. It cannot produce an infinite string of binary digits: eventually, you factor the number. And the big fear is that some clever mathematician will discover a fast new way to factor large numbers.

Nonetheless, trapdoor codes are wildly popular in cryptography, and quite reliable. RSA encryption is one famous example; it is the basis for most on-line credit card transactions.


For simulation, computer games, and other applications, linear congruential generators are used:

$$X_{n+1} \equiv (a X_n + c) \pmod{m}$$

where $v \equiv w \pmod{m}$ here means that $v$ is the remainder when $w$ is divided by $m$, and

• $X_n$ is the current random integer,

• $X_{n+1}$ is the next random integer in the sequence,

• $m$ is the modulus (a very large integer),

• $a$ and $c$ are carefully chosen constants.

The initial value, $X_0$, is called the seed of the linear congruential generator. The $X_i$ are written in binary.
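The recurrence above can be sketched in a few lines. The default constants below (a = 1664525, c = 1013904223, m = 2^32) are one commonly published choice and are an assumption here; the lecture does not specify values. The name `lcg` is illustrative.

```python
from itertools import islice

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield X_1, X_2, ... from X_{n+1} = (a * X_n + c) mod m, starting at X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

# Tiny parameters make the arithmetic easy to check by hand:
# (5*1 + 3) % 16 = 8, (5*8 + 3) % 16 = 11, (5*11 + 3) % 16 = 10.
print(list(islice(lcg(1, a=5, c=3, m=16), 3)))  # [8, 11, 10]

# Scale to [0, 1) by dividing by the modulus.
uniforms = [x / 2**32 for x in islice(lcg(12345), 5)]
print(all(0.0 <= u < 1.0 for u in uniforms))  # True
```

Only the current integer has to be stored between calls, which is why these generators use so little memory.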


Linear congruential generators are not perfect. There is some correlation in the sequence: if one uses them to plot points in a $k$-dimensional space, the points will lie on a limited number of hyperplanes, at most on the order of $m^{1/k}$.
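A classic illustration of this defect, not taken from the lecture, is the historical generator RANDU (a = 65539, c = 0, m = 2^31). Because 65539 = 2^16 + 3, we have a^2 ≡ 6a − 9 (mod 2^31), so every output is a fixed linear combination of the previous two, and consecutive triples fall on a small set of parallel planes. The sketch below checks that relation numerically.

```python
RANDU_A = 65539
RANDU_M = 2**31

def randu(seed, n):
    """Return the first n outputs of X_{k+1} = 65539 * X_k mod 2**31."""
    xs = []
    x = seed
    for _ in range(n):
        x = (RANDU_A * x) % RANDU_M
        xs.append(x)
    return xs

xs = randu(1, 1000)

# Every consecutive triple satisfies X_{k+2} - 6*X_{k+1} + 9*X_k = 0 (mod 2**31).
on_planes = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % RANDU_M == 0
    for k in range(len(xs) - 2)
)
print(on_planes)  # True
```

A three-dimensional scatter plot of consecutive triples makes the planes visible to the eye; the check here shows the same structure without plotting.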

On the other hand, these are fast, use little memory, can have a cycle length as large as $m$, and are replicable if one archives the seed.
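Whether the cycle length actually reaches m depends on the constants. The sketch below measures the cycle length by brute force for a toy modulus; the parameter values and the name `cycle_length` are illustrative assumptions, and the approach is only feasible for small m.

```python
def cycle_length(a, c, m, seed):
    """Length of the cycle the sequence X_{n+1} = (a*X_n + c) mod m eventually enters."""
    first_seen = {}
    x = seed
    step = 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]

# a = 5, c = 3, m = 16 visits every residue before repeating.
print(cycle_length(5, 3, 16, seed=0))  # 16

# A poor choice: 7 * 7 = 49 = 1 (mod 16), so the sequence 1, 7, 1, 7, ... has length 2.
print(cycle_length(7, 0, 16, seed=1))  # 2
```

This is what "carefully chosen constants" means in practice: the same modulus can give a full cycle or a nearly useless one.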


When one has a long sequence of binary random digits, one can try to test whether the sequence is random.

One strategy is to do a series of hypothesis tests:

1. The null is that the proportion of 1s is 1/2; the alternative is that it is not.

2. The null is that the proportion of sequential pairs (0, 0) [and (0, 1), (1, 0), (1, 1)] is 1/4; the alternative is that it is not.

3. The null is that the proportion of sequential triples (0, 0, 0) is 1/8; the alternative is that it is not; etc.

You will soon learn how to make such tests. You could even adjust for the multiple testing problem, an important issue that we cover later.

But a sequence with an obvious pattern can still pass: letting $X_i$ be 0 if the $i$th digit of π is odd and 1 if it is even gives a sequence that, as far as anyone has checked, passes all these tests.
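The first two hypothesis tests listed above can be sketched as follows. Two assumptions are made for simplicity: the proportion test uses the normal approximation, and the pair test uses non-overlapping pairs so that the chi-square statistic has 3 degrees of freedom under the null. The function names are illustrative.

```python
import math
from collections import Counter

def monobit_z(bits):
    """z-statistic for H0: proportion of 1s is 1/2 (reject at 5% if |z| > 1.96)."""
    n = len(bits)
    return (sum(bits) - n / 2) / math.sqrt(n / 4)

def pair_chi2(bits):
    """Chi-square statistic for H0: each non-overlapping pair has probability 1/4
    (3 degrees of freedom; reject at 5% if the statistic exceeds 7.815)."""
    pairs = [(bits[i], bits[i + 1]) for i in range(0, len(bits) - 1, 2)]
    expected = len(pairs) / 4
    counts = Counter(pairs)
    return sum(
        (counts[(a, b)] - expected) ** 2 / expected
        for a in (0, 1)
        for b in (0, 1)
    )

# A blatantly patterned sequence: 0, 1, 0, 1, ...
alternating = [i % 2 for i in range(1000)]
print(monobit_z(alternating))  # 0.0    (passes the first test perfectly)
print(pair_chi2(alternating))  # 1500.0 (fails the second test badly)
```

The alternating sequence shows why a single test is not enough: each test catches only the kinds of pattern it was designed to look for.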


It is provable that one cannot design a test that will eventually detect all possible patterned sequences. But one can design a sequence of tests that will discover many different kinds of patterns.

Information theory has shown that a truly random sequence cannot be compressed. A string is compressible if it can be encoded in such a way that the coded version requires fewer bits than the original string.

So one way to test a random number generator is to feed its output into compression algorithms such as gzip, JPEG2000, and Lempel-Ziv, and see if the result is substantially shorter.
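A rough version of this check can be run with Python's standard `zlib` module, which implements the DEFLATE method used by gzip. The sketch compares the output of Python's built-in generator with an obviously patterned byte string; the sample size, seed, and function name are arbitrary assumptions.

```python
import random
import zlib

def compression_ratio(data):
    """Compressed size divided by original size; values near 1 mean incompressible."""
    return len(zlib.compress(data, 9)) / len(data)

n = 100_000
rng = random.Random(7)
generated = bytes(rng.getrandbits(8) for _ in range(n))  # generator output, one byte per draw
patterned = b"01" * (n // 2)                              # highly repetitive input

print(compression_ratio(generated))  # close to 1: no substantial shortening
print(compression_ratio(patterned))  # far below 1: the pattern is detected
```

Failing to compress is only weak evidence of randomness, since a general-purpose compressor looks for a limited family of patterns.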

Another theorem: If sequence $X_1, X_2, \ldots$ is added to an independent sequence $Y_1, Y_2, \ldots$ to produce $Z_1, Z_2, \ldots$, where $Z_i = X_i + Y_i \pmod{2}$, then the Z sequence is at least as random as the more random of the X and Y sequences.
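For a single pair of independent bits, this can be checked by exact arithmetic. In the sketch below, p and q are the probabilities that X_i and Y_i equal 1, and "bias" means distance from 1/2; these definitions and the function names are assumptions made for illustration, covering only the one-bit case rather than the full theorem.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def xor_prob_one(p, q):
    """P(Z = 1) for Z = X + Y (mod 2), with X and Y independent, P(X=1)=p, P(Y=1)=q."""
    return p * (1 - q) + (1 - p) * q

def bias(p):
    """Distance of a bit's probability of being 1 from the fair value 1/2."""
    return abs(p - HALF)

p, q = Fraction(3, 4), Fraction(2, 3)
z = xor_prob_one(p, q)
print(z)                                   # 5/12
print(bias(z) == 2 * bias(p) * bias(q))    # True: the bias shrinks multiplicatively
print(bias(z) <= min(bias(p), bias(q)))    # True: Z is no worse than the better input

# If either input is a fair coin, so is the output.
print(xor_prob_one(Fraction(9, 10), HALF))  # 1/2
```

Since each bias is at most 1/2, the product 2 · bias(p) · bias(q) can never exceed the smaller of the two biases, which is the one-bit form of the statement above.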
