The Law of Benfordborgbros/nonlin/benford.pdf · Benford's law has been put to practical use for e.g. detecting income tax frauds - faked number series stick out because they generally

benford.mcd - version 6.8.2001 1

"The Law of Benford"The "significant-digit paradox" was probably first discovered by the astronomer Simon Newcomb in 1881 when he observed that the pages at the beginning of a book with logarithms tended to be wearied much more than the back pages. The physicist Frank Benford rediscovered this phenomenon in 1938 while working at General Electric. He compiled sets of numbers from baseball scores to numbers extracted from Reader's Digest and verified the nonuniform distribution of the most significant digit. Take the most significant digit of all these numbers, and alas, one finds that 1, 2, ..., 9 do not occur with the same frequency 1/9 which one would perhaps expect to be the statistical average. Instead digit D occurs

on the average with a frequency of log1 D+

D

where log is the

logarithm with base 10. This formula was first found empirically by Simon Newcomb in 1881. Later in 1961 the mathematician Roger Pinkham proved Newcomb's formula from the assumption that the numbers exhibit a scale-invariance; that is, their statistical properties are invariant vis-à-vis changes by multiplicative factors (such as occurs when e.g. one exchanges sums of money between different currencies). Theodore Hill presented in 1996 a general theorem which showed that numbers drawn from different distributions tend to conform to the Newcomb-distribution. Benford's law has been put to practical use for e.g. detecting income tax frauds - faked number series stick out because they generally do not adhere to Benford's law. (M J Nigrini reports that he has applied the DA - "digital analysis" - to Bill Clinton's tax returns and found no evidence of fraud.) Here we will illuminate Benford's law by taking a random series of numbers in the interval 1 to 1000 and subject it to "scrambling"; i.e. we multiply the numbers with factors picked at random from the interval 1 to 9 and iterate this procedure several times. The idea is that by this we will obtain a scale invariant series since further multiplication by any factor will not significantly change its statistics. An important trick used in these circumstances is to shift the decimal point in the numbers so that they will stay in the interval [1, 10). For these number one can then also obtain a continuous probability distribution on the interval [1,10]. The content of Newcomb's formula is then that Prob Z z<( ) log z( )= .

Frank G Borg email: [email protected]

Chydenius InstituteHUR Research Lab

Frank Borg

Update 6 Aug 2001: Appendix II - Benford and Gaussian mixtures Appendix III - Benford and dynamics


1. Generate (pseudo-)random numbers

N 400:= n 0 N 1−..:=

X runif N 1, 1000,( ):= Uniformly distributed numbers in the interval 1 to 1000.

2. The decimal point shifting function

Zf x( ) nn 0←

nn nn 1+←

x 10nn>while x 1>if

nn nn 1−←

x 10nn<while x 1≤if

xx

10nn 1−←

xx

10← x 10≥if

:= The function y = Zf(x) displaces the decimal point in x such that the result y stays in the interval 1 to 10. E.g.

Zf 234( ) 2.34=

Zf 0.00234( ) 2.34=

3. The "scrambling" function

ZXf Y( ) nn length Y( )←

λ 1 rnd 8( )+←

Zi Zf λ Yi⋅( )←

i 0 nn 1−..∈for

Z

:= This "scrambling" function operates on an array of numbers by multiplying them with random numbers from the interval 1 to 9 and then applies the decimal point shifting Zf-function.




4. Function which calculates the statistics of the significant-digit

digstat Y( ) nn length Y( )←

Zi 0←

i 0 9..∈for

s num2str Y i( )←

sd substr s 0, 1,( )←

d str2num sd( )←

Zd Zd 1+←

Yi 1≥ifi 0 nn 1−..∈for

Z

:= This function will compute the statistics of the first-digit of the numbers in an array Y.

Extracts the first digit ofthe number. If the digit is d we increment Zd byone.

5. Function that iterates the scrambling n times.

Znf Y n,( )

Y ZXf Y( )←

k 0 n 1−..∈for

Y

:=

We apply the iterated scrambling to the random series X:

Itn 40:= Number of iterations.

Iterates the scrambling. This will generate an approximately scale-invariant series of numbers in the range [1, 10). The scrambling may be considered as a dynamic process and the Newcomb distribution as its "attractor".

Xn Znf X Itn,( ):=




6. The frequencies

stat digstat X( ):= Calculates the statistics for the random series ...

statn digstat Xn( ):= ... and for the scrambled series.

A noticeably difference between the statistics of the random series and the scrambled series will be observed. The random series has nearly equal frequencies for the significant digits d = 1, ... ,9, whereas for the scrambled series the frequency tends to decrease with increasing d. (To control the "stability" of the result, mark the runif-function in section 1 above and then press F9 which will recalculate the whole thing with a new set of "seeds".)

k 1 9..:=

0 2 4 6 8 100

50

100

150

random series"scrambled" series

Frequencies of the first digit

statk

statnk

k

7. The distributions

We will next compute histograms using our data in order to obtain the probability distributions.




Again, by marking the runif-function in section 1 and pressing F9 one can repeat the calculations with new series and one should find that the distribution for the scrambled series varies closely around the Newcomb distribution whereas the random series follows an almost straight line in accordance with a uniform distribution of the significant digit.

0 2 4 6 8 100

0.5

1

"scrambled" seriesrandom seriesNewcomb

Distributions of Xn, X0 and Newcomb

dznN

dz0N

NC

zintv

The results will be compared with the Newcomb distribution.

NCi fnc zintvi( ):=

fnc z( ) log z( ):=The Newcomb distribution:

dz0i 1+ dz0i i NN 1−<( ) histz0i⋅+:=dz00 0:=

histz0 hist zintv X0,( ):=X0 Zf X( )→

:=

dzni 1+ dzni i NN 1−<( ) histzni⋅+:=dzn0 0:=

Calculates the histogram for scrambled data. For the random series X we have first to apply the decimal point shifting function Zf.

histzn hist zintv Xn,( ):=zintvi 1 9i

NN⋅+:=

Number of intervals to be used for the histogram.

i 0 NN 1−..:=NN 40:=




g g

8. The Kolmogorov-Smirnov test

A standard check for whether an empirical distribution Fn(z) based on a sample of size n agrees with a given theoretical distribution F is the Kolmogorov-Smirnov test. It is based on the fact that the distribution of the variable Dn = max | Fn(z) - F(z)| is independent of the actual distribution F. For large samples n it can be shown

that Dn with probability 0.95 is less than 1.36

n. Thus, if Dn is less

than this it can be said that there is an agreement between the data and the theoretical distribution F on a 95% confidence level.

Resizes the empirical distribution array to fit that of the NC-array.

Fn submatrixdznN

0, length NC( ) 1−, 0, 0,

:=

Dn max Fn NC−( )→

:= The KS-variable.

KS951.36

N:= The KS 95% confidence bound.

With our choice of parameters we should typically have Dn less than the KS-limit KS95. We find Dn 0.02849= and KS95 0.068=. The Kolmogorov-Smirnov test together with digit-statistics thus supplies a method for testing for scale-invariance for sets of numbers.

9. Scale-invariance: theory

Consider our scrambled series whose elements z are in the range [1, 10). The scale-invariance implies that the distribution function

F a b,( ) Prob a z≤ b≤( )=

satisfies F λ a⋅ λ b⋅,( ) F a b,( )=




for λ > 0 such that the arguments stay in the interval [1, 10). Thus F(a,b) is a function of the ratio a/b only:

F a b,( ) Gba

=

From the properties of the probability distribution,

F 1 1,( ) 0=

F a b,( ) F 1 b,( ) F 1 a,( )−=

it follows that the function G(z) must satisfy

G 1( ) 0= G 10( ) F 1 10,( )= 1=

Gba

G b( ) G a( )−=

From this it is then obvious that G(z) can be no other than the logarithmic Newcomb distribution:

F 1 z,( ) G z( )= log z( )=

The above arguments can be repeated mutatis mutandis if we consider the first two digits of numbers (a case also known by Newcomb and Benford). Now we use a decimal point shifting function that keeps the numbers in the interval 10 to 100. For instance, 5924 becomes 59.24. The probability that e.g. the first

two digits are 13 will then be given by log1 13+

13

. If we sum

over all possible combinations of the first two digits we indeed get the total probability one:




10

99

i

log1 i+

i

∑

=

log10010

= 1=

The above can be generalized to the case of the three first digits etc.

Today it may seem odd that the significant-digit phenomenon defied plausible explanations till recent times. One reason for this may have been the entrenched status of the normal (Gaussian) distribution among probabilists. A related "law" is one associated with the name of G K Zipf, a "law" figuring prominently especially in linguistics, and which has been "promoted" i.a. by Benoit Mandelbrot.

What sort of data may approach the Benford law? Typically we have two sorts of Benford data. First, heterogeneous data collected from various sets with different distributions. Secondly, homogenous data whose source has some approximately scale-invariant properties. Thus, Nigrini has a figure which shows the analysis of the population counts of the 3141 US counties (1990 census). The difference (standard deviation) from the Newcomb distribution of the significant-digit is less than 0.7%. Here one might think that this reflects an approximate scale-invariance of the sizes of the counties in the data. A similar analysis by Arnér for the 289 Swedish counties (census 1998) shows also a quite good agreement with the Newcomb distribution though it narrowly exceeds the Kolmgorov-Smirnov 95% confidence bound (Dn = 0.10 and KS95 = 0.08). Indeed, in the "digital analysis" pioneered by Nigrini the digit-distributions are used as "spectrograms". "Spikes/dips" above or below the Newcomb-distribution may indicate systematic fraud, fake, or some threshold phenomenon built into the system.

But, after all, why do the boiling points of the elements accord to the Newcomb distribution?

Nigrini (1999) estimates that some 150 academic papers have been published on Benford's law.




10. Bibliography

Magnus Arnér: På Irrfärder i Slumpens Värld. Studentlitteratur 2001. ["Tabellparadoxen", chapter 3. - In Swedish.]

Frank Benford: "The law of anomalous numbers". Proc. Amer. Phil. Soc. 78 (1938) 551- 572.

James Berger: Statistical Decision Theory and Bayesian Analysis. 2. ed. Springer Verlag 1985. [Page 86.]

M W Browne: "Following Bedford´s law, or looking for no. 1". New York Times (4 August 1998).

Theodore P Hill: "The first-digit phenomenon". American Scientist 86 (1996) 358 - 363.

Theodore P Hill: "The significant-digit phenomenon". Amer. Math. Monthly 102 (1995) 322 - 327.

Benoit Mandelbrot: The Fractal Geometry of Nature. W H Freeman 1983.

Robert Matthews: "The power of ten". New Scientist (10 July 1999).

Simon Newcomb: "Note on the frequency of use of different digits in natural numbers". Amer. J. Math. 4 (1881) 39 - 40.

Mark J Nigrini: "I have got your number. How a mathematical phenomenon can help CPAs uncover fraud and other irregularities". <www.aicpa.org/pubs/jofa/may1999/niggrini.htm>

Roger S Pinkham: "On the distribution on the first significant digits". Ann. Math. Stat. 32 (1961) 1223 - 1230. R Raimi: "The first digit problem". Amer. Math. Monthly 83 (1976) 521 - 538.

R Raimi: "The peculiar distribution of first digits". Scientific American (Dec 1969) 109 - 119.

M A Snyder, J H Curry, and A M Dougherty: "Stochastic aspects of one-dimensional discrete dynamical systems: Benford's law". Phys. Rev. E Vol. 64. No.2 (2001).

"For me the law is a prime example of a mathematical idea which is a surprise to everyone - even the experts".

T Hill

"For me, Benford is a great hero. His law is not magic, but sometimes it seems like it". M J Nigrini.

"Thus all the explanations of [Benford's law] so far given seems to lack something of finality .... the answer remains obscure".

R Raimi (1969).




Appendix I. Newcomb frequencies for the digits

From the above arguments it follows that the probability, in a Benford series, that the nth digit dn is k is equal to

Prob dn k=( )10n 1−

10n 1−

i

logi 10⋅ k+ 1+

i 10⋅ k+

∑

=

=

if n > 1 (k = 0, 1, ..., 9) and in case n = 1 (k = 1, 2, ..., 9)

Prob d1 k=( ) log1 k+

k

=

Next we calculate a table of the Newcomb frequencies up to the 5th digit.

table

ds num2str n 1+( )←

s concat ds ". digit",( )←

p0 n 1+, s←

n 0 4..∈for

pk 1+ 0, num2str k( )←

k 0 9..∈for

p0 0, "Digit"←

pk 1+ 1, if k 0> log 11k

+

, "void",

←

k 0 9..∈for

pk 1+ n 1+,

10n 1−

10n 1−

i

logi 10⋅ k+ 1+

i 10⋅ k+

∑

=

←

k 0 9..∈for

n 1 4..∈for

p

:=




Table of the frequencies of the first to the fifth digit according to the Newcomb distribution:

table

"Digit" "1. digit" "2. digit" "3. digit" "4. digit" "5. digit""0" "void" 0.11968 0.10178 0.10018 0.10002"1" 0.30103 0.11389 0.10138 0.10014 0.10001"2" 0.17609 0.10882 0.10097 0.10010 0.10001"3" 0.12494 0.10433 0.10057 0.10006 0.10001"4" 0.09691 0.10031 0.10018 0.10002 0.10000"5" 0.07918 0.09668 0.09979 0.09998 0.10000"6" 0.06695 0.09337 0.09940 0.09994 0.09999"7" 0.05799 0.09035 0.09902 0.09990 0.09999"8" 0.05115 0.08757 0.09864 0.09986 0.09999"9" 0.04576 0.08500 0.09827 0.09982 0.09998

=

For increasing n there is a rapid convergence to the uniform distribution of the digits with frequency 1/10. This is also proved from the above formulas using the facts that for big numbers S we have

1

S

i

1i∑

=

≈ ln S( )

ln 11S

+

≈1S

Appendix II. Benford series by mixing Gaussian distributions

Above we demonstrated how a Benford series was generated by "scrambling". Picking numbers randomly from various distributions generates also Benford series if the distributions are in a certain sense "unbiased", as has been pointed out by T Hill. We can demonstrate this effect by picking numbers randomly from Gaussian distributions whose standard deviations are set randomly. Intuitively this procedure will result in an approximately scale invariant series if the standard deviations are varied over a range large enough.




K 1000:= n 0 K 1−..:= Size of data.

gmix

σ rnd 1000( )←

si rnorm 1 0, σ,( )0←

i 0 K 1−..∈for

s

:=Picks numbers from Gaussian distributions with randomly assigned standard deviations σ in the range of 0 to 1000.

0 200 400 600 800 10004000

2000

0

2000

4000

gmixn

n

Transforms the data to the interval 1 to 10 and calculates the digit statistics. Calculates then the histogram and the distribution which is compared with the Newcomb distribution.

Zgmix Zf gmix( )→

:=

sgmix digstat Zgmix( ):=

gm hist zintv Zgmix,( ):=

dgm0 0:= dgmi 1+ dgmi i NN 1−<( ) gmi⋅+:=

0 2 4 6 8 100

0.5

1

Gaussian mixtureNewcomb

dgmK

NC

zintv




It can be pointed out that if we draw numbers from given Gaussian distribution we also tend to get a digit statisticis which differs significantly from the uniform 1/9 distribution:

σ 100:= gs rnorm K 0, σ,( ):=

zgs Zf gs( )→

:= hgs hist zintv zgs,( ):= dgs0 0:=

dgsi 1+ dgsi i NN 1−<( ) hgsi⋅+:=

0 2 4 6 8 100

0.5

1

gaussian Newcomb

dgsK

NC

zintv

stg digstat zgs( ):= nck log 11k

+

:=

0 2 4 6 8 100

0.1

0.2

0.3

0.4

gaussian Newcomb

stgkK

nck

k




The following seems to be true: if we draw numbers from N(0,20) we tend to get a digit distribution with "2" in excess when compared with the Newcomb distribution; with numbers from N(0,30) we get "3" in excess etc. Given a probablility distribution F then the probability, that for a number drawn from this distribution its first digit is 1, is given by

Prob d1 1=( )n

Prob 10n x≤ 2 10n⋅<( )∑=n

F 10n 2 10n⋅,( )∑=

Similarly for digits "2", "3", etc. We can implement this formula for calculating the digit distribution for a series with distribution N(0,σ).

gaussdig σ( ) nn ceil log σ( )( ) 2+←

Zk 2nn−

nn

n

pnorm k 1+( ) 10n⋅ 0, σ,

pnorm k 10n⋅ 0, σ,( )−+...

∑=

⋅←

k 1 9..∈for

Z

:=

dstat gaussdig σ( ):=

1

9

j

dstat j∑=

1=

0 2 4 6 8 100

0.1

0.2

0.3

0.4

Gaussian simul.NewcombGaussian theor.

stgkK

nck

dstatk

k




The distribution function FZ in terms of the decimal shifted variable z of |x| is given by

FZ z( )

n

Prob 10n x≤ z 10n⋅<( )∑=

Thus in the Gauss case we have the following approximate implementation:

Fgauss z σ,( ) nn ceil log σ( )( ) 2+←

Z 2

nn−

nn

n

pnorm z 10n⋅ 0, σ,( )pnorm 10n 0, σ,( )−+

...

∑=

⋅←

Z

:=

fgi Fgauss zintvi σ,( ):=

0 2 4 6 8 100

0.5

1

GaussianNewcomb

fg

NC

zintv

An aside: comparing the entropies. The Shannon's entropy for a discrete probability measure p is defined by




Entropy p( )

1

length p( ) 1−

i

pi ln pi( )⋅∑=

−:=

It attains maximum for the uniform distribution p1 = ... = pn = 1/n. We can calculate how much the entropy for the 1st digit statistics deviates from the uniform value for the Gaussian and the Newcomb statistics.

maxentropy ln 9( ):= Maximum entropy.

maxentropy 2.19722=

Entropy dstat( ) 1.9665= Entropy for Gaussian numbers.

Entropy nc( ) 1.99343= Entropy for Newcomb statistics.

Appendix III. Benford and dynamics

In a recent paper Snyder, Curry and Dougherty studies discrete one-dimensional dynamical systems which produces Benford series. A discrete dynamics is defined by a map

xn 1+ f xn( )=

A classical one-dimensional example is the logistic map

f x( ) λ x⋅ 1 x−( )⋅=

on the interval [0.1] for 0 ≤ λ ≤ 4 . Given an initial value x0 one may study the statistics of its orbit; i.e. the points xn = fn(x0). In the case λ = 4 the logistic map becomes a chaotic map (also called the Ulam map). The statistics of the orbits are generally independent of the starting point x0 for the chaotic maps.




Also the distribution of the points is invariant vis-à-vis the map f, since a further iteration by f on the orbit does not change anything. However, only a in a few cases (the Ulam map is one of them) has it been possible to find an analytic expression for the invariant distributions. If we can find such an expression, then as the authors show, it is easy to check whether the points of the orbit satisfy the Newcomb digit-distribution. Thus they can show that the Ulam map and the so-called tent map do not produce Benford series. In case we do not have an expression for the distribution it remains to explore the distribution numerically. For instance, the authors provide numerical evidence that the so-called Renyi map produces Benford series.

The Renyi map is defined on [0, ∞) by

f x( )x

1 x−( )x 1<if

x 1− otherwise

:= Renyi map

N 1000:= n 0 N 1−..:=

x0 rnd 100( ):= xn 1+ f xn( ):=

0 200 400 600 800 10001 .10 3

0.01

0.1

1

10

100

1 .103 Renyi series

xn

n




We may do an explorative simulation with the Renyi map computing n2 points for each n1 randomly chosen orbits and determine their statistics:

SR n1 n2, f, seed, d,( )Zi 0←

i 0 9..∈for

x0 rnd seed( ) d−←

x j 1+ f x j( )←

j 0 n2 1−..∈for

x Zf x( )→

←

Z Z digstat x( )+←

i 0 n1 1−..∈for

Z

:=

ZZ SR 100 500, f, 100, 0,( ):= Ztot

0

9

n

ZZn∑=

:=

ZZZZZtot

:= Calculates the digit statistics.

0 2 4 6 8 100

0.1

0.2

0.3

0.4

RenyiNewcomb

ZZk

nck

k




The reason why the Renyi map seems to have the asymptotic Newcomb distribution is perhaps due to its "slingshot" mechanism which effects a "scrambling" leading to scale invariance.

Heuristic argument: Apparently this Benford property has to do with the *slingshot effect* of the Renyi map; points of the form x = 1 - r are ejected to x =1/r - 1. The graphs of the Renyi orbits also resemble fractal curves with triangles of various heights. If r > 0 is uniformly distributed near 0 we may expect the big numbers -- the height of the triangles -- x := 1/r to have a pdf p(x) of the form 1/x^2. The pdf q(x) that we have a big number x in the series would then be proportional to q(x) = p(x) + p(x+1) + p(x+2) + .... := 1/x (since a triangle of height x implies that the numbers x-1, x-2, ... are in the series too, due to the definition of the Renyi map for x > 1). Now use the decimal shifting trick and calculate the distribution function for the decimal shifted variable 1 =< z < 10. F(z) = sum Prob(10^n < x < z*!0^n) proportional to integral from {10^n} to {z*10^n} dx/x = ln(z). That is, we would obtain a logarithmic function in z in accordance with the Newcomb distribution. The part 0 < x < 1 seems to be sort of a mirror (?) of the case x > 1 (take a look at a graph using log-scale y-axis). When x is close to zero it takes about 1/x steps to grow back to the size around 1. It might be that if one was to try to flesh out the argument above there would be sort of a circular-argument use of the distribution of points in the series. However, that might be ok since it proves that the distribution is the invariant distribution of the time series in the asymptotic sense. (Indeed, one should expect it to be easier to theoretically verify whether a dynamical system has a certain invariant measure than to actually produce one in the general case.)

"Scale invariance" of fractal time series, such as random walk, suggests these have also peculiar distributions of the digits. As an example we consider the Brownian random walk:

br x( ) x rnd 1( )+ 0.5−:=Brown process.

z0 rnd 10( ) 5−:= zn 1+ br zn( ):=




0 200 400 600 800 100010

5

0

5

10Brown series

zn

n

ZB SR 100 10000, br, 1, 0.5,( ):=

Calculates the digit statistics for a number of Brown series. There seems to be a significant deviation from the Newcomb distribution but the result fluctuates quite much for small series.

ZBZB

0

9

n

ZBn∑=

:=

0 2 4 6 8 100

0.1

0.2

0.3

0.4

FractalNewcomb

ZBk

nck

k




General fractal curve with power spectrum exponent β:

noise1f n ∆t, β,( ) nn 2n 1− 1+←

N 2n←

fqkkN

1∆t

⋅←

k 0 nn 1−..∈for

amp rnorm nn 0, 1,( )←

θ runif nn 0, 2 π⋅,( )←

Fkampk e

i θk⋅⋅

fqk( )β2

←

k 1 nn 1−..∈for

F0 F1←

Z IFFT F( )←

Z

:=

∆t1

100:= The time step used below.

Example with 1/f-noise (β = 1):

xx noise1f 10 ∆t, 1,( ):= kk 0 length xx( ) 1−..:=




0 500 1000 150060

40

20

0

20

401/f-noise

xxkk

kk

The next function calculates the digit statistics for the fractal time series.

n1f n1 n, ∆t, β,( )Zi 0←

i 0 9..∈for

z noise1f n ∆t, β,( )→

←

x Zf z( )→

←

Z Z digstat x( )+←

i 0 n1 1−..∈for

Z

:=

It might also be of interest to compare a "Zipf distribution" with the Newcomb distribution and the digit distribution of the fractal time series.

The "Zipf" distribution of the digits.

SS

1

9

n

1n∑

=

:= SS 2.82897= Zpk1

SS1k

⋅:=




β 2:=

Next we generate Brownian process via the Fourier-transform method.Zif n1f 100 10, ∆t, β,( ):=

ZifZif

0

9

n

Zif n∑=

:=

This exploratory simulation suggests Brownian (β = 2) has a digit statistics which is closer to the Zipf distribution than the Newcomb distribution:

0 2 4 6 8 100

0.1

0.2

0.3

0.4

FractalNewcomb"Zipf"

Zif k

nck

Zpk

k



Documents

The Law of Benfordborgbros/nonlin/benford.pdf · Benford's law has been put to practical use for e.g. detecting income tax frauds - faked number series stick out because they generally