
Old and new algorithms for computing Bernoulli numbers

David Harvey

University of New South Wales

25th September 2012, University of Ballarat


Bernoulli numbers

Rational numbers $B_0, B_1, \ldots$ defined by:

\[ \frac{x}{e^x - 1} = \sum_{n \ge 0} \frac{B_n}{n!} x^n. \]

The first few Bernoulli numbers:

B0 = 1 B1 = −1/2

B2 = 1/6 B3 = 0

B4 = −1/30 B5 = 0

B6 = 1/42 B7 = 0

B8 = −1/30 B9 = 0

B10 = 5/66 B11 = 0

B12 = −691/2730 B13 = 0

B14 = 7/6 B15 = 0


Bernoulli numbers

Applications:

- Sums of the form $1^n + 2^n + \cdots + N^n$

- Euler–Maclaurin formula

- Connections to Riemann zeta function, e.g. Euler’s formula:

\[ \zeta(2n) = \frac{1}{1^{2n}} + \frac{1}{2^{2n}} + \frac{1}{3^{2n}} + \cdots = (-1)^{n+1} \frac{(2\pi)^{2n} B_{2n}}{2(2n)!} \]

- Arithmetic of cyclotomic fields, p-parts of class groups

- . . .


Three algorithmic problems

I will discuss three algorithmic problems:

1. Given n, compute $B_0, B_1, \ldots, B_n$.

2. Given n, compute just $B_n$.

3. Given prime p, compute $B_0, B_2, \ldots, B_{p-3}$ modulo p.

How fast can we solve these problems?


How big is Bn?

Write $B_{2n}$ as a reduced fraction

\[ B_{2n} = \frac{N_{2n}}{D_{2n}}. \]

$D_{2n}$ has $O(n)$ bits (von Staudt–Clausen theorem).

$N_{2n}$ has $\Theta(n \log n)$ bits (Euler’s formula and Stirling’s formula).

Example:

\[ B_{1000} = \frac{-18243 \ldots \text{(1769 digits)} \ldots 78901}{2 \cdot 3 \cdot 5 \cdot 11 \cdot 41 \cdot 101 \cdot 251}. \]
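The denominator in this example can be checked directly: by the von Staudt–Clausen theorem, the denominator of $B_{2n}$ is the product of the primes p with (p − 1) dividing 2n. A minimal Python sketch follows; the function names are hypothetical and chosen only for illustration.

```python
def is_prime(m):
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def bernoulli_denominator(n):
    """Primes p with (p - 1) | n, and their product, for even n >= 2.

    By von Staudt-Clausen this product is the denominator of B_n.
    """
    primes = [d + 1 for d in range(1, n + 1) if n % d == 0 and is_prime(d + 1)]
    product = 1
    for p in primes:
        product *= p
    return primes, product


primes, denom = bernoulli_denominator(1000)
print(primes)  # [2, 3, 5, 11, 41, 101, 251]
print(denom)
```

The list of primes matches the factorisation of the denominator of $B_{1000}$ shown on the slide.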


Computing B0,B1, . . . ,Bn — cubic algorithms

Equating coefficients in

\[ \left( \frac{e^x - 1}{x} \right) \sum_{n \ge 0} \frac{B_n}{n!} x^n = 1 \]

leads to the recursive formula

\[ B_n = -\sum_{k=0}^{n-1} \frac{n!}{k!\,(n-k+1)!} B_k \qquad (n \ge 1). \]

Given $B_0, \ldots, B_{n-1}$, we can compute $B_n$ using $O(n)$ arithmetic operations.

But each $B_k$ has $O(k \log k)$ bits, so we need $O(n^{2+o(1)})$ bit operations.

For computing $B_0, \ldots, B_n$, we need $O(n^{3+o(1)})$ bit operations.
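As an illustration of the cubic approach, here is a minimal Python sketch of the recurrence using exact rational arithmetic. It uses the equivalent form $\sum_{k=0}^{n} \binom{n+1}{k} B_k = 0$ (multiply the coefficient identity by $(n+1)!$); the function name `bernoulli_list` is an assumption made for this example.

```python
from fractions import Fraction
from math import comb


def bernoulli_list(n):
    """Return [B_0, ..., B_n] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        # Solve sum_{k=0}^{m} C(m+1, k) B_k = 0 for B_m.
        total = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-total / (m + 1))
    return B


for k, b in enumerate(bernoulli_list(14)):
    print(k, b)
```

Each new value costs O(n) rational operations on numbers of growing size, which is where the cubic bit complexity comes from.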


Computing B0,B1, . . . ,Bn — cubic algorithms

There are many other identities of this sort that lead to $O(n^{3+o(1)})$ algorithms for computing $B_0, \ldots, B_n$, for example:

- Seidel (1877)

- Ramanujan (1911)

- Knuth–Buckholtz (1967)

- Atkinson (1986)

- Akiyama–Tanigawa (1999)

- Brent–H. (2011)

They have various big-O constants and numerical stability properties.


Computing B0,B1, . . . ,Bn — fast algorithms

But $B_0, \ldots, B_n$ only occupy $O(n^2 \log n)$ bits altogether.

Can we compute them all in only $O(n^{2+o(1)})$ bit operations?

Yes! (Brent–H., 2011)

There are several ways — I will sketch a particularly cute version.



Define the Tangent numbers $T_1, T_2, \ldots$ by

\[ \tan x = \sum_{n \ge 1} \frac{T_n}{(2n-1)!} x^{2n-1}. \]

Here are the first few:

1, 2, 16, 272, 7936, 353792, . . .

They are all positive integers, and it turns out that

\[ T_n = \frac{(-1)^{n-1} 2^{2n} (2^{2n} - 1)}{2n} B_{2n}. \]

So it suffices to compute the first n Tangent numbers.
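As a quick sanity check of this relation, substituting $B_2 = 1/6$, $B_4 = -1/30$ and $B_6 = 1/42$ from the list of Bernoulli numbers recovers the first three Tangent numbers:

```latex
\begin{align*}
T_1 &= \frac{(+1)\cdot 2^{2}(2^{2}-1)}{2}\cdot\frac{1}{6}
     = \frac{12}{2}\cdot\frac{1}{6} = 1,\\
T_2 &= \frac{(-1)\cdot 2^{4}(2^{4}-1)}{4}\cdot\left(-\frac{1}{30}\right)
     = \frac{240}{4}\cdot\frac{1}{30} = 2,\\
T_3 &= \frac{(+1)\cdot 2^{6}(2^{6}-1)}{6}\cdot\frac{1}{42}
     = \frac{4032}{6}\cdot\frac{1}{42} = 16.
\end{align*}
```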



Watch for the non-overlapping coefficients:

\[ 9! \tan(10^{-3}) = 362.880120960048384019584007936\ldots \]

We can easily read off the Tangent numbers:

$T_1 = 362880/9! = 1$

$T_2 = 120960/(9!/3!) = 2$

$T_3 = 48384/(9!/5!) = 16$

$T_4 = 19584/(9!/7!) = 272$

$T_5 = 7936/(9!/9!) = 7936$.

In general, to compute $T_1, \ldots, T_n$, it suffices to compute $(2n-1)! \tan(10^{-p})$ to about $n^2 \log n$ significant digits, where $p = O(n \log n)$.



How do we compute $(2n-1)! \tan(10^{-p})$?

Like this:

\[ 9! \tan(10^{-3}) = 8! \, \frac{9! \sin(10^{-3})}{8! \cos(10^{-3})} = 8! \, \frac{362.879939520003023999928000001\ldots}{40319.979840001679999944000001\ldots} \]

Writing down the required digits of $\sin(10^{-p})$ and $\cos(10^{-p})$ costs $O(n^2 \log^2 n)$ bit operations.

Then perform a single floating-point division with $O(n^2 \log n)$ digits of accuracy. (Some error analysis required.)

Assuming FFT-based algorithms, total cost is $O(n^2 \log^{2+o(1)} n)$ bit operations.

(Of course we work in base 2, not base 10!)
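The digit-reading trick can be reproduced with exact integer arithmetic. The sketch below is only an illustration in base 10 (the function name and the scaling choices are assumptions, and no general error analysis is attempted): it writes down the truncated sine and cosine series as big integers, performs one long division, and splits the quotient into blocks of 2p digits.

```python
from math import factorial


def tangent_numbers_by_digits(n, p):
    """Read T_1..T_n off the decimal digits of (2n-1)! * tan(10**-p).

    Assumes p is large enough that each block T_k * (2n-1)!/(2k-1)!
    is smaller than 10**(2p); the caller must guarantee this.
    """
    m = 2 * n - 1
    base = 10 ** p  # 1/x, where x = 10**-p

    # Taylor polynomials of sin and cos, truncated after n terms and scaled
    # so that every quantity is an integer:
    #   sin_scaled = m!     * sin_trunc(x) * 10**(p*m)
    #   cos_scaled = (m-1)! * cos_trunc(x) * 10**(p*(m-1))
    sin_scaled = sum(
        (-1) ** k * (factorial(m) // factorial(2 * k + 1)) * base ** (m - 2 * k - 1)
        for k in range(n)
    )
    cos_scaled = sum(
        (-1) ** k * (factorial(m - 1) // factorial(2 * k)) * base ** (m - 1 - 2 * k)
        for k in range(n)
    )

    # One big division: floor of m! * tan(x) * 10**(p*m), up to truncation error.
    digits = factorial(m - 1) * sin_scaled * base ** (m - 1) // cos_scaled

    block = base ** 2  # consecutive coefficients sit 2p decimal digits apart
    result = []
    for k in range(1, n + 1):
        chunk = (digits // block ** (n - k)) % block
        # chunk = T_k * m! / (2k-1)!, so undo the factorial scaling.
        result.append(chunk * factorial(2 * k - 1) // factorial(m))
    return result


print(tangent_numbers_by_digits(5, 3))  # [1, 2, 16, 272, 7936]
```

With n = 5 and p = 3 the two scaled integers are exactly the numerator and denominator digits shown above, with the decimal point removed.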


Computing Bn

What about computing a single $B_n$, without necessarily computing $B_0, \ldots, B_{n-1}$?

Since $B_n$ has only $O(n \log n)$ bits, maybe we can do it in $O(n^{1+o(1)})$ bit operations?

Can we at least do better than $O(n^{2+o(1)})$?


Computing Bn

I’ll discuss three algorithms:

- The zeta-function algorithm (rediscovered many times... Euler, Chowla–Hartung 1972, Fillebrown–Wilf 1992, Fee–Plouffe 1996, Kellner 2002, Cohen–Belabas–Daly (?))

- The multimodular algorithm (H. 2008)

- A new p-adic multimodular algorithm (H., 2012; see arXiv)


Computing Bn — the zeta-function algorithm

Idea is to use Euler’s formula

\[ B_{2n} = (-1)^{n+1} \frac{2(2n)!}{(2\pi)^{2n}} \zeta(2n). \]

It suffices to compute $O(n \log n)$ significant bits of the right hand side.

It is known how to compute $n!$ and $\pi$ to this precision in $O(n^{1+o(1)})$ bit operations.

But computing enough bits of $\zeta(2n)$ is more expensive, so we concentrate on this.


Computing Bn — the zeta-function algorithm

We use the Euler product:

\[ \zeta(2n)^{-1} = \prod_{p \text{ prime}} (1 - p^{-2n}) \]

It turns out that we need $O(n / \log n)$ primes to get enough bits, and we need to do $O(n^{1+o(1)})$ work per prime.

Total: $O(n^{2+o(1)})$ bit operations.

No better than computing all $B_0, \ldots, B_n$!

(Fillebrown gives the more precise bound $O(n^2 \log^{2+o(1)} n)$. Maybe this can be improved by a logarithmic factor?)
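To make the idea concrete, here is a toy Python sketch of the zeta-function approach. It substitutes ordinary double-precision floats for the multiprecision arithmetic the talk assumes, so it only works for small indices (a few dozen at most); the function names and the `prime_bound` parameter are hypothetical. The exact denominator comes from von Staudt–Clausen, and the numerator is recovered by rounding.

```python
from fractions import Fraction
from math import factorial, pi


def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, flag in enumerate(sieve) if flag]


def bernoulli_via_zeta(m, prime_bound=200):
    """B_m for small even m >= 2: float approximation, then exact rounding."""
    primes = primes_up_to(max(prime_bound, m + 1))

    # Euler product for zeta(m), truncated at the chosen prime bound.
    zeta_inv = 1.0
    for p in primes:
        zeta_inv *= 1.0 - float(p) ** (-m)
    zeta = 1.0 / zeta_inv

    # |B_m| from Euler's formula; with m = 2n the sign is (-1)^(n+1).
    magnitude = 2.0 * factorial(m) / (2.0 * pi) ** m * zeta
    sign = 1 if (m // 2) % 2 == 1 else -1

    # Denominator from von Staudt-Clausen: primes p with (p - 1) | m.
    denom = 1
    for p in primes:
        if p <= m + 1 and m % (p - 1) == 0:
            denom *= p

    # The numerator is an integer, so rounding recovers it exactly
    # as long as the accumulated error stays below 1/2.
    numer = sign * round(magnitude * denom)
    return Fraction(numer, denom)


print(bernoulli_via_zeta(12))  # -691/2730
```

A serious implementation would choose the working precision and the number of primes from the $\Theta(n \log n)$ size of the numerator rather than fixing them in advance.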


Computing Bn — the multimodular algorithm

Idea: compute $B_n \pmod p$ for many primes p and reconstruct using the Chinese Remainder Theorem.

Using fast interpolation algorithms, reconstruction step costs only $O(n^{1+o(1)})$ bit operations.

How many primes do we need? Since $\sum_{p<N} \log p \sim N$, we need all primes up to $O(n \log n)$, i.e. $O(n)$ primes.

The expensive part is computing $B_n \pmod p$ for each p. How do we do this?



Assume $n \ge 2$, n even, and $p \ge 3$.

Define ‘integralised’ Bernoulli number

\[ B^*_n = 2(1 - 2^n) B_n \in \mathbb{Z}. \]

Basic identity (a ‘Voronoi congruence’):

\[ B^*_n \equiv n \sum_{j=1}^{p-1} (-1)^j j^{n-1} \equiv n \left( -1^{n-1} + 2^{n-1} - \cdots + (p-1)^{n-1} \right) \pmod p. \]

Straightforward evaluation scheme: compute in turn $1^{n-1}, 2^{n-1}$, etc., all modulo p.
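The straightforward scheme is short enough to write out. The sketch below is an illustration only (the function name is an assumption): it evaluates the Voronoi sum term by term and then divides by $2(1 - 2^n)$ modulo p, which requires $2^n \not\equiv 1 \pmod p$. It uses the three-argument `pow` with a negative exponent, available in Python 3.8 and later.

```python
def bernoulli_mod_p(n, p):
    """B_n mod p for even n >= 2 and an odd prime p.

    Requires 2^n != 1 (mod p) so that 2(1 - 2^n) is invertible mod p.
    """
    # Voronoi congruence: B*_n = n * sum_{j=1}^{p-1} (-1)^j j^(n-1)  (mod p)
    total = 0
    for j in range(1, p):
        term = pow(j, n - 1, p)
        total += term if j % 2 == 0 else -term
    b_star = (n * total) % p

    factor = (2 * (1 - pow(2, n, p))) % p
    if factor == 0:
        raise ValueError("2^n = 1 (mod p); this prime cannot be used for this n")
    return (b_star * pow(factor, -1, p)) % p


print(bernoulli_mod_p(8, 23))  # 13, i.e. -1/30 mod 23
```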



Better: rearrange terms. Put $j = g^i$ for a generator g.

Example to compute $B_8 \pmod{23}$, using g = 5:

\[ -1^7 - 5^7 + 2^7 + 10^7 + 4^7 + 20^7 + 8^7 - 17^7 + 16^7 - 11^7 - 9^7 \pmod{23}. \]

Can keep track of $g^i$, $g^{i(n-1)}$ with only $O(1)$ operations per term.

The sign of each term depends on the parity of $g^i$.

The sum for $g^{11}, \ldots, g^{22}$ is the same as for $g^0, \ldots, g^{10}$.

Complexity: $O(p^{1+o(1)})$ bit operations per prime.

Overall complexity to get $B_n$: still $O(n^{2+o(1)})$.
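The rearranged sum can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes the caller supplies a generator g, walks through the first half of the cycle with one multiplication per tracked quantity, and doubles the result.

```python
def voronoi_sum_with_generator(n, p, g):
    """B*_n mod p via the rearranged sum over j = g^i, i = 0..(p-3)/2.

    Assumes n even, p an odd prime and g a generator of (Z/pZ)^*.
    """
    half = (p - 1) // 2
    step = pow(g, n - 1, p)   # ratio between consecutive (g^i)^(n-1)
    j = 1                     # g^i mod p
    power = 1                 # g^(i(n-1)) mod p
    total = 0
    for _ in range(half):
        # Sign (-1)^j: even j contributes +, odd j contributes -.
        total += power if j % 2 == 0 else -power
        j = (j * g) % p
        power = (power * step) % p
    # The second half of the cycle repeats the first, hence the factor 2.
    return (n * 2 * total) % p


print(voronoi_sum_with_generator(8, 23, 5))  # 17 = B*_8 = 2(1 - 2^8) * (-1/30)
```

For n = 8, p = 23, g = 5 the loop visits j = 1, 5, 2, 10, 4, 20, 8, 17, 16, 11, 9, the same terms as in the example above.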



Various tricks to squeeze out large constant factor... beyond the scope of this talk.

Current record for a single $B_n$ is held by the multimodular algorithm.

Timing data from 2008 paper:

- $B_{10^7}$: 40 CPU hours using zeta function algorithm (Pari/GP).

- $B_{10^7}$: 10 CPU hours using multimodular algorithm.

- $B_{10^8}$: 1164 CPU hours using multimodular algorithm.

This data nicely illustrates quasi-quadratic asymptotics.

Wikipedia reports that Holoborodko recently used my implementation to compute $B_{2 \times 10^8}$.


Computing Bn — the p-adic multimodular algorithm

While preparing for this talk, I discovered a new algorithm:

Theorem (H., 2012)

The Bernoulli number $B_n$ may be computed in

\[ O(n^{4/3+o(1)}) \]

bit operations.

This is the first known subquadratic algorithm for this problem.



Sketch: the Voronoi identity may be generalised to obtain information modulo powers of p:

\[ B^*_n \equiv \sum_{j=0}^{p-1} \sum_{k=0}^{s-1} \binom{n}{k+1} B^*_{k+1} \, p^k (-1)^j j^{n-k-1} \pmod{p^s} \]

(assuming $s \le n$).

For s = 1 this simplifies to the previous formula.

There are $ps$ terms, and we need to do arithmetic modulo $p^s$.

So the complexity of evaluating this sum (ignoring logarithmic factors) is roughly $p s^2$.
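The generalised identity can be checked numerically for small parameters. The sketch below is a direct, unoptimised evaluation of the double sum (function names are hypothetical, and the small values $B^*_1, \ldots, B^*_s$ are obtained from the classical recurrence rather than by any method from the talk); it assumes p is an odd prime.

```python
from fractions import Fraction
from math import comb


def integralised_bernoulli(limit):
    """Return [B*_0, ..., B*_limit] with B*_k = 2(1 - 2^k) B_k, as integers."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    stars = [2 * (1 - 2 ** k) * b for k, b in enumerate(B)]
    assert all(value.denominator == 1 for value in stars)
    return [int(value) for value in stars]


def voronoi_mod_prime_power(n, p, s):
    """Right-hand side of the generalised identity, reduced modulo p^s."""
    assert 1 <= s <= n
    modulus = p ** s
    b_star = integralised_bernoulli(s)
    total = 0
    for j in range(p):
        sign = 1 if j % 2 == 0 else -1
        for k in range(s):
            # pow(0, 0, modulus) is 1, matching the convention 0^0 = 1.
            total += (sign * comb(n, k + 1) * b_star[k + 1]
                      * p ** k * pow(j, n - k - 1, modulus))
    return total % modulus


# B*_12 = 2(1 - 2^12) * (-691/2730) = 2073
print(voronoi_mod_prime_power(12, 7, 3), 2073 % 7 ** 3)
```

Both printed values should agree, since the sum determines $B^*_{12}$ modulo $7^3$.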



Optimisation problem: select precision parameter $s_p$ for each p.

Want to minimise the cost (roughly $\sum_p p\, s_p^2$) while ensuring that we collect enough bits (roughly $\sum_p s_p \log p$) to determine $B_n$.

Unfortunately this doesn’t work: it still leads to overall cost $O(n^{2+o(1)})$.

(This remains true even if you throw in bits from $\zeta(n)$ to yield information at the prime at infinity.)



But we can rewrite the identity like this:

\[ B^*_n \equiv \sum_{j=0}^{p-1} (-1)^j j^{n-s} p^{s-1} F(j/p) \pmod{p^s} \]

where

\[ F(x) = \sum_{k=0}^{s-1} \binom{n}{k+1} B^*_{k+1} x^{s-1-k} \in \mathbb{Z}[x]. \]

Note $F(x)$ is a polynomial of degree s − 1, and is independent of p.

Select parameters $s = n^{2/3+o(1)}$, $M = n^{2/3+o(1)}$, $N = n^{1/3+o(1)}$.

Evaluate $F(j/p) \pmod{2^M}$ for every p < N, every 0 ≤ j < p, simultaneously using a fast multipoint evaluation algorithm.

Then add everything up and use the Chinese Remainder Theorem!



Actually one can interpolate between

- $O(n^{4/3+o(1)})$ time and $O(n^{4/3+o(1)})$ space, and

- $O(n^{3/2+o(1)})$ time and $O(n^{1+o(1)})$ space.

For what n do we beat the previous algorithms in practice?

No idea. I hope to find out soon...


Computing B0, . . . ,Bp−3 (mod p) — motivation

The residues $B_0, B_2, \ldots, B_{p-3} \pmod p$ are closely related to the arithmetic of cyclotomic fields.

Let $K = \mathbb{Q}(\zeta_p)$ and let h be the class number of K.

Then

\[ p \mid h \iff p \mid B_k \text{ for some } k = 0, 2, \ldots, p-3. \]

A prime p is regular if $B_k \not\equiv 0 \pmod p$ for all $k = 0, 2, \ldots, p-3$, i.e. $p \nmid h$.

Much of Kummer’s work on Fermat’s Last Theorem applies only to regular primes, essentially because one can then always take pth roots in the class group.



If p is not regular it is called irregular.

The smallest irregular prime is p = 37:

\[ B_{32} = \frac{-7709321041217}{510} \equiv 0 \pmod{37}. \]

We call (37, 32) an irregular pair.

Each irregular pair corresponds to a nontrivial eigenspace in the Galois action on the p-Sylow part of the class group of K.

The irregular pairs have been determined for all p < 163 577 856 (Buhler–H., 2011).
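For very small primes the irregular pairs can be found by brute force. The sketch below is only an illustration (hypothetical function name, exact rational arithmetic via the classical recurrence, nothing like the fast methods needed for large p): it lists the even k up to p − 3 for which p divides the numerator of $B_k$.

```python
from fractions import Fraction
from math import comb


def irregular_indices(p):
    """Even k in [2, p-3] with p dividing the numerator of B_k (p an odd prime)."""
    B = [Fraction(1)]
    for m in range(1, p - 2):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return [k for k in range(2, p - 2, 2) if B[k].numerator % p == 0]


for p in [29, 31, 37, 59]:
    print(p, irregular_indices(p))
```

An empty list means the prime is regular; for p = 37 the list contains the index 32 from the example above.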



Once the irregular pairs for p are known, one can quickly test interesting conjectures, such as the Kummer–Vandiver conjecture.

This states that the class number of the maximal real subfield

\[ K^+ = K \cap \mathbb{R} = \mathbb{Q}(\zeta_p + \zeta_p^{-1}) \]

is not divisible by p.

Some people think it’s true.

Some people think it’s false.

(It’s true for all p < 163 577 856.)


Computing B0, . . . ,Bp−3 (mod p)

Modern algorithms for computing $B_0, \ldots, B_{p-3} \pmod p$:

- Power series inversion

- Voronoi identity & Bluestein’s trick

- Voronoi identity & Rader’s trick

All have theoretical bit complexity $O(p \log^{2+o(1)} p)$.

These days we can only fight to improve the constant.

(Pre-1992 algorithms had complexity $O(p^{2+o(1)})$.)


Computing B0, . . . ,Bp−3 (mod p) — series inversion

Idea: invert the power series

\[ \frac{e^x - 1}{x} = 1 + \frac{x}{2!} + \frac{x^2}{3!} + \cdots + \frac{x^{p-3}}{(p-2)!} + O(x^{p-2}) \]

in $\mathbb{F}_p[[x]]$.

Actually we only need coefficients of even index. Can save a factor of two by using for example

\[ \frac{x^2}{\cosh x - 1} = -2 \sum_{n \ge 0} \frac{(2n-1) B_{2n}}{(2n)!} x^{2n}. \]

So we need to invert $(\cosh x - 1)/x^2$ to about p/2 terms.



Series inversion can be reduced to polynomial multiplication via Newton’s method.

So the basic question is: how to multiply polynomials of degree $O(p)$ in $\mathbb{F}_p[x]$?

Fastest way to multiply polynomials is via FFT.

But we can’t perform an efficient FFT over $\mathbb{F}_p$, because usually $\mathbb{F}_p$ won’t have sufficiently many roots of unity of smooth order.
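The reduction from series inversion to multiplication can be sketched in a few lines. In this illustration the multiplication is plain schoolbook (quadratic) rather than FFT-based, so it shows the structure of the Newton iteration $g \leftarrow g(2 - fg)$ but not the speed; all function names are assumptions, and it inverts $(e^x - 1)/x$ rather than the cosh variant.

```python
def mul_trunc(a, b, n, p):
    """Product of two polynomials over F_p, truncated to n coefficients."""
    out = [0] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[: n - i]):
                out[i + j] = (out[i + j] + ai * bj) % p
    return out


def inverse_series(f, n, p):
    """First n coefficients of 1/f in F_p[[x]] by Newton iteration."""
    g = [pow(f[0], -1, p)]
    k = 1
    while k < n:
        k = min(2 * k, n)                      # precision doubles each round
        fg = mul_trunc(f, g, k, p)
        correction = [(-c) % p for c in fg]    # 2 - f*g
        correction[0] = (correction[0] + 2) % p
        g = mul_trunc(g, correction, k, p)     # g <- g * (2 - f*g)
    return g


def bernoulli_residues(p):
    """[B_0, ..., B_{p-3}] modulo a prime p >= 5."""
    n = p - 2
    fact = [1] * (n + 1)
    for m in range(1, n + 1):
        fact[m] = fact[m - 1] * m % p
    f = [pow(fact[m + 1], -1, p) for m in range(n)]   # (e^x - 1)/x
    g = inverse_series(f, n, p)                       # x/(e^x - 1)
    return [fact[k] * g[k] % p for k in range(n)]


residues = bernoulli_residues(37)
print([k for k in range(2, 35, 2) if residues[k] == 0])  # [32]
```

Swapping `mul_trunc` for a fast multiplication routine is exactly the step the following slides are concerned with.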



We can lift the multiplication problem to $\mathbb{Z}[x]$, and then do either:

- Use floating-point FFT to multiply over $\mathbb{R}[x]$, and round results to the nearest integer, or

- Choose a few word-sized primes q such that $\mathbb{F}_q$ supports efficient FFT, multiply in $\mathbb{F}_q[x]$ for each q, then reconstruct via CRT, or

- Some combination of the above.



Suppose $f, g \in \mathbb{Z}[x]$, of degree p, with coefficients bounded by p.

Coefficients of h = fg are bounded by $p^3$.

Example from 1992: $p \sim 10^6$, so coefficients of h have 60 bits.

- IEEE double-precision floating-point is only 53 bits! Doesn’t quite fit.

- Could use two primes near $2^{32}$.



Let’s cheat: represent coefficients in (−p/2, p/2) instead of [0, p).

Then statistically we expect coefficients of h = fg to be bounded by $O(p^{2.5})$.

Try $p \sim 10^6$ again: now coefficients of h have 50 bits.

This fits into floating-point (just), but is no longer provably correct.

Fast-forward to 2012: now perhaps $p \sim 10^9$.

Even using $O(p^{2.5})$ bound, coefficients of h need 75 bits.


Computing B0, . . . ,Bp−3 (mod p) — Voronoi identity

Recall Voronoi identity:

\[ \frac{B^*_n}{n} \equiv \sum_{j=1}^{p-1} (-1)^j j^{n-1} \pmod p. \]

Again let’s put $j = g^i$ for g a generator of $(\mathbb{Z}/p\mathbb{Z})^*$, and let $x_i$ be the corresponding ±1 sign for each i. Also write $\ell = n - 1$, and $b_\ell = B^*_n / n$. The sum becomes

\[ b_\ell = \sum_{i=0}^{p-2} g^{i\ell} x_i. \]

We want to evaluate this for all (odd) $0 \le \ell < p - 1$.

This is precisely a discrete Fourier transform of the sequence $(x_0, \ldots, x_{p-2})$ over $\mathbb{F}_p$.



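Can’t evaluate the DFT directly over $\mathbb{F}_p$ (same reason as before).

Idea: use Bluestein’s trick to convert to polynomial multiplication.

Using the identity $i\ell = \binom{\ell}{2} + \binom{-i}{2} - \binom{\ell-i}{2}$, the DFT

\[ b_\ell = \sum_{i=0}^{p-2} g^{i\ell} x_i \]

becomes

\[ g^{-\binom{\ell}{2}} b_\ell = \sum_{i=0}^{p-2} g^{-\binom{\ell-i}{2}} \left( g^{\binom{-i}{2}} x_i \right). \]

The right hand side is now a convolution of $g^{-\binom{i}{2}}$ and $g^{\binom{-i}{2}} x_i$.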
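Bluestein’s rewriting can be checked on a small prime. The sketch below (hypothetical function name) builds the two twisted sequences and evaluates the convolution naively, term by term; in a real implementation that inner sum is the part handed to a fast polynomial multiplication. It relies on Python 3.8+ for `pow` with a negative exponent and a modulus.

```python
def bluestein_dft(x, g, p):
    """All sums b_l = sum_i g^(i*l) x_i (mod p), 0 <= l < len(x), in convolution form."""
    m = len(x)

    def binom2(a):
        # Binomial coefficient C(a, 2) = a(a-1)/2, also valid for negative a.
        return a * (a - 1) // 2

    # u_i = g^C(-i, 2) * x_i   and   v_d = g^(-C(d, 2)) for d = l - i in (-m, m).
    u = [pow(g, binom2(-i), p) * x[i] % p for i in range(m)]
    v = {d: pow(g, -binom2(d), p) for d in range(-m + 1, m)}

    result = []
    for l in range(m):
        conv = sum(v[l - i] * u[i] for i in range(m)) % p   # convolution term
        result.append(pow(g, binom2(l), p) * conv % p)      # undo the g^(-C(l,2)) twist
    return result


p, g = 23, 5
signs = [1 if pow(g, i, p) % 2 == 0 else -1 for i in range(p - 1)]
b = bluestein_dft(signs, g, p)
print(b[7])  # 5, which equals B*_8 / 8 mod 23
```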



Resulting algorithm involves multiplying two polynomials of degree about p/2 with coefficients in $\mathbb{F}_p$.

We run into the same problem as before: for $p \sim 10^9$, the product still has 75-bit coefficients (even if we allow cheating).

I will now sketch how Rader’s trick mitigates this problem. This seems to be new in the context of computing Bernoulli numbers.


Computing B0, . . . ,Bp−3 (mod p) — Rader’s trick

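First, some symmetry implies we only need sum over odd terms:

\[ b_\ell = 2 \sum_{\substack{i=0 \\ i \text{ odd}}}^{p-2} g^{i\ell} x_i. \]

For simplicity let us assume that q = (p − 1)/2 is prime.

Then $(\mathbb{Z}/2q\mathbb{Z})^*$ is cyclic of order q − 1.

In fact $(\mathbb{Z}/2q\mathbb{Z})^*$ consists of all odd 0 ≤ i < p − 1, except for q.

So

\[ b_\ell = 2 g^{\ell q} x_q + 2 \sum_{i \in (\mathbb{Z}/2q\mathbb{Z})^*} g^{i\ell} x_i. \]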



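Now let h be a generator of $(\mathbb{Z}/2q\mathbb{Z})^*$. Put $i = h^{-s}$ and $\ell = h^t$.

Then the sum

\[ y_\ell = \sum_{i \in (\mathbb{Z}/2q\mathbb{Z})^*} g^{i\ell} x_i \]

becomes

\[ y_{h^t} = \sum_{s=0}^{q-2} g^{h^{t-s}} x_{h^{-s}}. \]

The right hand side is now a convolution of $g^{h^s}$ and $x_{h^{-s}}$.

But notice that the values $x_{h^{-s}}$ are all ±1!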
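The re-indexing can be tested on a small prime such as p = 23 (so q = 11). The following sketch is an illustration under the slide’s assumption that q is prime; the function name is hypothetical, the generator h is found by brute force, and the cyclic convolution is evaluated naively instead of by fast multiplication.

```python
def rader_sums(p, g):
    """b_l = sum_{j=1}^{p-1} (-1)^j j^l (mod p) for the odd l coprime to q = (p-1)/2.

    Assumes p and q are both odd primes and g generates (Z/pZ)^*.
    Returns a dict {l: b_l}.
    """
    q = (p - 1) // 2
    m = 2 * q
    # x[i] is the sign (-1)^j attached to j = g^i mod p.
    x = [1 if pow(g, i, p) % 2 == 0 else -1 for i in range(m)]

    units = [i for i in range(1, m, 2) if i != q]            # (Z/2qZ)^*
    h = next(c for c in units
             if len({pow(c, e, m) for e in range(q - 1)}) == q - 1)
    h_pow = [pow(h, e, m) for e in range(q - 1)]             # h^0, ..., h^(q-2)

    f = [pow(g, e, p) for e in h_pow]                        # f[s] = g^(h^s)
    signs = [x[h_pow[-s % (q - 1)]] for s in range(q - 1)]   # signs[s] = x_(h^-s)

    result = {}
    for t in range(q - 1):
        # Cyclic convolution: y_(h^t) = sum_s g^(h^(t-s)) * x_(h^-s).
        y = sum(f[(t - s) % (q - 1)] * signs[s] for s in range(q - 1))
        l = h_pow[t]
        # Add back the excluded term i = q, then double (sum over odd i only).
        result[l] = 2 * (pow(g, l * q, p) * x[q] + y) % p
    return result


print(rader_sums(23, 5)[7])  # 5, matching B*_8 / 8 mod 23
```

Note that `signs` contains only ±1, which is the property exploited on the next slide to shrink the coefficient bound.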



So we end up with a polynomial multiplication fg, where:

- Both have degree about p/2,

- The coefficients of f are ‘random’ looking integers in [0, p),

- The coefficients of g are all ±1.

Now the coefficients of h = fg are bounded by about $p^2$.

Even for $p \sim 10^9$ this fits into 64 bits.

If we cheat, we get the bound $p^{1.5}$. For $p \sim 10^9$ this is about 45 bits; it even fits into IEEE double-precision floating-point.
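The bit counts quoted for the various coefficient bounds follow from $\log_2(p^e) = e \log_2 p$. A small Python check (the helper name and labels are made up for this example):

```python
from math import ceil, log2


def coefficient_bits(p, exponent):
    """Bits needed for a bound of size p**exponent."""
    return ceil(exponent * log2(p))


cases = [
    ("general product, p ~ 10^6, bound p^3", 10**6, 3.0),
    ("balanced residues, p ~ 10^6, bound p^2.5", 10**6, 2.5),
    ("balanced residues, p ~ 10^9, bound p^2.5", 10**9, 2.5),
    ("Rader, p ~ 10^9, bound p^2", 10**9, 2.0),
    ("Rader with balanced residues, p ~ 10^9, bound p^1.5", 10**9, 1.5),
]
for label, p, exponent in cases:
    bits = coefficient_bits(p, exponent)
    print(f"{label}: {bits} bits, fits a 53-bit double mantissa: {bits <= 53}")
```

This reproduces the figures 60, 50, 75, 60 and 45 bits used in the talk.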


Computing B0, . . . ,Bp−3 (mod p) — timings

Historical comparison:

Year   Hardware                      Algorithm          p      Time
1992   Motorola 68040                series inversion   10^6   200s
2012   Intel ‘Westmere’ (12 cores)   Rader’s trick      10^9   25s

Current plan is to do this for all $p < 2^{30} \sim 10^9$.

Should cost about 1.5 million CPU hours.


Thank you!

