Cooking the Classics

IAN STEWART

In Chapter 17 of Mathematical Carnival [3] Martin Gardner tells us that ‘‘When a mathematical puzzle is found to contain a major flaw (when the answer is wrong, when there is no answer, or when, contrary to claims, there is more than one answer or a better answer), the puzzle is said to be ‘cooked’.’’ Gardner gives several examples, the simplest being a puzzle he himself set in a children’s book: in the array of numbers

9 9 9
5 5 5
3 3 3
1 1 1

circle six digits to make the total of circled numbers equal 21. This is impossible on grounds of parity. Gardner’s answer, in effect cooking his own puzzle, is to turn the page upside down and circle the three 6’s and the three 1’s that then appear. But a reader, Howard Wilkerson, circled each of the 3’s, one of the 1’s, and then drew a big circle round the other two 1’s (giving 11). This is better, since upside down 3’s and 5’s don’t look like digits.

Gardner calls this kind of cook a quibble-cook. It exploits an imprecise definition of the question to obtain an unexpected answer. Mathematics is well up to speed on precise definitions these days. Even so, the mathematics that we teach to our students, indeed tell to each other, is also sometimes open to cookery, especially some of our classic theorems, which occasionally have become clichés. Over the years, I’ve compiled a mental list of cooked classics: a few contentious, all open to debate, all a matter of taste. I am now taking the dangerous step of committing them to print. At the very least, they might be offered to students as exercises, or for class discussion, to avoid giving the impression that the classic proof is holy writ. Many of them have a dynamical systems flavour, even when the topic is number theory. A few don’t.

I’ll start with a couple of warm-up examples, which will be familiar to most of you.

Root Two is Irrational

The classic proof of the irrationality of $\sqrt{2}$ proceeds by contradiction, assuming that $\sqrt{2} = p/q$ in lowest terms, and deducing that both $p$ and $q$ must be even. The argument runs like this: $p^2 = 2q^2$, so $p$ must be even, say $p = 2k$. But then $2k^2 = q^2$, so $q$ must be even, which is a contradiction.

One problem with this proof is that ‘lowest terms’ involves existence and uniqueness of prime factorization, but that can be got round by defining ‘lowest terms’ to mean ‘minimise $q$’.

If you’re prepared to accept existence and uniqueness of prime factorization, then it’s more informative to observe that $p^2 = 2q^2$ has an even power of 2 on the left-hand side, but an odd power on the right. Better still, prove:

THEOREM 0.1 A positive rational number $a$ is a perfect square (that is, the square of a rational number) if and only if every prime occurs to an even power in the factorization of $a$.

This does involve extending prime factorization to rationals, allowing negative powers, but that’s easy and the proof is trivial. I think this theorem puts the topic into an appropriate context, and makes the whole idea much clearer than a rather artificial argument tailored specifically to $\sqrt{2}$. Sometimes generalities are better than examples.
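Theorem 0.1 is easy to test by machine. The sketch below (all helper names are ours, and trial division is only sensible for small inputs) extends prime factorization to a positive rational by giving primes of the denominator negative exponents, applies the parity criterion, and compares it with an independent check that numerator and denominator in lowest terms are both perfect squares.

```python
from fractions import Fraction
from math import isqrt


def prime_exponents(n):
    """Exponents in the prime factorization of a positive integer n (trial division)."""
    exps = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            exps[d] = exps.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps


def rational_exponents(a):
    """Prime exponents of a positive Fraction; primes of the denominator get negative powers."""
    exps = prime_exponents(a.numerator)
    for p, e in prime_exponents(a.denominator).items():
        exps[p] = exps.get(p, 0) - e
    return exps


def is_square_by_theorem(a):
    """Theorem 0.1: every prime occurs to an even power."""
    return all(e % 2 == 0 for e in rational_exponents(a).values())


def is_square_directly(a):
    """Independent check: in lowest terms, numerator and denominator must both be squares."""
    n, d = a.numerator, a.denominator
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d
```

For example, `is_square_by_theorem(Fraction(9, 4))` is true, while `Fraction(2)` fails because 2 occurs to the first power.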

However, if you don’t want to go that route, prime factorization can be eliminated completely by using what is in effect the original Greek proof, thereby ‘classic-ing the cook’:

Suppose that $\sqrt{2} = p/q$ where $q$ is as small as possible. Then $p > q$ and $2q > p$. Since

$$\frac{2 - \sqrt{2}}{\sqrt{2} - 1} = \sqrt{2}$$

we have

$$\sqrt{2} = \frac{2 - p/q}{p/q - 1} = \frac{2q - p}{p - q}$$

which, since $0 < p - q < q$, is a contradiction.
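The descent step behind the Greek argument is the map $(p, q) \mapsto (2q - p, p - q)$. The sketch below (function names are ours, for illustration only) checks the two facts that make it work: the map sends $p^2 - 2q^2$ to its negative, so an exact solution of $p^2 = 2q^2$ would be sent to another exact solution, and it strictly decreases $q$ whenever $q < p < 2q$. Feeding it a near miss such as $99/70$ shows the descent running all the way down.

```python
def descend(p, q):
    """One Greek descent step: sqrt(2) = p/q would imply sqrt(2) = (2q - p)/(p - q)."""
    return 2 * q - p, p - q


def pell_defect(p, q):
    """p^2 - 2q^2, which is zero exactly when p/q = sqrt(2)."""
    return p * p - 2 * q * q


def descent_chain(p, q):
    """Apply the descent while q < p < 2q, recording every pair visited."""
    chain = [(p, q)]
    while q < p < 2 * q:
        p, q = descend(p, q)
        chain.append((p, q))
    return chain
```

Here `descent_chain(99, 70)` visits $(41, 29), (17, 12), (7, 5), (3, 2)$ and stops at $(1, 1)$, with the defect alternating between $+1$ and $-1$; a genuine solution would have defect 0 at every stage, so the descent could never stop.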

© 2011 Springer Science+Business Media, LLC, Volume 33, Number 1, 2011

The GCD Is a $\mathbb{Z}$-Linear Combination

A basic property of the integers is:

THEOREM 0.2 If $g = \gcd(a, b)$ then there exist integers $p, q$ such that $g = pa + qb$.

One favoured route to this result is to set up the Division Algorithm and the Euclidean Algorithm, and proceed inductively. This gets quite complicated.

An alternative is to use some ring theory, note that $\mathbb{Z}$ is a principal ideal domain, and consider the ideal generated by $\{a, b\}$. But this involves a fair amount of machinery.

However, a bare hands version of the PID proof is quick and simple, and avoids both algorithms:

Let $k$ be the smallest positive integer of the form $pa + qb$. Clearly $g$ divides $k$. I claim that $k$ divides $a$. To see why, choose the largest $m$ such that $mk \le a$. (If you don’t like this step, choose the smallest $m$ such that $(m + 1)k > a$.) If $mk \ne a$ then $0 < s = a - mk < k$ (or else we can replace $m$ by $m + 1$). Now

$$\begin{aligned} k - s &= pa + qb - s \\ &= pa + qb + km - a \\ &= pa + qb - a + m(pa + qb) = p'a + q'b \end{aligned}$$

for suitable $p', q'$, and $0 < k - s < k$, contrary to the definition of $k$. This contradiction proves that $k$ divides $a$. Similarly, $k$ divides $b$. So $k$ is a common divisor of $a$ and $b$, hence $k \le g$; since $g$ divides $k$, we conclude that $k = g$.

In fact, we can define the GCD this way, and prove existence alongside the ‘linear combination’ property.
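The claim that the smallest positive value of $pa + qb$ is the GCD can be checked by brute force. This is a sketch only: it searches a finite window of coefficients for positive $a, b$, which is enough because Bézout coefficients can always be chosen with $|p| \le b$ and $|q| \le a$ (a standard bound; the function name is ours).

```python
from math import gcd


def smallest_positive_combination(a, b):
    """Smallest positive value of p*a + q*b over |p|, |q| <= max(a, b), for a, b > 0.

    The bounded window suffices because Bezout coefficients can be chosen
    with |p| <= b and |q| <= a.
    """
    bound = max(a, b)
    values = (p * a + q * b
              for p in range(-bound, bound + 1)
              for q in range(-bound, bound + 1))
    return min(v for v in values if v > 0)
```

Every value in the window is a multiple of $\gcd(a, b)$, so the minimum can only equal the GCD once some combination attains it, which is exactly what the argument above guarantees.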

The GCD as a Dynamical System

Let’s get more ambitious. I’ve always found the Euclidean algorithm slightly complicated: not to understand or perform, but to argue about theoretically. Expressing the GCD as an integer linear combination involves a complicated induction, working backward through the algorithm, and somehow the point gets lost.

Now, the place where recursion comes into its own in mathematics is dynamical systems. And it is straightforward to turn the Euclidean algorithm into a dynamical system, by doing something slightly more simple-minded. As a bonus, we don’t need the division algorithm.

Define a map $\phi : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ by

$$\phi(x, y) = (\max(x, y) - \min(x, y),\ \min(x, y))$$

I will prove that if $(a, b) \ne (0, 0)$, then after finitely many steps the iterates of $(a, b)$ reach a fixed point $(d, 0)$ where $d$ is the GCD of $a$ and $b$.

First, I establish several simple facts:

1) $\phi(x, y) = (0, 0)$ if and only if $(x, y) = (0, 0)$.
2) Define $\|(x, y)\| = x + y$. Then $\|\phi(x, y)\| \le \|(x, y)\|$, with equality if and only if $x = 0$ or $y = 0$.
3) The fixed points of $\phi$ are precisely the points $(z, 0)$.
4) If $(x, y) \ne (0, 0)$ then $\phi$ preserves the GCD. That is, $\gcd(\phi(x, y)) = \gcd(x, y)$.

The proofs are trivial: I give the fourth, which is typical. I claim that $z$ divides both $x$ and $y$ if and only if it divides the two components of $\phi(x, y)$. If $z \mid x$ and $z \mid y$ then $z \mid \max(x, y)$ and $z \mid \min(x, y)$. So $z$ divides the two components of $\phi(x, y)$. If $z \mid (\max(x, y) - \min(x, y))$ and $z \mid \min(x, y)$ then $z \mid (\max(x, y) - \min(x, y) + \min(x, y)) = \max(x, y)$. Therefore $z$ divides both $x$ and $y$.

THEOREM 0.3 Let $(a, b) \ne (0, 0)$. Then there exists $t \ge 1$ such that $\phi^t(a, b) = (d, 0)$, and then $d = \gcd(a, b)$.

PROOF. The norm $\|(x, y)\|$ is a Liapunov function for $\phi$: that is, it decreases when $\phi$ is applied, strictly unless $x = 0$ or $y = 0$. Since $x, y \in \mathbb{N}$ there must exist some $t$ such that $\|\phi^{t+1}(a, b)\| = \|\phi^t(a, b)\|$. Then $\phi^t(a, b) = (0, d)$ or $(d, 0)$ for some $d$. Since $\phi(0, d) = (d, 0)$, we have $\phi^{t+1}(a, b) = (d, 0)$.

Since GCD is a conserved quantity for the dynamics, $\gcd(a, b) = \gcd(d, 0) = d$.
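Theorem 0.3 translates directly into code. The sketch below (names are ours) iterates $\phi(x, y) = (\max(x, y) - \min(x, y), \min(x, y))$ until the second coordinate vanishes, asserting along the way the two facts the proof relies on: the norm $x + y$ never increases, and the GCD is conserved.

```python
from math import gcd


def phi(x, y):
    """The subtractive map phi(x, y) = (max - min, min)."""
    return max(x, y) - min(x, y), min(x, y)


def norm(x, y):
    return x + y


def orbit_to_fixed_point(a, b):
    """Iterate phi from (a, b) != (0, 0) until a fixed point (d, 0) is reached."""
    assert (a, b) != (0, 0)
    point = (a, b)
    steps = 0
    while point[1] != 0:                      # the fixed points are exactly the pairs (z, 0)
        nxt = phi(*point)
        assert norm(*nxt) <= norm(*point)     # Liapunov property
        assert gcd(*nxt) == gcd(*point)       # the GCD is a conserved quantity
        point = nxt
        steps += 1
    return point, steps
```

For example, `orbit_to_fixed_point(12, 18)` passes through $(6, 12)$, $(6, 6)$ and $(0, 6)$ and returns `((6, 0), 4)`. No division is used anywhere, only subtraction.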

THEOREM 0.4 If $d = \gcd(a, b)$ then there exist integers $p, q$ such that $d = pa + qb$.

PROOF. Let $X \subseteq \mathbb{N} \times \mathbb{N}$ consist of all pairs $(p_1 a + q_1 b,\ p_2 a + q_2 b)$ for $p_1, p_2, q_1, q_2 \in \mathbb{Z}$. It is trivial to prove that $X$ is invariant under $\phi$: that is, $\phi(X) \subseteq X$.

Since $(a, b) \in X$, so is $\phi^t(a, b)$ for all $t \ge 0$. So $(d, 0) \in X$, implying that $d = pa + qb$ for some $p, q$.
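The invariance of $X$ is effectively an algorithm: carry integer coefficients along with each coordinate and let $\phi(x, y) = (\max(x, y) - \min(x, y), \min(x, y))$ act on them too. A minimal sketch, with a function name of our own choosing:

```python
from math import gcd


def bezout_via_phi(a, b):
    """Return (d, p, q) with d = gcd(a, b) = p*a + q*b, for (a, b) != (0, 0).

    Each coordinate is stored as (value, p, q) with value = p*a + q*b,
    so the orbit of phi never leaves the set X.
    """
    u = (a, 1, 0)   # a = 1*a + 0*b
    v = (b, 0, 1)   # b = 0*a + 1*b
    while v[0] != 0:
        big, small = (u, v) if u[0] >= v[0] else (v, u)
        # phi(x, y) = (max - min, min): subtract the coefficient vectors as well.
        u = (big[0] - small[0], big[1] - small[1], big[2] - small[2])
        v = small
    return u
```

For instance, `bezout_via_phi(12, 18)` returns `(6, -4, 3)`, and indeed $-4 \cdot 12 + 3 \cdot 18 = 6$.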

This is all rather cute, and it gets cuter. The map $\phi$ also has a scaling property:

$$\phi(ka, kb) = k\,\phi(a, b)$$

for $k \in \mathbb{N}$. So $\mathbb{N} \times \mathbb{N}$ is the disjoint union of subsets $X_k$, where

$$X_0 = \{(0, 0)\}, \qquad X_k = \{(a, b) : \gcd(a, b) = k\} \quad (k > 0)$$


AUTHOR

IAN STEWART, Emeritus Professor of Mathematics at Warwick University, is the author of many research papers and books for broad audiences. Currently he works on pattern formation and network dynamics. He is a Fellow of the Royal Society, and his awards include the Royal Society’s Faraday Medal, the Gold Medal of the Institute for Mathematics and Its Applications, the Public Understanding of Science Award of AAAS, and the Zeeman Medal. His nonmathematical interests include science fiction, Egyptology, and geology.

Mathematics Institute
University of Warwick
Coventry, CV4 7AL
UK

e-mail: [email protected]

THE MATHEMATICAL INTELLIGENCER

Moreover, each $X_k$ is $\phi$-invariant, and (aside from $k = 0$) the dynamics of $\phi$ on $X_k$ is conjugate to that of $\phi$ on $X_1$ via the map $(a, b) \mapsto (ka, kb)$. So we can understand the dynamics of $\phi$ by restricting attention to $X_1$, the set of all pairs of coprime natural numbers.

Each such pair has a uniquely defined height, which is the smallest $t$ for which $\phi^t(a, b) = (1, 0)$. And we can classify pairs by increasing height, using:

LEMMA 0.5 $\phi(a, b) = (c, d)$ if and only if $(a, b) = (c + d, d)$ or $(a, b) = (d, c + d)$. That is, $\phi^{-1}(c, d) = \{(c + d, d), (d, c + d)\}$.

We then find:

Height 0: (1, 0)
Height 1: (0, 1)
Height 2: (1, 1)
Height 3: (2, 1), (1, 2)
Height 4: (3, 1), (1, 3), (3, 2), (2, 3)
Height 5: (4, 1), (1, 4), (4, 3), (3, 4), (5, 2), (2, 5), (5, 3), (3, 5)

So there are $2^{n-2}$ pairs of height $n$ when $n \ge 2$.
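These levels can be regenerated by machine. The sketch below (function names are ours) computes a height directly, by iterating $\phi(x, y) = (\max(x, y) - \min(x, y), \min(x, y))$ until it reaches $(1, 0)$, and separately grows the pairs of each height backward from $(1, 0)$ using the two preimages given by Lemma 0.5. The backward construction makes the count $2^{n-2}$ transparent: from height 2 onward every pair has two distinct preimages.

```python
from math import gcd


def phi(x, y):
    return max(x, y) - min(x, y), min(x, y)


def height(a, b):
    """Smallest t with phi^t(a, b) = (1, 0), for a coprime pair (a, b)."""
    assert gcd(a, b) == 1
    t = 0
    while (a, b) != (1, 0):
        a, b = phi(a, b)
        t += 1
    return t


def pairs_by_height(max_height):
    """levels[n] is the set of coprime pairs of height n, built backward via Lemma 0.5."""
    levels = [{(1, 0)}, {(0, 1)}]
    while len(levels) <= max_height:
        new = set()
        for c, d in levels[-1]:
            new |= {(c + d, d), (d, c + d)}   # the preimages of (c, d)
        levels.append(new)
    return levels
```

Both routes agree: every pair placed at level $n$ by `pairs_by_height` has `height` equal to $n$.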

The matrix of heights is curious:

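$$\begin{array}{ccccccccc}
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
2 & \bullet & 3 & \bullet & 4 & \bullet & 5 & \bullet & 6 \\
3 & 3 & \bullet & 4 & 4 & \bullet & 5 & 5 & \bullet \\
4 & \bullet & 4 & \bullet & 5 & \bullet & 5 & \bullet & 6 \\
5 & 4 & 4 & 5 & \bullet & 6 & 5 & 5 & 6 \\
6 & \bullet & \bullet & \bullet & 6 & \bullet & 7 & \bullet & \bullet \\
7 & 5 & 5 & 5 & 5 & 7 & \bullet & 8 & 6 \\
8 & \bullet & 5 & \bullet & 5 & \bullet & 8 & \bullet & 9 \\
9 & 6 & \bullet & 6 & 6 & \bullet & 6 & 9 & \bullet
\end{array}$$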

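Here the entry in row $a$, column $b$ is one less than the height of $(a, b)$ (equivalently, the number of steps $\phi$ takes to carry $(a, b)$ to $(0, 1)$), and we have written $\bullet$ whenever $a, b$ are not coprime.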

The reason for the $\bullet$ symbols is that the entire infinite matrix has a recursive structure: the part marked with $\bullet$ symbols repeats the same entries, but the row and column positions are scaled. (This is the decomposition into the $X_k$ mentioned earlier.)

The form of this matrix is not obvious, and it could be worth investigating.

The map $\phi$ is a formal version of the procedure known to the ancient Greeks as anthyphaeresis (Fowler [2]), in which squares are trimmed off a rectangle until the remaining part is too small, at which point smaller squares are trimmed, and so on. It is of course well known that this procedure is equivalent to the Euclidean algorithm, which in turn is equivalent to the continued fraction expansion. The Greek proof that $\sqrt{2}$ is irrational occupies similar territory.

The map $\phi$ makes sense on $\mathbb{R}^+ \times \mathbb{R}^+$, and is also related to the continued fraction of $x/y$ or $y/x$. The norm remains a Liapunov function, so there are no periodic points other than the fixed points. However, there are periodic points if we also rescale $(x, y)$, and these occur when $x/y$ is a quadratic irrational.
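The link with continued fractions can be seen in the orbit itself. While the smaller coordinate stays the same, $\phi(x, y) = (\max(x, y) - \min(x, y), \min(x, y))$ is trimming squares of that size off the rectangle, so the number of consecutive steps spent at each size should be a partial quotient of $\max(a, b)/\min(a, b)$. A sketch (our function names) that compares these run lengths with the usual division-based expansion:

```python
def phi(x, y):
    return max(x, y) - min(x, y), min(x, y)


def run_lengths_of_min(a, b):
    """Lengths of the runs of equal min(x, y) along the phi-orbit of (a, b), for a, b > 0.

    min(x, y) never increases along the orbit, so equal values form contiguous runs.
    """
    runs, current = [], None
    x, y = a, b
    while min(x, y) > 0:
        m = min(x, y)
        if m == current:
            runs[-1] += 1      # another square of the same size is trimmed off
        else:
            runs.append(1)     # a smaller square size begins
            current = m
        x, y = phi(x, y)
    return runs


def continued_fraction(p, q):
    """Partial quotients of p/q via the division algorithm."""
    terms = []
    while q:
        a, r = divmod(p, q)
        terms.append(a)
        p, q = q, r
    return terms
```

For $(13, 5)$ the orbit is $(13, 5), (8, 5), (3, 5), (2, 3), (1, 2), (1, 1), (0, 1)$, and the runs $2, 1, 1, 2$ match $13/5 = [2; 1, 1, 2]$.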

Euler’s Formula

The famous formula

$$e^{i\pi} = -1$$

is often presented as something of a mystery. We all know how to justify it, but the usual approaches lack motivation and present it as some sort of accident. There is at least one way to make it natural and inevitable, using differential equations. Quite a bit of machinery needs to be set up to do this, but it’s all good stuff in its own right.

Consider the ODE

$$\frac{dz}{dt} = iz$$

in the complex plane, with initial condition $z(0) = 1$. The solution is $z(t) = e^{it}$. (You can define the exponential this way.)

Now consider the geometrical meaning of the equation. Since $iz$ is orthogonal to $z$, the tangent vector to a solution at a point $z(t)$ is at right angles to the radius vector from 0 to $z(t)$. It follows (and can be checked by a simple calculation: convert to polars) that

• The circle with centre 0 through $z(t)$ is invariant under the flow ($dr/dt = 0$).

• In polar coordinates, $d\theta/dt = 1$.

Therefore the unit circle is the trajectory of the solution when $z(0) = 1$, and $t$ is arc length in radians, so the point $z(t)$ moves at uniform speed along the circle.

The Greek definition of $\pi$ tells us that the circumference of the circle is $2\pi$, so halfway round occurs when $t = \pi$. But halfway round is the point $z = -1$. Hence $e^{i\pi} = -1$.
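The geometric argument can be watched numerically. The sketch below integrates $dz/dt = iz$ from $z(0) = 1$ with the classical fourth-order Runge-Kutta scheme (our choice of integrator, nothing in the argument depends on it) and confirms that the solution stays on the unit circle, reaching $i$ at $t = \pi/2$ and $-1$ at $t = \pi$.

```python
import math


def rk4_flow(f, z0, t_end, steps):
    """Integrate dz/dt = f(z) from t = 0 to t_end with classical RK4 steps."""
    h = t_end / steps
    z = z0
    for _ in range(steps):
        k1 = f(z)
        k2 = f(z + h * k1 / 2)
        k3 = f(z + h * k2 / 2)
        k4 = f(z + h * k3)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return z


def rotation_field(z):
    """dz/dt = iz: the velocity is the radius vector turned through a right angle."""
    return 1j * z


z_half = rk4_flow(rotation_field, 1 + 0j, math.pi, 10_000)
```

The computed `z_half` agrees with $-1$ to within about $10^{-9}$, and its modulus stays at 1 to the same accuracy.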

All the ingredients of this proof are well known and form part of various standard approaches to the trigonometric and exponential functions. But the overall package seems not to get much prominence. Its big advantage is to explain why circles (the definition of $\pi$) have anything to do with the exponential.

Infinitude of Primes

Euclid’s proof that the number of primes exceeds any finite bound is wonderfully clever, but I’ve always felt that considering $p_1 \cdots p_k + 1$ is something of a rabbit out of a hat. What follows pretty much explains how the rabbit got into the hat.

As a warm-up, suppose that the only primes were 2, 3, and 5. Then a systematic list of all products of powers of these would yield all possible numbers. The list is most easily generated in non-numerical order, something like $1, 2, 3, 5, 2 \cdot 2, 2 \cdot 3, 2 \cdot 5, 3 \cdot 3, 3 \cdot 5, 5 \cdot 5, \ldots$.

We want to prove that something is missing, and a good way to do that is to count how many numbers this process yields, up to some limit $N$.

The number 1 occurs once.

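Multiples of 2 occur $\lfloor N/2 \rfloor$ times. Similarly, multiples of 3 occur $\lfloor N/3 \rfloor$ times and multiples of 5 occur $\lfloor N/5 \rfloor$ times.

However, we are overcounting since (for instance) multiples of 6 are multiples of 2 and multiples of 3. So we must subtract multiples of $2 \cdot 3$, which occur $\lfloor N/(2 \cdot 3) \rfloor$ times. But then... Well, you can see what’s coming. By the inclusion-exclusion principle the number of numbers from 1 to $N$ (which of course is $N$) has to be

$$1 + \left\lfloor \frac{N}{2} \right\rfloor + \left\lfloor \frac{N}{3} \right\rfloor + \left\lfloor \frac{N}{5} \right\rfloor - \left\lfloor \frac{N}{2 \cdot 3} \right\rfloor - \left\lfloor \frac{N}{2 \cdot 5} \right\rfloor - \left\lfloor \frac{N}{3 \cdot 5} \right\rfloor + \left\lfloor \frac{N}{2 \cdot 3 \cdot 5} \right\rfloor$$

exactly.

Oh, but those floor functions are a pain. So let’s make $N$ exactly divisible by all the denominators; that is, $N$ is a multiple of $2 \cdot 3 \cdot 5$. Better still, why not set $N = 2 \cdot 3 \cdot 5 = 30$? Then the expression becomes

$$1 + 15 + 10 + 6 - 5 - 3 - 2 + 1 = 23$$

which isn’t 30.

The generalization is now obvious. Suppose that $p_1, \ldots, p_k$ is the entire (finite) list of primes. Let $N = p_1 \cdots p_k$. The same argument using the inclusion-exclusion principle now implies that the number of numbers between 1 and $N$, namely $N$, satisfies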

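$$N = 1 + \sum_{i} \frac{N}{p_i} - \sum_{i<j} \frac{N}{p_i p_j} + \sum_{i<j<k} \frac{N}{p_i p_j p_k} - \cdots$$

where the subscripts in each sum are distinct and each choice of subscripts is counted once (hence $i < j < k$, and so on). Therefore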

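$$\begin{aligned} 1 &= N\left(1 - \sum_i \frac{1}{p_i} + \sum_{i<j} \frac{1}{p_i p_j} - \sum_{i<j<k} \frac{1}{p_i p_j p_k} + \cdots\right) \\ &= N\left(1 - \frac{1}{p_1}\right)\cdots\left(1 - \frac{1}{p_k}\right) \\ &= p_1\left(1 - \frac{1}{p_1}\right)\cdots p_k\left(1 - \frac{1}{p_k}\right) \\ &= (p_1 - 1)\cdots(p_k - 1) \end{aligned}$$

which is absurd unless $k = 1$ and $p_1 = 2$. But now 3 is missing from the list of primes.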

Having now realized that some numbers are missing, we quickly notice that an obvious missing number is $N - 1 = p_1 \cdots p_k - 1$. Having seen that, we reconstruct Euclid’s proof when we decide that a remainder of 1 is easier to explain than a remainder of $-1$.
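The counting is easy to mechanise. The sketch below (function names are ours) evaluates the inclusion-exclusion sum for a given list of primes, lists the numbers it fails to account for, and checks the general identity behind the contradiction: for $N = p_1 \cdots p_k$ the sum equals $1 + N - (p_1 - 1)\cdots(p_k - 1)$, so it can equal $N$ only when $(p_1 - 1)\cdots(p_k - 1) = 1$.

```python
from itertools import combinations
from math import prod


def inclusion_exclusion_count(primes, N):
    """1 + (number of n <= N divisible by at least one listed prime), by inclusion-exclusion."""
    total = 1
    for r in range(1, len(primes) + 1):
        sign = 1 if r % 2 == 1 else -1
        for subset in combinations(primes, r):
            total += sign * (N // prod(subset))
    return total


def missing_numbers(primes, N):
    """Numbers 2..N with no prime factor in the list."""
    return [n for n in range(2, N + 1) if all(n % p for p in primes)]
```

Here `inclusion_exclusion_count([2, 3, 5], 30)` is 23, and the seven numbers it misses are 7, 11, 13, 17, 19, 23, 29: the numbers up to 30 that are coprime to 30, apart from 1.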

There are, of course, innumerable proofs of the infinitude of primes. A very simple one, related to the above calculation, is to compute the Euler $\phi$-function of $N = p_1 \cdots p_k$. Supposing that every number between 2 and $N - 1$ is a multiple of some $p_i$, clearly $\phi(N) = 1$. On the other hand,

$$\phi(N) = \phi(p_1)\cdots\phi(p_k) = (p_1 - 1)\cdots(p_k - 1)$$

So all $p_i = 2$, and $k = 1$, as before.

Solution of Polynomials by Radicals

It might seem unlikely that such a tried and tested area as the solution of the quadratic, cubic, and quartic could have anything new to offer. However, some minor variations on the traditional themes are possible.

Quadratics

As a warm-up, I solve the general quadratic

$$x^2 + px + q = 0 \qquad (0.1)$$

by an unorthodox method. The idea is that when it comes to the crunch, the only quadratic we can factorise is $x^2 - k^2$. So we seek $a, b$ such that

$$x^2 + px + q = (x + a)^2 - b^2$$

This leads directly to

$$2a = p, \qquad a^2 - b^2 = q$$

Therefore $a = p/2$, so $b^2 = p^2/4 - q$, so $b = \pm\sqrt{p^2/4 - q}$. Now (0.1) becomes

$$0 = (x + a)^2 - b^2 = (x + a + b)(x + a - b)$$

so $x = -a \pm b$. That is,

$$x = -\frac{p}{2} \pm \sqrt{\frac{p^2}{4} - q}$$

which is the traditional formula. (What else?)

Quartics

If a trick works, use it again. Thus emboldened, I attempt the general quartic

$$x^4 + px^3 + qx^2 + rx + s = 0 \qquad (0.2)$$

by the same method. We seek $a, b, c, d$ such that

$$x^4 + px^3 + qx^2 + rx + s = (x^2 + ax + b)^2 - (cx + d)^2$$

which leads to

$$\begin{aligned} 2a &= p \\ a^2 + 2b - c^2 &= q \\ 2ab - 2cd &= r \\ b^2 - d^2 &= s \end{aligned}$$

Clearly we must set $a = p/2$. Set $b = (q + c^2 - p^2/4)/2$ to solve the second equation for $b$ in terms of $c$, and

$$d = \frac{-r + 2ab}{2c} = \frac{-r + p(q + c^2 - p^2/4)/2}{2c} = \frac{-2r + p(q + c^2 - p^2/4)}{4c}$$

to solve the third equation for $d$ in terms of $c$. Finally, substitute all of this into the fourth equation:

$$\left(\frac{q + c^2 - p^2/4}{2}\right)^2 - \left(\frac{-2r + p(q + c^2 - p^2/4)}{4c}\right)^2 - s = 0$$

and (with hindsight) multiply through by $c^2$ to remove denominators. This yields

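$$\begin{aligned} 0 = {} & \frac{1}{4}c^6 + \left(\frac{q}{2} - \frac{3p^2}{16}\right)c^4 + \left(\frac{3p^4}{64} - \frac{p^2}{4}q + \frac{q^2}{4} + \frac{pr}{4} - s\right)c^2 \\ & + \left(-\frac{p^6}{256} + \frac{p^4 q}{32} - \frac{p^2}{16}q^2 - \frac{p^3 r}{16} + \frac{pqr}{4} - \frac{r^2}{4}\right) \end{aligned}$$

which (miraculously) is a cubic in $c^2$. (It is of course a variant of Ferrari’s resolvent cubic.)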


Therefore we can solve quartics provided we can solve cubics.
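The resolvent is easy to get wrong by hand, so here is a sketch that checks it in exact rational arithmetic. It chooses $b$ and $d$ from $c$ exactly as above (so the first three coefficient equations hold with $a = p/2$), forms $c^2(b^2 - d^2 - s)$, and compares the result with the displayed cubic in $u = c^2$. Function names are ours, and $c \ne 0$ is assumed because $d$ is obtained by dividing by $2c$.

```python
from fractions import Fraction


def resolvent(p, q, r, s, u):
    """The cubic in u = c^2 obtained after multiplying the fourth equation by c^2."""
    return (Fraction(1, 4) * u**3
            + (q / 2 - 3 * p**2 / 16) * u**2
            + (3 * p**4 / 64 - p**2 * q / 4 + q**2 / 4 + p * r / 4 - s) * u
            + (-p**6 / 256 + p**4 * q / 32 - p**2 * q**2 / 16
               - p**3 * r / 16 + p * q * r / 4 - r**2 / 4))


def fourth_equation_times_c2(p, q, r, s, c):
    """c^2 * (b^2 - d^2 - s), with b and d solving the second and third equations."""
    b = (q + c**2 - p**2 / 4) / 2
    d = (-2 * r + p * (q + c**2 - p**2 / 4)) / (4 * c)
    return c**2 * (b**2 - d**2 - s)
```

With `Fraction` inputs the two functions agree exactly for every nonzero $c$ tried, which is the identity claimed in the text. A root $u$ of the cubic then gives $c = \sqrt{u}$, and the quartic splits into two quadratics.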

Cubics

The same trick seems not to work, partly because 3 is odd. A variant succeeds in reducing the cubic to... the same cubic. The classic trick is to reduce the equation to

$$x^3 + mx + n = 0$$

by translating $x$ and then making a clever substitution. As a variation, don’t bother: consider

$$x^3 + px^2 + qx + r = 0$$

and substitute

$$x = az + b + cz^{-1}$$

(For motivation, consider the traditional $x = t + 1/t$ for palindromic polynomials.) Then

$$x^3 + px^2 + qx + r = Az^3 + Bz^2 + Cz + D + Ez^{-1} + Fz^{-2} + Gz^{-3}$$

where

$$\begin{aligned} A &= a^3 \\ B &= a^2(3b + p) \\ C &= a(3b^2 + 3ac + 2bp + q) \\ D &= b^3 + 6abc + b^2 p + 2acp + bq + r \\ E &= c(3b^2 + 3ac + 2bp + q) \\ F &= c^2(3b + p) \\ G &= c^3 \end{aligned}$$

A fortuitous coincidence now smacks us between the eyes. Making two expressions $3b + p$ and $3b^2 + 3ac + 2bp + q$ vanish causes the four coefficients of $z^2, z, z^{-1}, z^{-2}$ to vanish. What luck! This happens when

$$a = \frac{p^2 - 3q}{9c}, \qquad b = -\frac{p}{3}$$

and $c$ is a free parameter. (All we need is $9ac = p^2 - 3q$.) Now the cubic becomes

$$Az^3 + D + Gz^{-3} = 0$$

so

$$Az^6 + Dz^3 + G = 0$$

which is quadratic in $z^3$ so can be solved by radicals. Then substitute to get $x$.
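The substitution gives a complete recipe. Below is a floating-point sketch of it using Python’s `cmath` (the function name is ours, and we assume $p^2 \ne 3q$, so that $a \ne 0$ and the equation in $z^3$ is genuinely quadratic). Either root $w$ of $Aw^2 + Dw + G = 0$ will do: its three cube roots $z$ yield the three roots $x$, because $z$ and $c/(az)$ give the same value of $az + b + c/z$.

```python
import cmath


def solve_cubic(p, q, r, c=1.0):
    """Roots of x^3 + p x^2 + q x + r via x = a z + b + c/z (assumes p^2 != 3q)."""
    b = -p / 3
    a = (p * p - 3 * q) / (9 * c)
    if a == 0:
        raise ValueError("p^2 = 3q: the cubic is already (x + p/3)^3 + constant")
    A = a ** 3
    D = b ** 3 + 6 * a * b * c + b * b * p + 2 * a * c * p + b * q + r
    G = c ** 3
    # A w^2 + D w + G = 0 with w = z^3; w != 0 because the product of the roots is G/A.
    w = (-D + cmath.sqrt(D * D - 4 * A * G)) / (2 * A)
    z0 = w ** (1 / 3)                       # principal complex cube root
    omega = cmath.exp(2j * cmath.pi / 3)    # primitive cube root of unity
    roots = []
    for k in range(3):
        z = z0 * omega ** k
        roots.append(a * z + b + c / z)
    return roots
```

For example, `solve_cubic(-6, 11, -6)` returns values within rounding error of 1, 2 and 3 (in some order), the roots of $x^3 - 6x^2 + 11x - 6$; changing the free parameter `c` does not change the answer.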

Quintics

Thanks to Abel, Galois, and their predecessors, we know there isn’t a formula. I’ve concocted a stripped-down proof using very little technical machinery, but it’s about ten pages long. It’s not so much a cook as an attempt to reverse-engineer what the algebraists from Legendre to Kronecker already knew, and reassemble the bits that are needed. It proves that $x^5 - 80x + 30 = 0$ can’t be solved by radicals. I may publish it elsewhere, once it’s been polished up.

Trisection of Angles

The usual proof that angles cannot be trisected (see for example [6]) relies on a cubic equation for $\cos(2\pi/9)$ and the multiplicativity of the degree of a field extension. Here’s an alternative using less machinery and a more natural setting.

Identify Euclid’s plane with the complex plane $\mathbb{C}$. Define $z \in \mathbb{C}$ to be constructible if it can be constructed from $\mathbb{Q} \subseteq \mathbb{R} \subseteq \mathbb{C}$ by ruler and compass. (Note that we don’t consider the real and imaginary parts separately.)

The usual coordinate calculations prove:

LEMMA 0.6 A point $z$ is constructible if and only if there is a finite sequence of complex numbers $\alpha_1, \ldots, \alpha_k$ such that $\alpha_1^2 \in \mathbb{Q}$, $\alpha_j^2 \in \mathbb{Q}(\alpha_1, \ldots, \alpha_{j-1})$ for $j = 2, \ldots, k$, and $z \in \mathbb{Q}(\alpha_1, \ldots, \alpha_k)$.

Here, as usual, $\mathbb{Q}(\ldots)$ denotes the subfield of $\mathbb{C}$ generated by the contents of the parentheses.

Observe that if $K$ is a subfield of $\mathbb{C}$ and $\alpha^2 \in K$, then

$$K(\alpha) = \{x + \alpha y : x, y \in K\}$$

I now prove:

THEOREM 0.7 The primitive 9th root of unity $\zeta = e^{2\pi i/9}$ is not constructible.

Since $\zeta$ trisects $\omega = e^{2\pi i/3}$, it follows that the angle $2\pi/3$ cannot be trisected using ruler and compass.

It remains to prove the theorem without using multiplicativity of the degree of a field extension.

PROOF. Assume for a contradiction that $\zeta$ is constructible. Define a tower of subfields

$$\mathbb{Q} = K_0 \subset \mathbb{Q}(\omega) = K_1 \subset K_2 \subset \cdots \subset K_s$$

such that $\zeta \in K_s$ and $K_j = K_{j-1}(\alpha_j)$ where $\alpha_j^2 \in K_{j-1}$, for $j = 2, \ldots, s$. Note that the same goes for $j = 1$ since $\omega = (-1 + i\sqrt{3})/2$, so $\mathbb{Q}(\omega) = \mathbb{Q}(\sqrt{-3})$. Such a tower exists if and only if $\zeta$ is constructible. Choose one for which $s$ is minimal. Then

$$\zeta = a + b\sqrt{\beta}$$

where $a, b, \beta \in K_{s-1}$. Minimality implies that $b \ne 0$, whence also $a \ne 0$. But $\zeta^3 = \omega$, so

$$\omega = (a + b\sqrt{\beta})^3 = (a^3 + 3ab^2\beta) + (3a^2 b + b^3\beta)\sqrt{\beta}$$

If $3a^2 b + b^3\beta \ne 0$ then

$$\sqrt{\beta} = \frac{\omega - a^3 - 3ab^2\beta}{3a^2 b + b^3\beta}$$

which lies in $K_{s-1}$, contrary to minimality. Therefore $3a^2 b + b^3\beta = 0$, so

$$\omega = a^3 + 3ab^2\beta$$

But now

$$(a - b\sqrt{\beta})^3 = (a^3 + 3ab^2\beta) - (3a^2 b + b^3\beta)\sqrt{\beta} = a^3 + 3ab^2\beta = \omega$$

The cube roots of $\omega$ are $\zeta, \omega\zeta, \omega^2\zeta$. Therefore

$$a - b\sqrt{\beta} = \omega^c \zeta$$

where $c = 1$ or $c = 2$. (We can’t have $c = 0$ since $b \ne 0$.) But we already know that

$$a + b\sqrt{\beta} = \zeta$$

Adding, we get

$$\zeta = \frac{2a}{1 + \omega^c}$$

which is in $K_{s-1}$, a contradiction.

Two Squares Theorem

Fermat’s Two Squares Theorem, proved by Euler, states that any prime of the form $4k + 1$ is a sum of two squares. There’s more, but this is the hard part. The traditional approaches either use quadratic residues or prove that some multiple of the prime is a sum of two squares and use descent.

The following proof must be well known to number-theorists, but I’ve not seen it in the texts. It is short, conceptual, and straightforward.

Recall that the Gaussian integers $\mathbb{Z}[i]$ comprise all complex numbers $a + bi$, where $a, b \in \mathbb{Z}$. There is a norm

$$N(a + bi) = a^2 + b^2$$

and this is multiplicative:

$$N(xy) = N(x)N(y)$$

The Gaussian integers form a unique factorisation domain: in fact the norm provides a Euclidean algorithm and this can be proved quickly by elementary means.

Let $p \in \mathbb{Z}$ be prime (in the usual sense). We claim that if $p \equiv 1 \pmod 4$ then $p$ is not prime in $\mathbb{Z}[i]$. For a contradiction, suppose that $p = 4k + 1$ is a Gaussian prime. Then $\mathbb{Z}[i]/p$ is a field. It has a subfield $\mathbb{Z}/p$, which does not contain $i$, since if it did, $i$ would be real, indeed in $\mathbb{Z}$. The multiplicative group of this subfield is cyclic of order $4k$ so has an element $\alpha$ of order 4. Now the quartic polynomial $t^4 - 1$ has at least five distinct zeros: $1, \alpha, \alpha^2, \alpha^3$, and $i$. This is a contradiction.

Now $N(p) = p^2$, so multiplicativity of the norm implies that $p = q_1 q_2$ where $q_1, q_2$ are prime in $\mathbb{Z}[i]$. Since $p$ is real, $q_2 = \bar{q}_1$. Let $q_1 = a + bi$. Then

$$p = (a + bi)(a - bi) = a^2 + b^2$$

Strictly speaking, we get this up to a unit, but the units are $\pm 1, \pm i$. Since $p$ and $a^2 + b^2$ are real and positive, the equation follows.

A tactical variant is to observe that $\mathbb{Z}[i]/p$ is $(\mathbb{Z}/p)[t]/\langle t^2 + 1 \rangle$, and $t^2 + 1$ is reducible (with a zero $\alpha$) so the quotient cannot be a field. This is marginally more elegant but slightly less direct.
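The argument is constructive. Once we have $\alpha$ with $\alpha^2 \equiv -1 \pmod p$ (an element of order 4 in $(\mathbb{Z}/p)^*$), the prime $p$ divides $(\alpha + i)(\alpha - i)$ without dividing either factor, so a Gaussian GCD of $p$ and $\alpha + i$ is one of the factors $a \pm bi$ of $p$, up to a unit. The sketch below swaps in the Gaussian Euclidean algorithm (quotients rounded to the nearest Gaussian integer), which the text mentions but does not spell out; all function names are ours.

```python
def gaussian_mul(x, y):
    (a, b), (c, d) = x, y
    return a * c - b * d, a * d + b * c


def nearest(m, n):
    """Integer nearest to m/n, for n > 0."""
    return (2 * m + n) // (2 * n)


def gaussian_rem(x, y):
    """Remainder of x on division by y != 0 in Z[i]; its norm is at most N(y)/2."""
    (a, b), (c, d) = x, y
    n = c * c + d * d
    # x * conj(y) = (ac + bd) + (bc - ad) i; round both parts of x/y.
    quotient = (nearest(a * c + b * d, n), nearest(b * c - a * d, n))
    qy = gaussian_mul(quotient, y)
    return a - qy[0], b - qy[1]


def gaussian_gcd(x, y):
    while y != (0, 0):
        x, y = y, gaussian_rem(x, y)
    return x


def two_squares(p):
    """Write a prime p = 1 (mod 4) as a^2 + b^2."""
    # A quadratic non-residue raised to the power (p - 1)/4 has order 4, so alpha^2 = -1 mod p.
    nonres = next(n for n in range(2, p) if pow(n, (p - 1) // 2, p) == p - 1)
    alpha = pow(nonres, (p - 1) // 4, p)
    a, b = gaussian_gcd((p, 0), (alpha, 1))
    return abs(a), abs(b)
```

For example, `two_squares(5)` returns `(2, 1)` and `two_squares(13)` returns `(3, 2)`.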

There is a nice analogue for $\mathbb{Z}[\omega]$ where $\omega$ is a primitive cube root of unity, and now we prove that primes $6k + 1$ are of the form $a^2 + b^2 - ab$, or equivalently $a^2 + 3b^2$.

Of course quadratic reciprocity gives far more, but that doesn’t count as ‘elementary’.

‘Give Me a Place to Stand, and I Will Move the Earth’

So, famously, said Archimedes. I claim he already had a place to stand. This is a quibble-cook, I think. Archimedes didn’t get anything wrong. Just missed the point.

Archimedes was dramatizing the law of the lever, and what he had in mind was basically Figure 1. I don’t think he was interested in the position of the Earth in space, but he wanted the pivot point to be fixed, and in order to apply the law of the lever he needed uniform gravity, contrary to astronomical fact. He also needed a perfectly rigid lever of zero mass.

No matter. I don’t want to get into discussions about inertia or other quibbles. Let’s grant him all those things. My question is: when the Earth moves, how far does it move?

Assume Archimedes can exert a force sufficient to lift his own weight, say 100 kg. The mass of the Earth is about $6 \times 10^{24}$ kg. If the pivot is 1 metre from the Earth then the Law of the Lever tells us that the distance from the pivot to Archimedes is $6 \times 10^{22}$ metres, and his lever is $1 + 6 \times 10^{22}$ metres long. If Archimedes moves his end of the lever one metre, similar triangles tell us that the Earth moves about $1.7 \times 10^{-23}$ metres. A proton has diameter about $10^{-15}$ metres...

OK, but it still moves, dammit!

True. But suppose that instead of this huge and improbable apparatus, Archimedes stands on the surface of the Earth and jumps. For every metre he clears, the Earth moves about $1.7 \times 10^{-23}$ metres the other way (action/reaction). Basically, this has exactly the same effect as a lever $1 + 6 \times 10^{22}$ metres long: about 6.3 million light years, or roughly two and a half times the distance to the Andromeda Galaxy.
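The lever arithmetic takes a few lines. The constants below are assumed round values: the masses are the ones used in the text, while the light-year conversion and the distance to the Andromeda Galaxy are standard approximate figures that the text does not state.

```python
# Assumed approximate constants.
EARTH_MASS_KG = 6e24
ARCHIMEDES_MASS_KG = 100.0
LIGHT_YEAR_M = 9.4607e15          # metres in one light year
ANDROMEDA_DISTANCE_LY = 2.5e6     # rough distance to the Andromeda Galaxy


def lever_numbers(short_arm_m=1.0, handle_travel_m=1.0):
    """Long arm, Earth's displacement, and lever length in light years."""
    # Law of the lever: the loads balance when m1 * d1 = m2 * d2.
    long_arm_m = short_arm_m * EARTH_MASS_KG / ARCHIMEDES_MASS_KG
    # Similar triangles: the two ends move in proportion to their arms.
    earth_shift_m = handle_travel_m * short_arm_m / long_arm_m
    lever_length_ly = (short_arm_m + long_arm_m) / LIGHT_YEAR_M
    return long_arm_m, earth_shift_m, lever_length_ly
```

With the defaults this gives a long arm of $6 \times 10^{22}$ m, an Earth displacement of about $1.67 \times 10^{-23}$ m, and a lever roughly 6.3 million light years long, about 2.5 times `ANDROMEDA_DISTANCE_LY`.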

The Reals are Uncountable

I like the ‘diagonal’ proof, but it does need some intricate maneuvers with infinite decimals.

Suppose $\mathbb{R}$ is countable with $\mathbb{R} = \{a_j : j = 1, 2, \ldots\}$. Define two functions $\mathbb{R} \to \mathbb{R}$:

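$$f(x) = 1 \quad \text{for all } x \in \mathbb{R}$$

$$g(x) = \begin{cases} 1 & \text{if } |x - a_j| \le 2^{-j} \text{ for some } j \\ 0 & \text{otherwise} \end{cases}$$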

Figure 1. Archimedes’s lever.


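Then $f(x) = g(x)$ for all $x \in \mathbb{R}$. But

$$\int_{-\infty}^{\infty} f = \infty, \qquad \int_{-\infty}^{\infty} g \le \sum_{j=1}^{\infty} 2 \cdot 2^{-j} = 2$$

because $g$ vanishes outside the intervals $[a_j - 2^{-j}, a_j + 2^{-j}]$, whose lengths sum to 2.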

Of course we need the Lebesgue integral to make this work, and for that we need Lebesgue measure on $\mathbb{R}$. Now the cook gets cooked, because all we have to do is prove that a countable subset of $\mathbb{R}$ has measure zero.

So here’s a topological proof using less machinery. With the same assumptions, choose a sequence of non-empty closed intervals $A_j \subseteq [0, 1]$ such that $a_j \notin A_j$ and $A_{j+1} \subseteq A_j$. (This is easy.) Then $A_1 \cap A_2 \cap A_3 \cap \cdots$ is non-empty (by compactness of $[0, 1]$) but contains none of the $a_j$.
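The “(This is easy.)” step can be made explicit. Take each interval to be the left or right closed third of the previous one: these two thirds are disjoint, so at least one of them misses the next listed point. The sketch below (our function name) runs the construction on a finite prefix of a purported enumeration using exact fractions. For the full sequence the intersection is non-empty by compactness, as stated above; a finite run can only illustrate the nesting and the avoidance.

```python
from fractions import Fraction


def nested_intervals(points):
    """Closed intervals I_0, I_1, ... inside [0, 1], each inside the previous one,
    with points[j] not in I_j.

    Each step keeps the left or the right closed third of the previous interval;
    the two thirds are disjoint, so at least one of them misses the next point.
    """
    lo, hi = Fraction(0), Fraction(1)
    intervals = []
    for a in points:
        third = (hi - lo) / 3
        left = (lo, lo + third)
        right = (hi - third, hi)
        lo, hi = left if not (left[0] <= a <= left[1]) else right
        intervals.append((lo, hi))
    return intervals
```

Because the intervals are nested, the last one avoids every point listed so far.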

Squares and Rectangles

The next item for cooking is a problem discussed by Terence Tao in his book on problem-solving [7]. My treatment is not suitable for his intended audience. Still, here goes.

The problem is about four rectangles, of equal area, that fit together to form a big rectangle as in Figure 2, leaving a rectangular hole (shaded). The problem is to prove that if the outer rectangle is actually square, then so is the shaded hole. Tao assumes that the outer square has side 1, and homes in on a key fact that cracks the problem wide open: the sides $x, y$ of each smaller rectangle must sum to 1. But how to prove this? He studies how the equal-area condition propagates information from each rectangle to the next, points out that four such steps must lead back to the original rectangle, does some calculations, and out pops $x + y = 1$.

I’m a dynamical systems person, and yet again this looks suspiciously like a dynamical systems problem to me. The function that maps each rectangle to the next has a period-4 point, and we have to deduce that this is really a fixed-point. (All four rectangles have to be congruent if the construction is to work with a surrounding square; this is not immediately apparent from Tao’s treatment. The condition $x + y = 1$ turns out to be irrelevant, though true.) Anyway, when I pursued this line of attack, it led to a continued fraction and some simple one-dimensional dynamics, like so:

Suppose that the surrounding square has unit side, and let the sides of the first rectangle be $x_0, y_0$, which lie in $(0, 1)$ to avoid trivial cases. Let the common area of the four rectangles be $A$. Then $4A < 1$ if the picture is to be believed, otherwise the rectangles would overlap or the shaded region would have zero area. So $A < 1/4$.

Figure 2. Configuration of four rectangles $R_0, R_1, R_2, R_3$, with sides labelled $x$, $y$, $1 - y$.

Following Tao, we observe that

$$(x_1, y_1) = \left(1 - y_0,\ \frac{A}{1 - y_0}\right)$$

where $A = x_0 y_0$, and in general

$$(x_{i+1}, y_{i+1}) = \left(1 - y_i,\ \frac{A}{1 - y_i}\right)$$

for all $i$ because area is preserved. Already we see a discrete dynamical system on $\mathbb{R}^2$. Moreover, $(x_0, y_0)$ is a period-4 point, otherwise the rectangles would not fit together correctly.

Although this is a 2-dimensional system, it reduces to one dimension because $x_i y_i = A$. So $y_i = A/x_i$, and what matters is the function $\phi$ defined by

$$\phi(x) = 1 - \frac{A}{x}$$

which determines the trajectory of the $x$-coordinate.

The 4-periodicity tells us that $\phi^4(x_0) = x_0$. It is therefore natural to work out what $\phi^4(x)$ is. Being lazy, and prone to computational errors, I did this using Mathematica, and the result was what always happens to me when I use Mathematica:

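$$\begin{aligned}
\phi(x) &= 1 - \frac{A}{x} \\
\phi^2(x) &= 1 - \cfrac{A}{1 - \cfrac{A}{x}} \\
\phi^3(x) &= 1 - \cfrac{A}{1 - \cfrac{A}{1 - \cfrac{A}{x}}} \\
\phi^4(x) &= 1 - \cfrac{A}{1 - \cfrac{A}{1 - \cfrac{A}{1 - \cfrac{A}{x}}}}
\end{aligned}$$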

I always forget that you have to tell Mathematica to simplify and expand expressions the way a human mathematician would. Here it was just substituting them unchanged, and getting needlessly complicated. I wanted to set $x = \phi^4(x)$ and solve, but I also wanted to keep track of what was going on, so I tried to get the machine to simplify the expression, which never works the way I expect it to, somehow. The result was uninformative in any case.

I then realised that Mathematica was telling me something, which I wouldn’t have noticed if the software had helpfully simplified the formula as it went along. You don’t have to be a genius to see that we are developing a continued fraction, whose terms visibly have period one, not four. It is obvious what all iterates of $\phi$ look like. Moreover, if the limit of this continued fraction is $z$, then clearly $z = 1 - A/z$, so $z$ is actually a fixed point of $\phi$. In a little more detail:

LEMMA 0.8 If the sequence of all iterates of a periodic point converges, then it must be a fixed-point.

PROOF. Let $x_0$ be periodic with period $p$. We know that $\phi^n(x_0)$ converges, and (by continuity of $\phi$) its limit $\xi$ satisfies $\phi(\xi) = \xi$. But the subsequence $\phi^{np}(x_0)$ is constant, with every term equal to $x_0$. So $x_0 = \xi$ and is a fixed-point.

But now we are done aside from some routine checking. If $x$ is a period-4 point of $\phi$, then $z = x$, so $x$ is a fixed-point. Therefore all four rectangles are congruent, the entire figure has rotational symmetry through $\pi/2$, and the shaded part is square.

Rigour demands a little care with convergence. The continued fraction is not ‘regular’ (a regular one would have 1 on top in place of $A$), so maybe it doesn’t converge. Actually, it does. This follows easily from a few simple features of the dynamics of $\phi$, illustrated in Figure 3.

The fixed-points of $\phi$ are the solutions of the quadratic $x^2 - x + A = 0$, namely

$$\lambda = \frac{1 - \sqrt{1 - 4A}}{2}, \qquad \mu = \frac{1 + \sqrt{1 - 4A}}{2}$$

Bearing in mind that $A < 1/4$, these solutions are real. It is routine to establish the inequalities

$$0 < A < \lambda < \sqrt{A} < \mu < 1$$

The graph of $\phi$ on $[0, 1]$ looks like Figure 3.

There is a geometric symmetry in the problem: the two sides $x_0, y_0$ of $R_0$ are interchangeable. The symmetry is nonlinear, since it maps $x$ to $A/x$. We use it in a trivial way: without loss of generality we can choose $x_0 > \sqrt{A}$, that is, make it the longer side of $R_0$. The map $\phi$ preserves this property. Moreover, if $\sqrt{A} < x < \mu$, then $x < \phi(x) < \mu$, and if $\mu < x < 1$, then $\mu < \phi(x) < x$. (These facts are clear from the figure, and can be checked by routine calculations.) By monotonicity, the sequence $\phi^n(x)$ converges for all $x \in (\sqrt{A}, 1)$. Since the limit is a fixed-point of $\phi$, it must be $\mu$. Now we can appeal to Lemma 0.8 and we are done.

Note that $x + y = 1$ plays no role in this approach. We gain something by this method, too. Suppose we start with any initial rectangle, taking $x_0 > \sqrt{A}$, and go round and round the square forming new rectangles. Although in general this sequence does not have period 4, it converges toward a rectangle of period 4, indeed period 1.
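The convergence is easy to watch numerically. The sketch below (our names; the value $A = 0.2$ in the example is arbitrary apart from $A < 1/4$) iterates $\phi(x) = 1 - A/x$ four times per trip round the square and compares the result with the larger fixed point $\mu = (1 + \sqrt{1 - 4A})/2$.

```python
import math


def phi(x, A):
    """Next x-coordinate: the new rectangle has side 1 - y = 1 - A/x."""
    return 1 - A / x


def fixed_points(A):
    """The roots lambda < mu of x^2 - x + A = 0 (real because A < 1/4)."""
    root = math.sqrt(1 - 4 * A)
    return (1 - root) / 2, (1 + root) / 2


def go_round(x0, A, laps):
    """Apply phi four times per lap, i.e. once round the square per lap."""
    x = x0
    for _ in range(4 * laps):
        x = phi(x, A)
    return x
```

With $A = 0.2$, every starting value $x_0 \in (\sqrt{A}, 1)$ tried settles on $\mu \approx 0.7236$ within a few laps. This is consistent with $\phi'(\mu) = A/\mu^2 \approx 0.38 < 1$, so $\mu$ is attracting.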

Courant-Robbins Train

This is metacookery, perhaps meta-metacookery. The issues are complicated and what gets cooked is not the obvious statement. It can be uncooked, but this seems like luck more than judgement, and doesn’t happen in very similar questions.

I mention this one because it opens up an important general issue. Whenever I try to explain what’s wrong, someone always writes in and complains that with a bit of extra work the conclusion remains correct. That’s true (and this remark goes right back to Poston [4] when he pointed out the difficulty in 1976) but the whole point is that you need the extra work, contrary to what Courant and Robbins say. Indeed, if I wanted to be pedantic I could point out that they don’t state the precise hypotheses needed for their argument, so it’s not clear how generally they think it applies.

Courant and Robbins [1] state the problem like this:

Suppose a train travels from station A to station B along a straight section of track. The journey need not be of uniform speed or acceleration... But the exact motion of the train is supposed to be known in advance. On the floor... a rod is pivoted so that it may move without friction either forward or backward until it touches the floor. If it touches the floor, we assume that it remains on the floor henceforth; this will be the case if the rod does not bounce. Is it possible to place the rod in such a position that, if it is released at the instant when the train starts and allowed to move solely under the influence of gravity and the motion of the train, it will not fall to the floor during the entire journey from A to B?

The answer they give is ‘yes’, and the reasoning is continuity:

No detailed knowledge of the laws of dynamics is needed; only the following assumption of a physical nature need be granted: The motion of the rod depends continuously on its initial position.

Figure 4 is basically theirs, with added variables and wheels removed. To spell out the proof: starting at angle 0, the rod stays there; the same goes for angle π. So the map from initial angle to final angle sends [0, π] continuously to [0, π] and fixes both endpoints, and by the intermediate value theorem its image contains all possible angles between 0 and π. In particular some initial angle ends the journey strictly between 0 and π, and since a rod that touches the floor stays there, that rod never fell.
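The proof just spelled out can be made computational: bisection on the initial angle is the intermediate value theorem turned into an algorithm. The sketch below assumes the simple pendulum model used later in this section, θ̈ = F̈(t) sin θ − cos θ with the rod released at rest, integrated by fixed-step RK4 (my choice), and a hypothetical journey of length 4 whose acceleration is cos(πt/4), so the train starts and stops at rest. The names `rod_outcome`, `find_upright_start` and `journey` are illustrative only. With floors at exactly 0 and π the bisection should land on a rod that stays off the floor for the whole journey; the rest of this section explains why that cannot be taken for granted once the floors move.

```python
import math


def rod_outcome(theta0, accel, t_end, dt=2e-3):
    """Integrate theta'' = accel(t)*sin(theta) - cos(theta) with fixed-step RK4,
    releasing the rod at rest, with absorbing floors at 0 and pi.
    Returns -1 if the rod lands on the floor at 0, +1 if it lands at pi,
    and 0 if it is still off the floor at t_end."""
    if theta0 <= 0.0:
        return -1
    if theta0 >= math.pi:
        return 1

    def rhs(t, th, w):
        # Derivative of the state (theta, v), where v = theta'.
        return w, accel(t) * math.sin(th) - math.cos(th)

    theta, v, t = theta0, 0.0, 0.0
    while t < t_end - 1e-12:
        h = min(dt, t_end - t)
        k1 = rhs(t, theta, v)
        k2 = rhs(t + h / 2, theta + h / 2 * k1[0], v + h / 2 * k1[1])
        k3 = rhs(t + h / 2, theta + h / 2 * k2[0], v + h / 2 * k2[1])
        k4 = rhs(t + h, theta + h * k3[0], v + h * k3[1])
        theta += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
        # Absorbing boundary: once the rod touches a floor it stays there.
        if theta <= 0.0:
            return -1
        if theta >= math.pi:
            return 1
    return 0


def find_upright_start(accel, t_end, iterations=60):
    """Bisection on the initial angle. Returns an initial angle whose rod
    survives the journey, or None if none is hit within the given halvings."""
    lo, hi = 0.0, math.pi  # the rod from lo lands at 0, the rod from hi lands at pi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        outcome = rod_outcome(mid, accel, t_end)
        if outcome == 0:
            return mid
        if outcome < 0:
            lo = mid
        else:
            hi = mid
    return None


def journey(t):
    # Hypothetical train acceleration on 0 <= t <= 4: starts and ends at rest.
    return math.cos(math.pi * t / 4)


theta0 = find_upright_start(journey, 4.0)
print(theta0)
```

Bisection needs only the sign of the outcome, never the dynamics themselves, which is exactly the sense in which Courant and Robbins claim no detailed knowledge of dynamics is needed; its correctness, however, rests entirely on the continuity assumption.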

Actually, the important lesson in this example is that boundary conditions can destroy ‘intuitive’ continuity properties. As it happens, in the simplest model for this problem they don’t. But they do in slightly more complicated (and more realistic) models, and in very similar problems. The offhand reference to an ‘assumption of a physical nature’ could easily lead readers to think that the assumption is harmless, and would apply in all similar problems. That’s not so.

How justified, then, is the continuity assumption? What do they mean when they state that ‘the motion of the rod’ is a continuous function of the initial position? There are at least three meanings: (1) the entire trajectory (continuity in some function space); (2) its location after some fixed time before imposing the absorbing boundary conditions; (3) its location after some fixed time after imposing the absorbing boundary conditions. These could have different continuity properties.

Figure 3. Graph of φ showing fixed points.

If the rod is free to rotate to any angle in the circle, there’s no great difficulty: the property of continuity holds for any ODE with well-behaved solutions for all time. Singularities can make the question ill-posed, but they don’t occur here.

However, as Poston [4] pointed out, the ‘absorbing boundary conditions’ at angles 0 and π are more problematic. In fact, for a wide class of ODEs, the boundary conditions destroy continuity, as we’ll shortly see.
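A hypothetical toy example, not the rod, shows the mechanism in its barest form. Take the first-order equation θ′ = 2(t − 1), whose solutions θ(t) = θ₀ + (t − 1)² − 1 dip to θ₀ − 1 at t = 1, and add an absorbing floor at θ = 0. Without the floor the time-2 map is the identity θ₀ ↦ θ₀; with it, the trajectory from θ₀ = 1 touches the floor tangentially, and the map jumps there from 0 to about 1. The equation and the function names are my own choices, picked only because everything can be done in closed form.

```python
def toy_final_angle(theta0, floor=0.0, t_end=2.0):
    """Hypothetical toy model, not the rod: theta' = 2(t - 1), so
    theta(t) = theta0 + (t - 1)**2 - 1. Returns the angle at t_end when an
    absorbing floor sits at `floor`: a trajectory that reaches the floor,
    transversely or tangentially, stays there."""
    # Lowest point on [0, t_end]: at t = 1 if the run gets that far, else at t_end.
    t_low = min(t_end, 1.0)
    lowest = theta0 + (t_low - 1.0) ** 2 - 1.0
    if lowest <= floor:
        return floor
    return theta0 + (t_end - 1.0) ** 2 - 1.0


def toy_final_angle_no_floor(theta0, t_end=2.0):
    """The same flow with no floor: an affine, hence continuous, function of theta0."""
    return theta0 + (t_end - 1.0) ** 2 - 1.0


for start in (0.999, 1.0, 1.001):
    print(start, toy_final_angle_no_floor(start), toy_final_angle(start))
```

Starting values just below 1 cross the floor transversely, the value 1 grazes it, and values just above miss it altogether: the same three cases that the grazing discussion below turns on.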

So the train problem is ‘cooked’: not because Courant and Robbins gave the wrong answer; not even because they gave the wrong reason for it. It is cooked because they made no effort at all to justify the reason, citing physical intuition. Never forget that ODEs and PDEs have boundary conditions. And those may have a dramatic effect on continuity properties of solutions.

Suppose, for instance, that Courant and Robbins had modified the problem to allow the wind to blow, with a prespecified velocity, depending smoothly on time. Or (see below) allowed the train’s floor not to be flat; incidentally, they don’t specify that it is, although their picture points that way. Most readers would probably have accepted the same ‘physical assumption’, but this time it would be plain wrong. Poston remarks that the only way he can see to salvage the argument is to impose some stringent conditions: perfectly level track, no springs in the wheels of the train... Then you still have to explain why those conditions do the trick. This is nontrivial, and can’t just be dismissed as a simple property based on physical considerations.

The problem is worth analysing in detail.

Consider a smooth vector field on a circle, depending smoothly on time t ∈ ℝ. Then the associated flow ψ determines diffeomorphisms $\psi_t$ of the circle for all t. So the time-t flow, for any given t, is smooth. For any given position function F(t) for the train, the time taken to go from A to B is determined. So the map from initial state to final state, for the position θ of the rod, is a smooth diffeomorphism.

Suppose, however, that the flow looks like Figure 5, which is entirely reasonable for a general smooth ODE. If we now impose the absorbing boundary conditions, we find that all initial conditions lead, after finite time, to states θ = 0 or θ = π. That is, the rod hits the floor.

As I said, I’m less interested in developing conditions on the mechanics that ensure such things cannot happen than in observing that in general ODEs they do happen, robustly. That alone makes the continuity assumption far from obvious. Let’s see why, which will also explain the conditions under which Courant and Robbins are correct about continuity.

Figure 4 shows about the simplest model of the train that I can invent. The train itself is reduced to a point A that moves along the horizontal line; its location is x relative to the origin. The rod is inclined at angle θ, and we assume its mass m is concentrated at the end. To remove various constants, choose units to make the length of the rod equal to 1, the mass m = 1, and the acceleration due to gravity g = 1.

Assume that the position of the train at time t is a prespecified function F(t), which we take to be of class C² to avoid analytic issues. Take coordinates in a moving frame attached to point A. This introduces a ‘fictitious force’ $-\ddot F(t)$, and aside from this we now have a simple pendulum. The angular position θ satisfies the ODE

$$\ddot\theta = \ddot F(t)\sin\theta - \cos\theta$$

and we take initial conditions $\theta = \theta_0$, $\dot\theta = 0$ at time t = 0. (I use $\dot\theta = 0$ because Courant and Robbins say ‘released’. It’s not essential.) The absorbing boundary conditions can destroy continuity if there exists θ₀ for which the trajectory is tangent to the boundary θ = 0, while θ ≥ 0 locally along the trajectory, as in Figure 6(b). Then we would expect to be able to arrange F(t) to make nearby θ₀ hit the boundary transversely or miss it altogether, as in Figures 6(a, c). Similar problems arise at the other boundary θ = π if θ ≤ π locally.

The tangency condition on the boundary at 0 implies that θ = 0 and $\dot\theta = 0$. But when θ = 0, the equation of motion implies that $\ddot\theta = -1$, so locally θ ≤ 0. Similarly, at the π boundary, $\ddot\theta = 1$. So the problematic kind of ‘grazing’ trajectory cannot occur.
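Written out, the substitutions into the equation of motion are one line each. The third line is added only for comparison: it evaluates the same right-hand side for a rod at rest on a raised floor θ = α, and shows why such a floor behaves differently, in line with the later remark that a suitable acceleration could lift a rod resting slightly above the horizontal.

```latex
\begin{aligned}
\theta = 0,\ \dot\theta = 0 &\implies \ddot\theta = \ddot F(t)\sin 0 - \cos 0 = -1, \\
\theta = \pi,\ \dot\theta = 0 &\implies \ddot\theta = \ddot F(t)\sin \pi - \cos \pi = +1, \\
\theta = \alpha,\ \dot\theta = 0 &\implies \ddot\theta = \ddot F(t)\sin\alpha - \cos\alpha \;\ge 0
  \quad\text{whenever } \ddot F(t) \ge \cot\alpha \quad (0 < \alpha < \pi/2).
\end{aligned}
```

At 0 and π the sign of $\ddot\theta$ does not depend on $\ddot F$ at all, which is what rules out grazing there; at a raised floor the sign is up to the train.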

Even now, it is a nontrivial exercise to prove that the final position is a continuous function of the initial one when the absorbing boundary conditions hold. There might be other sources of discontinuity, for all we know. So technically Courant and Robbins are absolutely right, because they make continuity an explicit assumption. But continuity fails if we make apparently harmless modifications to the question.

One of the simplest such modifications is to place the boundaries at π/4 and 3π/4. Most readers would still be happy to accept the continuity assumption. However, if we take $F(t) = t^4/12$, so that $\ddot F(t) = t^2$, which is not exactly rocket science, then numerical experiments find a grazing trajectory with initial conditions close to θ₀ = 1.0664. So the continuity argument might break down. Further experiments indicate that it does, and there is no intermediate position that stays upright (Figure 6).

Figure 5. Why absorbing boundary conditions can destroy continuity. [Plot of θ against t, with the levels 0, α and π marked.]

Figure 4. The Courant-Robbins train. [Pivot A at horizontal position x; rod at angle θ with its end at (X, Y); weight mg.]

For the next experiment, put the boundaries at π/50 and 49π/50. Again we find numerical evidence for a grazing trajectory, and there is no intermediate position that stays upright beyond about 6 seconds (Figure 7).
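The experiment with boundaries at π/4 and 3π/4 can be repeated in a few lines. This is a sketch under my own assumptions (fixed-step RK4 with step 0.002, a time limit of 10, and the hypothetical helpers `hit_floor` and `grazing_start`), not the author’s numerical method. It checks two things: every rod released from a grid of starting angles reaches one of the floors, and bisection between a start that lands on the lower floor (0.9) and one that lands on the upper floor (1.3) closes in on the grazing start reported near θ₀ = 1.0664.

```python
import math


def hit_floor(theta0, alpha, t_max=10.0, dt=2e-3):
    """Integrate theta'' = t**2 * sin(theta) - cos(theta), i.e. F(t) = t**4 / 12,
    from rest with fixed-step RK4 and absorbing floors at alpha and pi - alpha.
    Returns (side, time) where side is 'low' or 'high' for the floor reached,
    or (None, t_max) if the rod is still off the floor at t_max."""
    low, high = alpha, math.pi - alpha

    def rhs(t, th, w):
        return w, t * t * math.sin(th) - math.cos(th)

    theta, v, t = theta0, 0.0, 0.0
    while t < t_max:
        k1 = rhs(t, theta, v)
        k2 = rhs(t + dt / 2, theta + dt / 2 * k1[0], v + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, theta + dt / 2 * k2[0], v + dt / 2 * k2[1])
        k4 = rhs(t + dt, theta + dt * k3[0], v + dt * k3[1])
        theta += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        if theta <= low:
            return "low", t
        if theta >= high:
            return "high", t
    return None, t_max


def grazing_start(alpha, a, b, iterations=25):
    """Bisect for the start angle separating rods that land on the low floor
    from rods that do not. Assumes the rod from a lands low and the rod
    from b does not."""
    for _ in range(iterations):
        mid = (a + b) / 2
        if hit_floor(mid, alpha)[0] == "low":
            a = mid
        else:
            b = mid
    return (a + b) / 2


alpha = math.pi / 4
width = math.pi - 2 * alpha - 0.02
starts = [alpha + 0.01 + k * width / 24 for k in range(25)]
outcomes = [hit_floor(s, alpha) for s in starts]
print("every rod falls:", all(side is not None for side, _ in outcomes))
print("grazing start near:", grazing_start(alpha, 0.9, 1.3))
```

Unlike the bisection with floors at 0 and π, this one converges to a start that grazes the lower floor and then lands on the upper one, so the switch between the two floors is a jump rather than a passage through an upright survivor.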

These results accord with physical intuition, of course. With a perfectly horizontal track, no applied acceleration can lift the rod off the floor if it is instantaneously at rest at θ = 0, and the same goes for θ = π. But if the rod is slightly above the horizontal position, a suitable acceleration could lift it. This is why pretty much the only boundaries that produce continuity are the ones employed by Courant and Robbins. Simple estimates prove:

THEOREM 0.9 Suppose 0 < α < π/2 and the boundaries are at α and π − α. Suppose $\ddot F(t) = t^2$. Then the rod hits the floor after a finite time T(α).

PROOF. Let $v(t) = \dot\theta(t)$ be the angular velocity. Then the ODE reduces to the system

$$\dot\theta(t) = v(t), \qquad \dot v(t) = t^2\sin\theta(t) - \cos\theta(t)$$

Figure 6. Four trajectories. Initial conditions respectively are θ(0) = 0.2, 1.0664, 1.1, 1.3.

Figure 7. Four trajectories. Initial conditions respectively are θ(0) = 0.9, 0.943974, 1, 1.1.


If t > 2 then

$$\dot v(t) \ge (2^2 - 1)\sin\alpha = 3\sin\alpha$$

Therefore

$$v(2 + t) \ge 1$$

provided

$$V + 3t\sin\alpha \ge 1$$

where

$$V = \min\{\, v(2) : \theta(0) \in [\alpha, \pi - \alpha],\ v(0) = 0 \,\}$$

Therefore we must take

$$t \ge \frac{1 - V}{3\sin\alpha}$$

Now, if $\dot\theta(t) \ge 1$ then $\theta(t + \pi - 2\alpha) \ge \pi - \alpha$, so the rod has hit the boundary. Thus the rod hits the boundary after at most time

$$T(\alpha) = \pi - 2\alpha + 2 + \frac{1 - V}{3\sin\alpha}$$

It remains to prove that V is finite. (It might be −∞.) But

$$\dot v(t) \ge -\cos\alpha$$

so

$$v(t) \ge -t\cos\alpha$$

and

$$v(2) \ge -2\cos\alpha$$

Therefore $V \ge -2\cos\alpha$. Thus we may take

$$T(\alpha) = \pi - 2\alpha + 2 + \frac{1 + 2\cos\alpha}{3\sin\alpha}$$

Since we are taking $F(t) = t^4/12$, the stations can be placed at A = 0, $B = T(\alpha)^4/12$. Note that $T(\alpha) \to \infty$ as $\alpha \to 0$.
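The bound of Theorem 0.9 and the corresponding position of station B are easy to tabulate. This sketch only evaluates the closed-form expressions above; T(α) is an upper bound on when the rod hits the floor, not the actual hitting time, and the values of α in the loop are arbitrary illustrative choices. The function names are hypothetical.

```python
import math


def hitting_time_bound(alpha):
    """T(alpha) = pi - 2*alpha + 2 + (1 + 2 cos alpha) / (3 sin alpha),
    the bound from Theorem 0.9, for floors at alpha and pi - alpha."""
    return math.pi - 2 * alpha + 2 + (1 + 2 * math.cos(alpha)) / (3 * math.sin(alpha))


def station_b(alpha):
    """Position of station B when F(t) = t**4 / 12 and the journey lasts T(alpha)."""
    return hitting_time_bound(alpha) ** 4 / 12


for alpha in (math.pi / 4, math.pi / 50, 0.01):
    print(f"alpha = {alpha:.4f}: T = {hitting_time_bound(alpha):.3f}, B = {station_b(alpha):.4g}")
```

At α = π/4 the bound is about 4.71 and B is about 41; as α shrinks, the (1 + 2 cos α)/(3 sin α) term dominates and both grow without limit, as the theorem notes.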

Poston [4] says: ‘Given the usual laws, the only physical assumptions I can find which guarantee a nonfalling history are that the pivot is perfect with the movement of the train perfectly, totally horizontal. (Not just level track: the train must have no springs.)’ And he adds: ‘Courant and Robbins did not make a silly mistake, but Dynamical Systems has progressed... it would be silly now.’

Pouring from a Toroidal Bottle

I wonder if something similar is going on in a provocative article by Sarkaria [5] published in the Intelligencer in 2001. This discusses the hypothesis of continuity in the motion of matter, tracing it back to Anaxagoras, and demonstrates that ‘matter and motion cannot both be assumed continuous’. The example cited is the impossibility of emptying ‘a tyre-tube filled with water into a bucket in any finite length of time’. The proof is that a homotopically nontrivial loop in the tyre would flow to define a homotopically trivial loop in the bucket. That is, the fundamental group π₁ is an obstacle to emptying the bottle. Sarkaria observes that a similar problem still arises if distinct fluid particles are allowed to occupy the same location. The usual assumption of fluid dynamics, that particles don’t do that, implies that the flow of a body of fluid preserves its topological type.

We can cook this elegant but slightly artificial example by using the invariant π₀ instead: the set of connected components. Now there is a more commonplace, but equally dramatic, scenario: the same appeal to continuity implies that you can’t empty a normal-shaped wine bottle into two or more glasses. Initially the wine forms one connected component, and a continuous map can’t increase the number of components.

Once again, though, we must ask: is continuity of the flow justified? It surely depends on the boundary conditions, not just the PDE. Tyres and bottles have boundaries, and physically sensible boundary conditions could (perhaps should) allow the fluid to ‘break’ when its surface meets the boundary in the right way, perhaps tangentially. Without this boundary condition, the flow would be continuous, but with it, the fluid can split discontinuously.

I don’t know enough about the Navier-Stokes equation to analyse this possibility, which is a question for experts; I imagine that the answer depends on the precise physical assumptions encoded in the PDE and its boundary conditions.

Afterword

As I said at the start, this is a personal compilation and I’m not making any claims of originality or superiority. Variations on proofs tend to be rediscovered over and over again, and I’m sure that historians will shortly be telling me that the approach to cubics was known to some Portuguese algebraist in 1573, whereas experts in number theory will quietly draw me aside to explain that Serre knew everything in this article in the 1950s. Still, I think it’s a useful exercise to find alternatives to the classic proofs, some of which have become cliches and some of which do make a bit of a mouthful of things that can be done more clearly and more simply. I’d be interested to see other examples where the classics can be cooked. Maybe it would be worth setting up a website on mathematical cookery.

REFERENCES

[1] R. Courant and H. Robbins. What is Mathematics?, Oxford: Oxford University Press, 1941.

[2] D.H. Fowler. The Mathematics of Plato’s Academy, Oxford: Oxford University Press, 1987.

[3] M. Gardner. Mathematical Carnival, New York: Knopf, 1975.

[4] T. Poston. Au Courant with differential equations, Manifold 18 (1976) 6–9.

[5] K.S. Sarkaria. A topological paradox of motion, Mathematical Intelligencer 23, no. 4 (2001) 66–68.

[6] I. Stewart. Galois Theory, Boca Raton: Chapman and Hall/CRC, 2004.

[7] T. Tao. Solving Mathematical Problems: a Personal Perspective, Oxford: Oxford University Press, 2006.

