
Dirichlet’s Theorem

on Arithmetic Progressions

1 The infinitude of primes

Prime numbers are the building blocks of all counting numbers: the fundamental theorem of arithmetic states that every positive integer can be written as a product of prime numbers in a way that is unique up to the order of the prime factors. Thus prime numbers are of great importance, especially in number theory, and for century after century, mathematicians have worked to understand prime numbers better.

One of the questions that occurred to early mathematicians was: how many primes are there? There are infinitely many of them, as first shown by Euclid in his Elements. This proof is usually found in books in a form restated in the modern language of mathematics; I assume that the reader has seen the proof in that form. In Euclid’s time, numbers were thought of as lengths of line segments, so I shall state the proof in language closer to Euclid’s original.

Theorem 1.1. The number of prime numbers is more than any assigned multitude of prime numbers.

[Figure: the line segments A, B and C, and the segment EF with the point D marked on it.]

Euclid’s argument. Say a number (i.e. a line segment of that length) X is measured by another number Y if X is a whole multiple of Y. Let A, B, and C be the assigned prime numbers. Take the least number DE measured by A, B, and C. Add the unit length segment DF to DE. Then EF is either prime or not prime.

If it is prime, then there are now the prime numbers A, B, C and EF, more in number than the assigned number three, that is, A, B and C.

If it is not prime, then it is measured by some prime number. Let it be measured by the prime number G. Suppose G were one of A, B or C. Since A, B and C measure DE, G would also measure DE. But G also measures EF, so G would measure the remaining segment, the unit-length DF, which is absurd. So G is not one of the numbers A, B or C. Since G is prime, we now have the prime numbers A, B, C and G, again more in number than the assigned number three.

We can apply this to any number, not just three, so there are more prime numbers than any assigned multitude of prime numbers.

When mathematicians became more familiar with the notion of infinity, they began to ask: how infinite is this infinite number of primes? For example, they would consider an infinite sequence of positive integers $X = (x_0, x_1, x_2, \ldots)$ as, in some sense, “bigger” (more dense in distribution, for example) than another infinite sequence of positive integers $Y = (y_0, y_1, y_2, \ldots)$ if $\sum_i x_i^{-1}$ diverges but $\sum_i y_i^{-1}$ converges. When Euler gave another proof that there are infinitely many primes, he also showed us that there are indeed quite a lot of primes:

Theorem 1.2. The sum of the reciprocals of the primes diverges, i.e. $\sum_p \frac{1}{p}$ diverges.

We shall defer the proof until later.

After establishing that there is an infinite number of primes, people began to ask about the infinitude of primes in another way: if we are given any arithmetic sequence, do we also have an infinite number of primes in the sequence?

Euler first stated that if the first term of an arithmetic sequence is 1, then the sequence contains an infinite number of primes. Now consider a general arithmetic sequence. If the first term and the common difference have a common factor k that is not equal to 1, then clearly every number in the sequence would be divisible by k, so this must be excluded for the question to have a chance of an answer in the affirmative. With this case excluded, the statement in the question was first conjectured to be true by Gauss. It turns out that if this case is excluded, then indeed there is an infinite number of primes in the sequence. This was first proved by Dirichlet in 1837, and is now known as Dirichlet’s theorem on arithmetic progressions:

Theorem 1.3. If a is coprime to d, then the arithmetic progression a, a + d, a + 2d, . . . contains an infinite number of primes.

Put another way, we may say that if two numbers a and b are coprime, then there are an infinite number of primes of the form an + b, where n > 0.

There is no simple elementary argument known that proves this theorem in full generality. However, for a few simple cases, we may mimic Euclid’s proof for the infinitude of primes in order to write out a proof for these cases.

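Theorem 1.4. There exist an infinite number of primes of the form $6n + 5$.

Proof. Suppose that there are finitely many primes of the form $6n + 5$. Call them $p_1, p_2, \ldots, p_k$. Let
\[ N := 6(p_1 p_2 \cdots p_k) - 1. \]
Then $N \equiv 5 \pmod 6$, so $N$ is divisible by neither 2 nor 3, and every prime factor of $N$ is congruent to 1 or 5 modulo 6. Since for any $m$ and $n$ we have
\[ (6n+1)(6m+1) \equiv 1 \pmod 6, \quad (6n+1)(6m+3) \equiv 3 \pmod 6, \quad (6n+3)(6m+3) \equiv 3 \pmod 6, \]
a product of numbers that are each congruent to 1 or 3 modulo 6 is never congruent to 5 modulo 6. Hence $N$ must have at least one prime factor $q$ such that $q \equiv 5 \pmod 6$. But $q$ cannot be one of $p_1, p_2, \ldots, p_k$, as otherwise
\[ q \mid N \quad \text{and} \quad q \mid 6 p_1 p_2 \cdots p_k, \]
which implies that $q \mid (6 p_1 p_2 \cdots p_k - N)$, i.e. $q \mid 1$, which is impossible. So we have found another prime of the form $6n + 5$, a contradiction. Hence there are infinitely many primes of the form $6n + 5$.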
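The construction in the proof of Theorem 1.4 can be tried out numerically. The following is a minimal sketch, not part of the original argument: starting from a list of known primes of the form 6n + 5, it forms N = 6(p_1 ... p_k) - 1 and extracts a prime factor congruent to 5 modulo 6 by trial division. The function names `smallest_prime_factor_5_mod_6` and `new_prime` are hypothetical, chosen only for this illustration.

```python
from math import prod

def smallest_prime_factor_5_mod_6(n):
    """Return the smallest prime factor q of n with q % 6 == 5, or None."""
    d = 2
    while d * d <= n:
        while n % d == 0:          # d is prime here: smaller factors are already removed
            if d % 6 == 5:
                return d
            n //= d
        d += 1
    # Whatever remains (if > 1) is the largest prime factor of the original n.
    return n if n > 1 and n % 6 == 5 else None

def new_prime(known):
    """Given primes of the form 6n+5, return one that is not in the list."""
    N = 6 * prod(known) - 1        # N = 6 * p_1 * ... * p_k - 1, as in the proof
    return smallest_prime_factor_5_mod_6(N)

primes = [5]
for _ in range(3):
    primes.append(new_prime(primes))
print(primes)                      # → [5, 29, 11, 1367]
```

Each new entry is congruent to 5 modulo 6 and differs from all earlier ones, exactly as the proof predicts. The loop is kept short because N grows quickly and trial division becomes slow.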

Utilising the Legendre symbol and/or quadratic reciprocity, we can also prove some of the other cases of the theorem, still mimicking Euclid’s idea. We shall demonstrate with one case using the properties of the Legendre symbol alone, and another that also uses quadratic reciprocity.

Theorem 1.5. There exist an infinite number of primes of the form $8n - 1$.

Proof. Suppose that there are finitely many primes of the form $8n - 1$. Call them $p_1, p_2, \ldots, p_k$. Let
\[ N := (4 p_1 p_2 \cdots p_k)^2 - 2. \]
Since $N/2 = 8(p_1 p_2 \cdots p_k)^2 - 1$ is odd, $N$ must be divisible by some odd prime $q$, and for every such $q$,
\[ 2 \equiv (4 p_1 p_2 \cdots p_k)^2 \pmod q. \]
Hence we have $\left(\frac{2}{q}\right) = 1$. Since for an odd prime $p$ we have $\left(\frac{2}{p}\right) = (-1)^{(p^2-1)/8}$, we can conclude that $q \equiv 1 \pmod 8$ or $q \equiv 7 \pmod 8$. Since $N/2$ is odd, $N/2$ is the product of all the odd prime factors of $N$, counted with multiplicity. But $N/2 \equiv -1 \pmod 8$, so there must be at least one such factor of the form $8n - 1$, as otherwise $N/2 \equiv 1 \pmod 8$. Let this factor be $r$. But $r$ cannot be one of $p_1, p_2, \ldots, p_k$, as otherwise
\[ r \mid N \quad \text{and} \quad r \mid (4 p_1 p_2 \cdots p_k)^2, \]
which implies that $r \mid 2$, which is impossible as $r$ is odd. So we have found another prime of the form $8n - 1$, a contradiction. Hence there are infinitely many primes of the form $8n - 1$.

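Theorem 1.6. There exist an infinite number of primes of the form $6n + 1$.

Proof. Suppose that there are finitely many primes of the form $6n + 1$. Call them $p_1, p_2, \ldots, p_k$. Let
\[ N := 12(p_1 p_2 \cdots p_k)^2 + 1. \]
Then $N \equiv 1 \pmod 6$. Suppose $q$ is a prime factor of $N$. Then $q$ is odd and $q \neq 3$, and since $N = 3(2 p_1 p_2 \cdots p_k)^2 + 1$, we have a solution to the congruence
\[ 3x^2 + 1 \equiv 0 \pmod q, \]
i.e. $(3x)^2 \equiv -3 \pmod q$, giving $\left(\frac{-3}{q}\right) = 1$. Using $\left(\frac{-1}{q}\right) = (-1)^{(q-1)/2}$ and quadratic reciprocity, we find that
\[ \left(\frac{-3}{q}\right) = \left(\frac{-1}{q}\right)\left(\frac{3}{q}\right) = (-1)^{(q-1)/2} (-1)^{\frac{q-1}{2} \cdot \frac{3-1}{2}} \left(\frac{q}{3}\right) = \left(\frac{q}{3}\right), \]
so in fact $\left(\frac{q}{3}\right) = 1$. This means that $q \equiv 1 \pmod 3$. But $q$ is odd, so $q \equiv 1 \pmod 6$. As before, $q$ cannot be one of $p_1, p_2, \ldots, p_k$, since otherwise $q$ would divide $N - 12(p_1 p_2 \cdots p_k)^2 = 1$, so we have our contradiction.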

However, there is no known elementary proof similar to the above ones for the general case. To prove the general theorem, instead of following Euclid’s idea, Dirichlet followed (and we shall follow) Euler’s idea, that is, by proving that the sum of the reciprocals of the primes in such sequences diverges.

For the remainder of this essay, the main focus will be to prove Dirichlet’s theorem, and we shall develop ideas to the extent that they are needed in achieving our goal.

To prepare ourselves for investigation into the qualitative properties of the infinitude of primes, we first consider arithmetic functions.

2 Arithmetic functions

Definition 2.1. An arithmetic function or a number-theoretic function is a function $f(n)$ from the positive integers to the complex numbers. That is, it is a sequence of complex numbers.

Definition 2.2. An arithmetic function is called multiplicative if $f(mn) = f(m)f(n)$ for all pairs of coprime $m$, $n$. A function is called completely multiplicative if $f(mn) = f(m)f(n)$ for all $m$, $n$.

We first consider two important arithmetic functions: the Möbius function and the Euler totient function. Recall that a square-free integer is one not divisible by any perfect square other than 1.

Definition 2.3. The Möbius function is the function $\mu : \mathbb{Z}^+ \to \{-1, 0, 1\}$ defined by
\[ \mu(n) := \begin{cases} 1 & \text{if } n = 1 \text{ or if } n \text{ is square-free with an even number of distinct prime factors;} \\ -1 & \text{if } n \text{ is square-free with an odd number of distinct prime factors;} \\ 0 & \text{if } n \text{ is not square-free.} \end{cases} \]

Notice that the Möbius function is multiplicative.

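Theorem 2.4. Suppose $n \geq 1$. Then
\[ \sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1 \\ 0 & \text{if } n > 1. \end{cases} \]
That is,
\[ \sum_{d \mid n} \mu(d) = \left\lfloor \frac{1}{n} \right\rfloor. \]

Proof. This is clearly true for $n = 1$. Now assume that $n > 1$. Write $n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$. Since $\mu(d)$ is zero if $d$ has a square factor greater than 1, the sum consists of the terms where $d = 1$ and where $d$ is a divisor of $n$ that is a product of distinct primes. So we have
\begin{align*}
\sum_{d \mid n} \mu(d) &= \mu(1) + \mu(p_1) + \cdots + \mu(p_k) + \mu(p_1 p_2) + \cdots + \mu(p_{k-1} p_k) + \cdots + \mu(p_1 \cdots p_k) \\
&= 1 + \binom{k}{1}(-1) + \binom{k}{2}(-1)^2 + \cdots + \binom{k}{k}(-1)^k \\
&= (1 - 1)^k \\
&= 0.
\end{align*}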
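Theorem 2.4 is easy to check by brute force for small n. The sketch below is only an illustration; the helper names `mobius` and `divisors` are introduced here and do not come from the text. It computes the Möbius function by trial division and sums it over the divisors of n.

```python
def mobius(n):
    """Möbius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:         # p^2 divided the original n: not square-free
                return 0
            result = -result       # one more distinct prime factor
        p += 1
    if n > 1:                      # a single prime factor is left over
        result = -result
    return result

def divisors(n):
    """All positive divisors of n (naive, fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]

for n in (1, 2, 12, 30, 97):
    print(n, sum(mobius(d) for d in divisors(n)))   # 1 for n = 1, otherwise 0
```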


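Definition 2.5. The Euler totient function is the function $\varphi : \mathbb{Z}^+ \to \mathbb{Z}^+$ where $\varphi(n)$ is defined as the number of positive integers not exceeding $n$ that are coprime to $n$.

Theorem 2.6. Let $n \geq 1$. Then
\[ \sum_{d \mid n} \varphi(d) = n. \]

Proof. Define $S := \{1, 2, \ldots, n\}$ and $A(d) := \{k : \gcd(k, n) = d,\ 0 < k \leq n\}$. Then $S$ is the disjoint union of the sets $A(d)$ over the divisors $d$ of $n$, so
\[ \sum_{d \mid n} |A(d)| = n. \]
Now $\gcd(k, n) = d \iff \gcd(k/d, n/d) = 1$, and $0 < k \leq n \iff 0 < k/d \leq n/d$. So if we let $q = k/d$, then $q$ satisfies $\gcd(q, n/d) = 1$, $0 < q \leq n/d$. There are $\varphi(n/d)$ numbers $q$ satisfying this, and since the two conditions correspond exactly to those defining the set $A(d)$, we have $|A(d)| = \varphi(n/d)$ and hence
\[ \sum_{d \mid n} \varphi\!\left(\frac{n}{d}\right) = n. \]
Since $d \mapsto n/d$ permutes the divisors of $n$, we have $\sum_{d \mid n} \varphi\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \varphi(d)$, and the result follows.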

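We now establish a link between the Möbius function and the Euler totient function:

Theorem 2.7. Let $n \geq 1$. Then
\[ \varphi(n) = \sum_{d \mid n} \mu(d) \frac{n}{d}. \]

Proof.
\begin{align*}
\varphi(n) &= \sum_{k=1}^{n} \left\lfloor \frac{1}{\gcd(k, n)} \right\rfloor \\
&= \sum_{k=1}^{n} \sum_{d \mid \gcd(k, n)} \mu(d) && \text{by (2.4)} \\
&= \sum_{k=1}^{n} \sum_{\substack{d \mid k \\ d \mid n}} \mu(d) \\
&= \sum_{d \mid n} \sum_{q=1}^{n/d} \mu(d) && \text{where again } q = k/d \\
&= \sum_{d \mid n} \mu(d) \frac{n}{d}.
\end{align*}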

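Theorem 2.8. With the empty product equal to 1, we have
\[ \varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right). \]

Proof. Write $n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$. Applying the previous theorem,
\begin{align*}
\varphi(n) &= n \sum_{d \mid n} \frac{\mu(d)}{d} \\
&= n \sum_{d \mid p_1^{\alpha_1} \cdots p_k^{\alpha_k}} \frac{\mu(d)}{d} \\
&= n \sum_{d \mid p_1 \cdots p_k} \frac{\mu(d)}{d} && \text{since } \mu(d) = 0 \text{ if } d \text{ is not square-free} \\
&= n \left( 1 + \sum_{i} \frac{\mu(p_i)}{p_i} + \sum_{i < j} \frac{\mu(p_i p_j)}{p_i p_j} + \cdots + \frac{\mu(p_1 \cdots p_k)}{p_1 \cdots p_k} \right) \\
&= n \left( 1 - \sum_{i} \frac{1}{p_i} + \sum_{i < j} \frac{1}{p_i p_j} - \cdots + \frac{(-1)^k}{p_1 \cdots p_k} \right) \\
&= n \prod_{p \mid n} \left(1 - \frac{1}{p}\right).
\end{align*}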
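As a sanity check on Theorems 2.6 and 2.8, the following sketch compares the totient computed straight from Definition 2.5 with the product formula, and verifies the divisor sum. The names `phi_count`, `prime_divisors` and `phi_product` are hypothetical helpers for this illustration; exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction
from math import gcd

def phi_count(n):
    """Totient by definition: count 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def prime_divisors(n):
    """Distinct prime divisors of n, by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps

def phi_product(n):
    """Totient via the product formula n * prod(1 - 1/p) over primes p | n."""
    value = Fraction(n)
    for p in prime_divisors(n):
        value *= 1 - Fraction(1, p)
    return int(value)

n = 360
print(phi_count(n), phi_product(n))                        # both give 96
print(sum(phi_count(d) for d in range(1, n + 1) if n % d == 0))  # gives 360
```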

3 Euler’s argument for the infinitude of primes

In this section, we shall first work towards proving Theorem 1.2. Recall that Theorem 1.2 states that the sum of the reciprocals of all the primes diverges.

Definition 3.1. For $s > 1$, we define the Riemann zeta function to be
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]

The variable is conventionally denoted $s$ in deference to Riemann’s 1859 paper that founded the study of this function as a complex function, now famous for the Riemann Hypothesis stated in it. The integral test quickly establishes the convergence of the series for $s > 1$ and divergence otherwise.

Long before Riemann’s time, however, this function was studied by Euler as a real function. Euler was the first to compute the exact value of $\zeta(s)$ for $s = 2$, as well as for numerous other even values of $s$. He also deduced a product formula for $\zeta(s)$, the generalised version of which now bears his name.

Theorem 3.2 (Euler Product). If $f(n)$ is multiplicative and $\sum_{n=1}^{\infty} f(n)$ converges absolutely, then
\[ \sum_{n=1}^{\infty} f(n) = \prod_{p \text{ prime}} \left(1 + f(p) + f(p^2) + \cdots\right). \]

Proof. Expand partial products on the right hand side, and apply the fundamental theorem of arithmetic:
\begin{align*}
\prod_{p < y} \left(1 + f(p) + f(p^2) + \cdots\right) &= \sum_{k_1} f(p_1^{k_1}) \sum_{k_2} f(p_2^{k_2}) \cdots \sum_{k_t} f(p_t^{k_t}) \\
&= \sum_{k_1, k_2, \ldots, k_t} f(p_1^{k_1}) f(p_2^{k_2}) \cdots f(p_t^{k_t}) \\
&= \sum_{k_1, k_2, \ldots, k_t} f(p_1^{k_1} p_2^{k_2} \cdots p_t^{k_t}) \\
&= \sum_{n \,:\, P(n) < y} f(n)
\end{align*}
where $p_1, p_2, \ldots, p_t$ are the primes less than $y$ and $P(n)$ is the largest prime factor of $n$. Every positive integer less than $y$ has no prime factor as large as $y$, so
\[ \left| \sum_{n=1}^{\infty} f(n) - \sum_{n \,:\, P(n) < y} f(n) \right| \leq \sum_{n \geq y} |f(n)|. \]
The right hand side tends to zero as $y \to \infty$, so the result follows.

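Corollary 3.3. If $f(n)$ is completely multiplicative and $\sum_{n=1}^{\infty} f(n)$ converges absolutely, then
\[ \sum_{n=1}^{\infty} f(n) = \prod_{p \text{ prime}} \frac{1}{1 - f(p)}. \]

Proof. Since $f(n)$ is completely multiplicative, $f(p^k) = f(p)^k$. Hence by the previous theorem,
\begin{align*}
\sum_{n=1}^{\infty} f(n) &= \prod_{p \text{ prime}} \left(1 + f(p) + f(p^2) + \cdots\right) \\
&= \prod_{p \text{ prime}} \left(1 + f(p) + f(p)^2 + \cdots\right) \\
&= \prod_{p \text{ prime}} \frac{1}{1 - f(p)}.
\end{align*}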

Corollary 3.4. For $s > 1$,
\[ \zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}. \]

Proof. Apply the above corollary to the function $f(n) := n^{-s}$, which is obviously completely multiplicative.
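Corollary 3.4 can be observed numerically at s = 2, where Euler's value ζ(2) = π²/6 is available for comparison. The sketch below truncates the product at a chosen prime bound; the helper names `primes_up_to` and `euler_product` and the bound 10 000 are assumptions made for this illustration only.

```python
from math import pi

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def euler_product(s, bound):
    """Partial Euler product of 1 / (1 - p^(-s)) over primes p <= bound."""
    product = 1.0
    for p in primes_up_to(bound):
        product *= 1.0 / (1.0 - p ** (-s))
    return product

print(euler_product(2, 10_000))   # partial product, slightly below zeta(2)
print(pi ** 2 / 6)                # zeta(2)
```

Every omitted factor exceeds 1, so the truncated product approaches ζ(2) from below as the bound grows.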

We can now prove Theorem 1.2, which we restate here:

Theorem 3.5. The sum of the reciprocals of the primes diverges, i.e. $\sum_p \frac{1}{p}$ diverges.

Euler himself wrote this argument:

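Euler’s argument. Euler manipulated the harmonic series formally. Using the Euler product formula,
\begin{align*}
\log\left( \sum_{n=1}^{\infty} \frac{1}{n} \right) &= \log\left( \prod_p \frac{1}{1 - p^{-1}} \right) \\
&= \sum_p -\log(1 - p^{-1}) \\
&= \sum_p \left( \frac{1}{p} + \frac{1}{2p^2} + \frac{1}{3p^3} + \cdots \right) \\
&= \left( \sum_p \frac{1}{p} \right) + \sum_p \frac{1}{p^2} \left( \frac{1}{2} + \frac{1}{3p} + \frac{1}{4p^2} + \cdots \right) \\
&< \left( \sum_p \frac{1}{p} \right) + \sum_p \frac{1}{p^2} \left( 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots \right) \\
&= \left( \sum_p \frac{1}{p} \right) + \left( \sum_p \frac{1}{p(p-1)} \right) \\
&< \left( \sum_p \frac{1}{p} \right) + C
\end{align*}
for some constant $C$, finite because $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is finite. Since $\sum_{k=1}^{n} \frac{1}{k}$ is asymptotic to $\log n$, Euler made the conclusion that
\[ \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \cdots = \log\log(+\infty). \]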

This is not rigorous by modern standards, but can be modified to be so. One obvious way to interpret the last “equation” is to treat it as meaning that the partial sum of the reciprocals of primes is asymptotic to log log n. This is indeed the case. We build on Euler’s idea to construct a proof that also gives an estimate for the sum:

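Proof of 3.5. Let $n \geq 2$. Firstly, every positive integer $i$ can be written as a product of a squarefree integer and a square. Suppose
\[ i = a_i b_i^2, \]
where $a_i \leq i$ is squarefree, and $b_i \leq i$. Then
\[ \sum_{i=1}^{n} \frac{1}{i} = \sum_{i=1}^{n} \frac{1}{a_i} \frac{1}{b_i^2} \leq \sum_{\substack{a \leq n \\ a \text{ squarefree}}} \frac{1}{a} \sum_{k=1}^{n} \frac{1}{k^2} \leq \prod_{p \leq n} \left(1 + \frac{1}{p}\right) \sum_{k=1}^{n} \frac{1}{k^2}, \]
where the last step holds because each squarefree $a \leq n$ is a product of distinct primes not exceeding $n$, and so appears when the product is expanded.

Secondly, by the integral definition of the natural logarithm,
\[ \log n < \sum_{i=1}^{n} \frac{1}{i}. \]

Thirdly, for all $x > 0$,
\[ 1 + x < 1 + x + \frac{x^2}{2!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} = e^x. \]

And the sum
\[ \sum_{k=1}^{\infty} \frac{1}{k^2} = \zeta(2) \]
is finite. Combining everything above,
\[ \log n < \sum_{i=1}^{n} \frac{1}{i} \leq \prod_{p \leq n} \left(1 + \frac{1}{p}\right) \sum_{k=1}^{n} \frac{1}{k^2} < \zeta(2) \prod_{p \leq n} e^{1/p} = \zeta(2) \exp\left( \sum_{p \leq n} \frac{1}{p} \right), \]
and then taking logarithms of both sides,
\[ \log\log n - \log\zeta(2) < \sum_{p \leq n} \frac{1}{p}. \]
So the partial sum grows at least like $\log\log n$.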
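The lower bound log log n - log ζ(2) < Σ_{p ≤ n} 1/p can be compared with actual values. The sketch below is illustrative only: the helper names are hypothetical, and it uses the known value ζ(2) = π²/6, which is not derived in the text.

```python
from math import log, pi

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def prime_reciprocal_sum(n):
    """Sum of 1/p over primes p <= n."""
    return sum(1.0 / p for p in primes_up_to(n))

ZETA_2 = pi ** 2 / 6   # Euler's value of zeta(2), used here as a known constant

for n in (10, 100, 1_000, 10_000, 100_000):
    lower_bound = log(log(n)) - log(ZETA_2)
    print(n, round(lower_bound, 4), round(prime_reciprocal_sum(n), 4))
```

In each row the second column stays below the third, and the gap between Σ 1/p and log log n itself changes very little, which is consistent with the asymptotic statement above.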

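Remark 3.6. The Meissel-Mertens constant, or just Mertens constant, $M$, is defined to be the limit of the difference between the sum of prime reciprocals and $\log\log n$:
\[ M := \lim_{n \to \infty} \left( \sum_{p \leq n} \frac{1}{p} - \log\log n \right) = \gamma + \sum_p \left[ \log\left(1 - \frac{1}{p}\right) + \frac{1}{p} \right], \]
where $\gamma$ is the Euler-Mascheroni constant.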

And so we have proved that the sum of the reciprocals of all the primes diverges. To adapt the proof to any arithmetic progression with coprime leading term and common difference, Dirichlet introduced Dirichlet characters and Dirichlet series.

4 Characters and their properties

We shall use functions called Dirichlet characters, which are completely multiplicative functions to which Corollary 3.3 can be applied. Before we look into Dirichlet characters, we first consider characters and character groups.

Definition 4.1. Let $G$ be a group. A character of $G$ is a function $f : G \to \mathbb{C}^{\times}$ satisfying $f(g_1 g_2) = f(g_1) f(g_2)$ for all $g_1, g_2 \in G$. That is, $f$ is a group homomorphism.

This is now seen as a special case of a representation of a group, namely the case where the representation is one-dimensional. The characters considered by Dirichlet are one of the “ancestors” of group representation and character theory.

We now derive some properties of characters of finite groups. So from now on, let the groups concerned be finite groups.

Proposition 4.2. Let $G$ be a group with identity $e$, and let $f$ be a character of $G$. Then $f(e) = 1$.

Proof. We have
\[ f(e) = f(e^2) = f(e)^2, \]
and $f(e) \neq 0$, whence $f(e) = 1$.

Corollary 4.3. Let $G$ be a group of order $n$. Then for each $g \in G$, $f(g)$ is an $n$-th root of unity.

Proof. We have
\[ f(g)^n = f(g^n) = f(e) = 1. \]

Proposition 4.4. If $G$ is an abelian group of order $n$, then there are $n$ characters.

Proof. Since $G$ is abelian, using the structure theorem, we may write $G$ as
\[ G \cong C_{p_1^{e_1}} \times C_{p_2^{e_2}} \times \cdots \times C_{p_k^{e_k}} \]
where each $p_i$ is prime, and $C_r$ is a cyclic group of order $r$. If $g \in G$, then $g = g_1^{\alpha_1} g_2^{\alpha_2} \cdots g_k^{\alpha_k}$, where $g_i$ generates $C_{p_i^{e_i}}$ and the product of the orders of the $g_i$ is $n$. Thus
\[ f(g) = \prod_i f(g_i)^{\alpha_i}. \]
A character is therefore determined by its value at each generator $g_i$. Since $f(g_i)$ must be a $p_i^{e_i}$-th root of unity and the generators’ orders multiply to $n$, there are at most $n$ different characters.

Now suppose we are given $w_1, w_2, \ldots, w_k$ such that $w_i^{p_i^{e_i}} = 1$ for each $i$. We can then set $f(g_i) = w_i$, and construct a character by defining
\[ f(g) = \prod_i f(g_i)^{\alpha_i}. \]
It can be routinely checked that each such construction is a character and is distinct from all the others, so there are at least $n$ different characters.

Thus there are in fact precisely $n$ different characters for $G$.

So suppose that the set of characters of $G$ is $\{f_1, f_2, \ldots, f_n\}$.

Definition 4.5. The principal character, $f_1$, is the character such that $f_1(g) = 1$ for all $g \in G$.

Theorem 4.6. The characters of a group $G$ form an abelian group, with the group operation defined by
\[ (f_i f_j)(g) = f_i(g) f_j(g) \]
for all $g \in G$, with identity element $f_1$ and inverse of $f$ being $\overline{f}$, where the bar denotes complex conjugation (which gives the inverse, since $f(g)$ is a root of unity).

Proof. This routine checking will be skipped.

Definition 4.7. The group of characters in Theorem 4.6 is called the character group of $G$. We shall denote this group by $\widehat{G}$. (In the language of algebra, this is the dual group of the abelian group $G$.)

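Theorem 4.8 (Orthogonality relations). Let $G$ be an abelian group of order $n$. The following pair of similar assertions holds.

(i) For a fixed character $f \in \widehat{G}$,
\[ \sum_{g \in G} f(g) = \begin{cases} n & \text{if } f = f_1 \\ 0 & \text{otherwise.} \end{cases} \]

(ii) For a fixed element $g \in G$,
\[ \sum_{f \in \widehat{G}} f(g) = \begin{cases} n & \text{if } g = e \\ 0 & \text{otherwise.} \end{cases} \]

Proof. (i) If $f = f_1$ then
\[ \sum_{g \in G} f(g) = \sum_{g \in G} 1 = n. \]
Otherwise, there must exist $h \in G$ such that $f(h) \neq 1$. Then, since $g \mapsto hg$ permutes $G$,
\[ f(h) \sum_{g \in G} f(g) = \sum_{g \in G} f(hg) = \sum_{g \in G} f(g). \]
Since $f(h) \neq 1$, $\sum_{g \in G} f(g) = 0$.

(ii) If $g = e$ then
\[ \sum_{f \in \widehat{G}} f(g) = \sum_{f \in \widehat{G}} 1 = n. \]
Otherwise, there must exist a character $f_0$ such that $f_0(g) \neq 1$. (If $g = g_1^{\alpha_1} \cdots g_k^{\alpha_k}$ in the notation of Proposition 4.4, then since $g \neq e$, for some $j$ we have $\alpha_j \not\equiv 0 \pmod{p_j^{e_j}}$. Then set $f_0(g_j) = e^{2\pi i / p_j^{e_j}}$ and $f_0(g_i) = 1$ for all other $i$.) Then as in (i), since $f \mapsto f_0 f$ permutes $\widehat{G}$, we have
\[ f_0(g) \sum_{f \in \widehat{G}} f(g) = \sum_{f \in \widehat{G}} (f_0 f)(g) = \sum_{f \in \widehat{G}} f(g). \]
Since $f_0(g) \neq 1$, $\sum_{f \in \widehat{G}} f(g) = 0$.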

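Remark 4.9. Theorem 4.8 is called the orthogonality relations, because if one replaces $f$ in (i) by $f_i \overline{f_j}$ and $g$ in (ii) by $g_1 g_2^{-1}$, we obtain

(i)
\[ \frac{1}{|G|} \sum_{g \in G} f_i(g) \overline{f_j(g)} = \begin{cases} 1 & \text{if } f_i = f_j \\ 0 & \text{otherwise.} \end{cases} \]

(ii)
\[ \frac{1}{|G|} \sum_{f \in \widehat{G}} f(g_1) \overline{f(g_2)} = \begin{cases} 1 & \text{if } g_1 = g_2 \\ 0 & \text{otherwise.} \end{cases} \]

They are the first and second orthogonality relations for group characters.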

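Now let $\overline{n}$ be the residue class modulo $k$ of an integer $n$, i.e. the set of all integers congruent to $n$ modulo $k$. (In other terms, $\overline{n} = n + k\mathbb{Z} \in \mathbb{Z}/k\mathbb{Z}$.) The set of reduced residue classes modulo $k$ is $\{\overline{n_1}, \overline{n_2}, \ldots, \overline{n_{\varphi(k)}}\}$, where each $n_i$ is coprime to $k$ and $\varphi$ is the Euler totient function.

Theorem 4.10. The set of reduced residue classes modulo $k$ forms an abelian group of order $\varphi(k)$, with the group operation defined by $\overline{m}\,\overline{n} = \overline{mn}$. The identity of the group is $\overline{1}$ and the inverse of an element $\overline{m}$ is $\overline{n}$ where $n$ satisfies $mn \equiv 1 \pmod k$. We denote this group by $\mathbb{Z}_k^*$.

Definition 4.11. A Dirichlet character modulo $k$ is a function $\chi : \mathbb{Z}^+ \to \mathbb{C}$ defined to be
\[ \chi(n) := \begin{cases} f(\overline{n}) & \text{if } n, k \text{ are coprime (i.e. } \overline{n} \in \mathbb{Z}_k^*) \\ 0 & \text{otherwise,} \end{cases} \]
where $f$ is a character of the group $\mathbb{Z}_k^*$.

The principal Dirichlet character modulo $k$, $\chi_1$, is the one where $f = f_1$ above, i.e.
\[ \chi_1(n) := \begin{cases} 1 & \text{if } n, k \text{ are coprime (i.e. } \overline{n} \in \mathbb{Z}_k^*) \\ 0 & \text{otherwise.} \end{cases} \]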

Theorem 4.12. A Dirichlet character is a completely multiplicative function.

Proof. Suppose $n, m \in \mathbb{Z}^+$.

Case (i) If $n$ and $m$ are both coprime to $k$, then
\[ \chi(n)\chi(m) = f(\overline{n}) f(\overline{m}) = f(\overline{n}\,\overline{m}) = f(\overline{nm}) = \chi(nm). \]

Case (ii) If $n$ is not coprime to $k$, then $nm$ is also not coprime to $k$, so
\[ \chi(n)\chi(m) = 0 \cdot \chi(m) = 0 = \chi(nm). \]
The same holds with the roles of $n$ and $m$ exchanged.

In both cases, we have $\chi(n)\chi(m) = \chi(nm)$, so $\chi$ is completely multiplicative. For completeness, we state this again in the list of properties of Dirichlet characters below.

Theorem 4.13 (Properties of Dirichlet characters). Dirichlet characters modulo $k$ have the following properties:

(i) $\chi(1 + km) = 1$ for all positive integers $m$;

(ii) $\chi(n)$ is a $\varphi(k)$-th root of unity if $\gcd(n, k) = 1$;

(iii) $\chi(n)\chi(m) = \chi(nm)$ for all positive integers $n$, $m$;

(iv) There are precisely $\varphi(k)$ Dirichlet characters modulo $k$;

(v) For a fixed Dirichlet character $\chi$,
\[ \sum_{n=1}^{k} \chi(n) = \begin{cases} \varphi(k) & \text{if } \chi = \chi_1 \\ 0 & \text{otherwise;} \end{cases} \]

(vi) For a fixed positive integer $n$,
\[ \sum_{\chi} \chi(n) = \begin{cases} \varphi(k) & \text{if } n \equiv 1 \pmod k \\ 0 & \text{otherwise.} \end{cases} \]

Proof. These properties follow directly from the definition of a Dirichlet character and the properties of a group character.
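Properties (iii), (v) and (vi) can be checked numerically in a simple special case. The sketch below is restricted to a prime modulus p, an assumption made here so that the group of reduced residues is cyclic and a generator can be found by brute force; the general case would need the decomposition of Proposition 4.4. The names `primitive_root` and `dirichlet_characters` are hypothetical.

```python
import cmath

def primitive_root(p):
    """Smallest generator of the reduced residues modulo a prime p (brute force)."""
    for g in range(1, p):
        if len({pow(g, a, p) for a in range(p - 1)}) == p - 1:
            return g

def dirichlet_characters(p):
    """All p - 1 Dirichlet characters modulo a prime p.

    Each character is a dict mapping a residue n mod p to chi(n).
    The character with index m sends the generator g to exp(2*pi*i*m/(p-1)).
    """
    g = primitive_root(p)
    index = {pow(g, a, p): a for a in range(p - 1)}   # n = g^a  ->  a
    characters = []
    for m in range(p - 1):
        chi = {0: 0}                                  # chi vanishes on multiples of p
        for n, a in index.items():
            chi[n] = cmath.exp(2j * cmath.pi * m * a / (p - 1))
        characters.append(chi)
    return characters

chars = dirichlet_characters(7)
# Property (v): row sums are phi(7) = 6 for the principal character, else 0.
print([round(abs(sum(chi[n] for n in range(7))), 6) for chi in chars])
# Property (vi): column sums are 6 for n = 1, else 0.
print([round(abs(sum(chi[n] for chi in chars)), 6) for n in range(1, 7)])
```

Here `chars[0]` is the principal character, and the two printed lists show the pattern 6, 0, 0, 0, 0, 0 up to rounding.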


5 Dirichlet series

We now consider Dirichlet series, the analogue of the sum of the reciprocals of primes in Euler’s proof for the infinitude of primes. Dirichlet series have numerous properties analogous to those for power series. We shall first define a Dirichlet series and investigate its properties.

Definition 5.1. Let $s = \sigma + it \in \mathbb{C}$. A Dirichlet series is a series of the form
\[ f(s) = \sum_{n=1}^{\infty} \frac{a(n)}{n^s}. \]

Observe that if $\sigma \geq \sigma_0$, then $|n^s| = n^{\sigma} \geq n^{\sigma_0}$, so
\[ \left| \frac{a(n)}{n^s} \right| \leq \frac{|a(n)|}{n^{\sigma_0}}. \]

By the comparison test, we see that if the series converges absolutely for $s_0 = \sigma_0 + it_0$, then it converges absolutely for all $s = \sigma + it$ with $\sigma \geq \sigma_0$. Using this observation, we have the following:

Theorem 5.2 (Absolute convergence of Dirichlet series). Suppose that the series $\sum_{n=1}^{\infty} |a(n) n^{-s}|$ neither converges for all $s$ nor diverges for all $s$. Then there exists a real number $\sigma_a$, the abscissa of absolute convergence, such that $\sum_{n=1}^{\infty} a(n) n^{-s}$ converges absolutely for $\sigma > \sigma_a$ and does not converge absolutely for $\sigma < \sigma_a$.

Proof. Let $L$ be the set of real numbers $\sigma$ for which $\sum_{n=1}^{\infty} |a(n) n^{-s}|$ diverges (this depends only on $\sigma = \operatorname{Re} s$). By assumption, the series does not converge for all $s$, so $L \neq \emptyset$. The series also does not diverge for all $s$, and by our previous observation, $L$ must be bounded above. Since $L$ is a non-empty subset of the real numbers that is bounded above, $L$ has a least upper bound. Set $\sigma_a$ to be this least upper bound.

Suppose $\sigma < \sigma_a$. Then $\sigma \in L$, because otherwise, by our previous observation, $\sigma$ would be an upper bound for $L$ smaller than the least upper bound, $\sigma_a$.

Suppose $\sigma > \sigma_a$. Then $\sigma \notin L$, because $\sigma_a$ is an upper bound for $L$.

We shall also need to know where the series converges, not just absolutely. First we need a lemma, which we then apply to our situation to obtain the corollary that follows.

Lemma 5.3. Let $x \geq 1$ and let $\phi(x)$ be continuously differentiable on $[1, \infty)$. Let $S(x) = \sum_{n \leq x} C(n)$, $C(n) \in \mathbb{C}$. Then
\[ \sum_{n \leq x} C(n)\phi(n) = S(x)\phi(x) - \int_1^x S(t)\phi'(t)\,dt. \]

Proof. First notice that if $x$ is such that $k \leq x < k + 1$ for some integer $k$, then $S(x) = S(k)$. This will also be used subsequently to put the sum inside an integral whose integration limits are two consecutive integers.

Let us set the empty sum $S(0)$ to be 0. Starting from the left hand side,
\begin{align*}
\sum_{n \leq x} C(n)\phi(n) &= \sum_{n \leq k} C(n)\phi(n) \\
&= \sum_{n=1}^{k} (S(n) - S(n-1))\phi(n) \\
&= \sum_{n=1}^{k} S(n)\phi(n) - \sum_{n=1}^{k} S(n-1)\phi(n) \\
&= \sum_{n=1}^{k} S(n)\phi(n) - \sum_{n=1}^{k-1} S(n)\phi(n+1) \\
&= \sum_{n=1}^{k-1} S(n)(\phi(n) - \phi(n+1)) + S(k)\phi(k) \\
&= \sum_{n=1}^{k-1} S(n)(\phi(n) - \phi(n+1)) + S(k)(\phi(k) - \phi(x)) + S(k)\phi(x) \\
&= -\sum_{n=1}^{k-1} S(n) \int_n^{n+1} \phi'(t)\,dt - S(k) \int_k^x \phi'(t)\,dt + S(x)\phi(x) \\
&= S(x)\phi(x) - \sum_{n=1}^{k-1} \int_n^{n+1} S(t)\phi'(t)\,dt - \int_k^x S(t)\phi'(t)\,dt \\
&= S(x)\phi(x) - \int_1^{k} S(t)\phi'(t)\,dt - \int_k^x S(t)\phi'(t)\,dt \\
&= S(x)\phi(x) - \int_1^x S(t)\phi'(t)\,dt.
\end{align*}

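Corollary 5.4. Let $y > x \geq 1$ and for some fixed $s, s_0 \in \mathbb{C}$, $s - s_0 \neq 1$, let $\phi(x) := x^{-(s - s_0)}$. Then
\[ \sum_{x < n \leq y} \frac{a(n)}{n^{s_0}} \phi(n) = \sum_{x < n \leq y} \frac{a(n)}{n^{s_0}} \phi(y) - \int_x^y \sum_{x < n \leq t} \frac{a(n)}{n^{s_0}} \phi'(t)\,dt. \]
Since $\phi'(x) = -(s - s_0)x^{-1-(s-s_0)}$, equivalently we write
\[ \sum_{x < n \leq y} \frac{a(n)}{n^s} = \sum_{x < n \leq y} \frac{a(n)}{n^{s_0}} \cdot \frac{1}{y^{s-s_0}} + (s - s_0) \int_x^y \sum_{x < n \leq t} \frac{a(n)}{n^{s_0}} \frac{1}{t^{(s-s_0)+1}}\,dt. \]

Proof. Applying Lemma 5.3 with
\[ S(x) = \sum_{n \leq x} \frac{a(n)}{n^{s_0}} \quad \text{and} \quad \phi(x) = x^{-(s-s_0)}, \]
we obtain
\begin{align*}
\sum_{x < n \leq y} \frac{a(n)}{n^s} &= \sum_{n \leq y} \frac{a(n)}{n^s} - \sum_{n \leq x} \frac{a(n)}{n^s} \\
&= \sum_{n \leq y} \frac{a(n)}{n^{s_0}} \cdot \frac{1}{n^{s-s_0}} - \sum_{n \leq x} \frac{a(n)}{n^{s_0}} \cdot \frac{1}{n^{s-s_0}} \\
&= S(y)\frac{1}{y^{s-s_0}} + (s - s_0)\int_1^y S(t)\frac{1}{t^{(s-s_0)+1}}\,dt \\
&\qquad - S(x)\frac{1}{x^{s-s_0}} - (s - s_0)\int_1^x S(t)\frac{1}{t^{(s-s_0)+1}}\,dt \\
&= (S(y) - S(x))\frac{1}{y^{s-s_0}} + S(x)\left(\frac{1}{y^{s-s_0}} - \frac{1}{x^{s-s_0}}\right) + (s - s_0)\int_x^y S(t)\frac{1}{t^{(s-s_0)+1}}\,dt \\
&= (S(y) - S(x))\frac{1}{y^{s-s_0}} - (s - s_0)\int_x^y S(x)\frac{1}{t^{(s-s_0)+1}}\,dt + (s - s_0)\int_x^y S(t)\frac{1}{t^{(s-s_0)+1}}\,dt \\
&= (S(y) - S(x))\frac{1}{y^{s-s_0}} + (s - s_0)\int_x^y (S(t) - S(x))\frac{1}{t^{(s-s_0)+1}}\,dt \\
&= \sum_{x < n \leq y} \frac{a(n)}{n^{s_0}} \cdot \frac{1}{y^{s-s_0}} + (s - s_0)\int_x^y \sum_{x < n \leq t} \frac{a(n)}{n^{s_0}} \frac{1}{t^{(s-s_0)+1}}\,dt.
\end{align*}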

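Theorem 5.5 (Conditional convergence of Dirichlet series). For each Dirichlet series, there is an abscissa of (conditional) convergence $\sigma_c \in [-\infty, \infty]$ such that the series converges (conditionally) for all $s$ with $\sigma > \sigma_c$, and diverges for all $s$ with $\sigma < \sigma_c$.

Proof. Suppose
\[ f(s) = \sum_{n=1}^{\infty} \frac{a(n)}{n^s} \]
converges for $s = s_0$. We wish to show that $f(s)$ then converges for all $s$ with $\sigma > \sigma_0$.

We use the fact that over the complex numbers, a sequence is a Cauchy sequence if and only if it converges. Let each term in our sequence be defined by $a_m := \sum_{n \leq m} a(n) n^{-s_0}$. Thus our assumption that $f(s_0)$ converges means that for every $\varepsilon > 0$ there is a number $x_0$ such that $|a_x - a_y| < \varepsilon$ for all $y > x > x_0$, that is,
\[ \left| \sum_{x < n \leq y} \frac{a(n)}{n^{s_0}} \right| < \varepsilon \quad \text{for all } y > x > x_0. \]
Now suppose $\varepsilon > 0$ is given, and consider the sequence defined by $b_m := \sum_{n \leq m} a(n) n^{-s}$. Applying Corollary 5.4, we have for $y > x > x_0$
\begin{align*}
|b_x - b_y| = \left| \sum_{x < n \leq y} \frac{a(n)}{n^s} \right| &< \varepsilon \cdot \frac{1}{y^{\sigma - \sigma_0}} + \varepsilon |s - s_0| \int_x^y \frac{1}{t^{(\sigma - \sigma_0)+1}}\,dt \\
&= \varepsilon \cdot \frac{1}{y^{\sigma - \sigma_0}} + \varepsilon \frac{|s - s_0|}{\sigma - \sigma_0} \left( \frac{1}{x^{\sigma - \sigma_0}} - \frac{1}{y^{\sigma - \sigma_0}} \right) \\
&\leq \varepsilon \cdot \frac{|s - s_0|}{\sigma - \sigma_0} \frac{1}{y^{\sigma - \sigma_0}} + \varepsilon \frac{|s - s_0|}{\sigma - \sigma_0} \left( \frac{1}{x^{\sigma - \sigma_0}} - \frac{1}{y^{\sigma - \sigma_0}} \right) && \text{since } |s - s_0| \geq \sigma - \sigma_0 \\
&= \varepsilon \cdot \frac{|s - s_0|}{\sigma - \sigma_0} \frac{1}{x^{\sigma - \sigma_0}} \\
&\leq \varepsilon \cdot \frac{|s - s_0|}{\sigma - \sigma_0}. && (\ast)
\end{align*}
Since $\varepsilon$ was arbitrary, the series converges. We set
\[ \sigma_c := \inf\left\{ \operatorname{Re} s : \sum_{n=1}^{\infty} \frac{a(n)}{n^s} \text{ converges} \right\}. \]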

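Theorem 5.6. Suppose that $f(s) = \sum_{n=1}^{\infty} a(n) n^{-s}$ converges at $s = s_0$. Then for every $\delta > 0$, the series converges uniformly in the sector $U := \{s : |\arg(s - s_0)| < \frac{\pi}{2} - \delta\}$.

Proof. Let $s$ be in $U$. By elementary trigonometry,
\begin{align*}
\frac{|s - s_0|}{|\sigma - \sigma_0|} &= \frac{(\sigma - \sigma_0) / \cos \arg(s - s_0)}{\sigma - \sigma_0} \\
&< \frac{1}{\cos(\pi/2 - \delta)} && \text{since } s \in U \text{ and cosine decreases on } [0, \pi/2) \\
&= \frac{1}{\sin \delta},
\end{align*}
hence by $(\ast)$ we have
\[ \left| \sum_{x < n \leq y} \frac{a(n)}{n^s} \right| < \frac{\varepsilon}{\sin \delta}, \]
independent of $s$, and thus convergence is uniform for $s \in U$.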

Theorem 5.7. A Dirichlet series
\[ f(s) = \sum_{n=1}^{\infty} \frac{a(n)}{n^s} \]
with abscissa of convergence $\sigma_c$ is analytic in the half-plane $\{s : \operatorname{Re} s > \sigma_c\}$.

Proof. Each term in the series is analytic, and for every point $s_0$ at which the series converges and every $\delta > 0$, the series converges uniformly in the sector $U = \{s : |\arg(s - s_0)| < \frac{\pi}{2} - \delta\}$ by Theorem 5.6. By the Weierstrass theorem, $f(s)$ is analytic in each such $U$. Every point $s = \sigma + it$ with $\sigma > \sigma_c$ lies in such a sector (take $s_0 = \sigma_1 + it$ with $\sigma_c < \sigma_1 < \sigma$), so $f(s)$ is analytic in the required half-plane.

For a power series, there is a singularity on its circle of convergence. For a Dirichlet series, the analogue only holds if each coefficient is non-negative.

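Theorem 5.8 (Landau’s theorem). Suppose that
\[ f(s) = \sum_{n=1}^{\infty} \frac{a(n)}{n^s} \]
is a Dirichlet series with $a(n) \geq 0$ for each $n$, and that the abscissa of convergence $\sigma_c$ is finite. Then $f(s)$ has a singularity at $s = \sigma_c$.

Proof. Suppose that, in fact, $f(s)$ is analytic at $s = \sigma_c$. Then there exists a $\delta > 0$ such that $f(s)$ is analytic in $D_1 := \{s : |s - \sigma_c| < \delta\}$. Now let $\sigma_0$ be a point on the real axis such that $\sigma_0 > \sigma_c$ and such that $D_2 := \{s : |s - \sigma_0| < \varepsilon\}$ for some $\varepsilon > 0$ is totally contained in $D_1$, with some points in $D_2$ on the real axis less than $\sigma_c$. Hence $f(s)$ is analytic in $D_2$ and we may expand $f(s)$ about $\sigma_0$ to get
\[ f(s) = \sum_{k=0}^{\infty} \frac{f^{(k)}(\sigma_0)}{k!} (s - \sigma_0)^k. \]
We also have
\[ f(s) = \sum_{n=1}^{\infty} \frac{a(n)}{n^s} \]
and since this is uniformly convergent in the region concerned, we may differentiate term by term and obtain
\[ f^{(k)}(\sigma_0) = \sum_{n=1}^{\infty} \frac{(-1)^k (\log n)^k a(n)}{n^{\sigma_0}}, \]
and substituting into the power series expansion, we get
\[ f(s) = \sum_{k=0}^{\infty} \frac{(\sigma_0 - s)^k}{k!} \sum_{n=1}^{\infty} \frac{(\log n)^k a(n)}{n^{\sigma_0}}. \]
Now take $s$ real, $\sigma_0 - \varepsilon < s = \sigma < \sigma_0$, so that $\sigma_0 - s = \sigma_0 - \sigma > 0$:
\begin{align*}
f(\sigma) &= \sum_{k=0}^{\infty} \frac{(\sigma_0 - \sigma)^k}{k!} \sum_{n=1}^{\infty} \frac{(\log n)^k a(n)}{n^{\sigma_0}} \\
&= \sum_{n=1}^{\infty} \frac{a(n)}{n^{\sigma_0}} \sum_{k=0}^{\infty} \frac{(\sigma_0 - \sigma)^k (\log n)^k}{k!} && \text{(changing order of summation since every term is non-negative)} \\
&= \sum_{n=1}^{\infty} \frac{a(n)}{n^{\sigma_0}} \exp((\sigma_0 - \sigma)\log n) \\
&= \sum_{n=1}^{\infty} \frac{a(n)}{n^{\sigma_0}} \frac{1}{n^{\sigma - \sigma_0}} \\
&= \sum_{n=1}^{\infty} \frac{a(n)}{n^{\sigma}}.
\end{align*}
Hence the series for $f$ converges for all $\sigma$ in the specified interval. But for some $\sigma$ in the specified interval, $\sigma < \sigma_c$. This contradicts the fact that $\sigma_c$ was the abscissa of convergence.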

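Theorem 5.9. Given a Dirichlet series $f(s) = \sum_{n=1}^{\infty} a(n) n^{-s}$, set
\[ S(n) = \sum_{k=1}^{n} a(k). \]
If $S(n) = O(n^{\beta})$ for some $\beta > 0$, then the abscissa of convergence $\sigma_c$ for $f(s)$ is less than or equal to $\beta$.

Proof. Suppose $M > N$. We then have
\begin{align*}
\sum_{n=1}^{M} \frac{a(n)}{n^s} - \sum_{n=1}^{N-1} \frac{a(n)}{n^s} &= \sum_{n=N}^{M} \frac{a(n)}{n^s} \\
&= \sum_{n=N}^{M} \frac{S(n) - S(n-1)}{n^s} \\
&= \sum_{n=N}^{M} \frac{S(n)}{n^s} - \sum_{n=N}^{M} \frac{S(n-1)}{n^s} \\
&= \sum_{n=N}^{M} S(n)\left( \frac{1}{n^s} - \frac{1}{(n+1)^s} \right) - \frac{S(N-1)}{N^s} + \frac{S(M)}{(M+1)^s} \\
&= \sum_{n=N}^{M} S(n) \cdot s \int_n^{n+1} \frac{1}{x^{s+1}}\,dx - \frac{S(N-1)}{N^s} + \frac{S(M)}{(M+1)^s} \\
&= s \sum_{n=N}^{M} \int_n^{n+1} \frac{S(x)}{x^{s+1}}\,dx - \frac{S(N-1)}{N^s} + \frac{S(M)}{(M+1)^s} && \text{since } S(x) = S(n) \text{ for } x \in [n, n+1) \\
&= s \int_N^{M+1} \frac{S(x)}{x^{s+1}}\,dx - \frac{S(N-1)}{N^s} + \frac{S(M)}{(M+1)^s} \\
&= O\left( |s| \int_N^{M+1} \frac{1}{x^{\sigma+1-\beta}}\,dx \right) + O\left( \frac{1}{N^{\sigma-\beta}} \right) + O\left( \frac{1}{M^{\sigma-\beta}} \right) && \text{since by assumption } S(n) = O(n^{\beta}) \\
&= O\left( |s| \frac{N^{\beta-\sigma} - (M+1)^{\beta-\sigma}}{\sigma - \beta} \right) + O(N^{\beta-\sigma}) + O(M^{\beta-\sigma})
\end{align*}
which, if $\sigma > \beta$, approaches zero as $N \to \infty$, as required.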

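Lemma 5.10 (Euler-Maclaurin formula). Suppose $f(x)$ is a function that is continuously differentiable for $n \leq x \leq m$, where $n$, $m$ are positive integers. Then
\[ \sum_{k=n}^{m} f(k) = \frac{f(n) + f(m)}{2} + \int_n^m f(x)\,dx + \int_n^m f'(x)\left( x - \lfloor x \rfloor - \frac{1}{2} \right) dx. \]

Proof.
\begin{align*}
\int_n^m f'(x)\left( x - \lfloor x \rfloor - \frac{1}{2} \right) dx
&= \sum_{k=n}^{m-1} \int_k^{k+1} f'(x)\left( x - k - \frac{1}{2} \right) dx \\
&= \left[ \sum_{k=n}^{m-1} \int_k^{k+1} x f'(x)\,dx \right] - \sum_{k=n}^{m-1} k \int_k^{k+1} f'(x)\,dx - \frac{1}{2} \sum_{k=n}^{m-1} \int_k^{k+1} f'(x)\,dx \\
&\overset{\ast}{=} \left[ \sum_{k=n}^{m-1} ((k+1)f(k+1) - kf(k)) - \sum_{k=n}^{m-1} \int_k^{k+1} f(x)\,dx \right] - \sum_{k=n}^{m-1} k(f(k+1) - f(k)) - \frac{1}{2}(f(m) - f(n)) \\
&= \sum_{k=n}^{m-1} f(k+1) - \int_n^m f(x)\,dx - \frac{1}{2}(f(m) - f(n)) \\
&= \sum_{k=n}^{m} f(k) - \int_n^m f(x)\,dx - f(n) - \frac{1}{2}(f(m) - f(n)) \\
&= \sum_{k=n}^{m} f(k) - \int_n^m f(x)\,dx - \frac{1}{2}(f(m) + f(n)),
\end{align*}
where at $(\ast)$ we have used integration by parts on the expression in square brackets.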
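The Euler-Maclaurin formula of Lemma 5.10 can be checked numerically for a concrete function. The sketch below takes f(x) = 1/x, an arbitrary choice made for this illustration, and approximates the sawtooth integral with a midpoint rule on each unit interval; the function names and the number of quadrature steps are assumptions, not part of the text.

```python
from math import log

def remainder_integral(fprime, n, m, steps=2000):
    """Midpoint-rule approximation of the integral over [n, m] of
    f'(x) * (x - floor(x) - 1/2), handled one unit interval at a time."""
    total = 0.0
    h = 1.0 / steps
    for k in range(n, m):                 # on [k, k+1) we have floor(x) = k
        for i in range(steps):
            x = k + (i + 0.5) * h
            total += fprime(x) * (x - k - 0.5) * h
    return total

def euler_maclaurin_rhs(n, m):
    """Right hand side of Lemma 5.10 for f(x) = 1/x."""
    f = lambda x: 1.0 / x
    fprime = lambda x: -1.0 / x ** 2
    integral_of_f = log(m / n)            # exact value of the integral of 1/x over [n, m]
    return (f(n) + f(m)) / 2 + integral_of_f + remainder_integral(fprime, n, m)

lhs = sum(1.0 / k for k in range(1, 21))
print(lhs, euler_maclaurin_rhs(1, 20))    # the two values agree to about six decimals
```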

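We previously defined the Riemann zeta function for real $s > 1$ as
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]
By Theorem 5.2, we now know that the series for $\zeta(s)$ converges absolutely for all $s = \sigma + it$ with $\sigma > 1$.

Theorem 5.11. The Riemann zeta function, as defined above for $\sigma > 1$, can be extended to an analytic function in the half plane $\sigma > 0$, except for a simple pole at $s = 1$ with residue 1.

Proof. We apply the Euler-Maclaurin formula above with $f(x) = x^{-s}$ where $s \neq 1$. We get
\[ \sum_{k=n}^{m} \frac{1}{k^s} = \frac{n^{-s} + m^{-s}}{2} + \int_n^m x^{-s}\,dx - s \int_n^m x^{-s-1}\left( x - \lfloor x \rfloor - \frac{1}{2} \right) dx. \tag{$\dagger$} \]
Since $\left| x - \lfloor x \rfloor - \frac{1}{2} \right| \leq \frac{1}{2}$, the last integral above gives
\begin{align*}
\int_n^m x^{-s-1}\left( x - \lfloor x \rfloor - \frac{1}{2} \right) dx &= O\left( \int_n^m x^{-\sigma-1}\,dx \right) \\
&= O\left( \frac{1}{\sigma}\left( \frac{1}{n^{\sigma}} - \frac{1}{m^{\sigma}} \right) \right) \\
&= O\left( \frac{1}{\sigma n^{\sigma}} \right),
\end{align*}
so if $\sigma > 0$, this converges absolutely and uniformly as $m \to \infty$. So, for $\sigma > 1$, let $m \to \infty$ in $(\dagger)$ to obtain
\begin{align*}
\sum_{k=n}^{\infty} \frac{1}{k^s} &= \frac{n^{-s}}{2} + \int_n^{\infty} x^{-s}\,dx - s \int_n^{\infty} x^{-s-1}\left( x - \lfloor x \rfloor - \frac{1}{2} \right) dx \\
&= \frac{1}{2n^s} - \frac{n^{1-s}}{1 - s} - s \int_n^{\infty} x^{-s-1}\left( x - \lfloor x \rfloor - \frac{1}{2} \right) dx.
\end{align*}
Letting $n = 1$, we get
\[ \zeta(s) = \sum_{k=1}^{\infty} \frac{1}{k^s} = \frac{1}{2} + \frac{1}{s - 1} - s \int_1^{\infty} x^{-s-1}\left( x - \lfloor x \rfloor - \frac{1}{2} \right) dx. \]
Since the last integral converges uniformly, by the Weierstrass theorem it is an analytic function for $\sigma > 0$. Thus we have extended $\zeta(s)$ to $\sigma > 0$, and it is clear that $\zeta(s)$, so extended, has a simple pole at $s = 1$ with residue 1.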
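The formula obtained from the Euler-Maclaurin step also gives a practical way to evaluate ζ(s) for real s > 0, s ≠ 1. The sketch below sums the first n - 1 terms directly and adds the two explicit terms n^{-s}/2 and n^{1-s}/(s - 1) for the tail, dropping the sawtooth integral, which by the estimate in the proof is O(|s| / (σ n^σ)). The function name `zeta_approx` and the cutoff n = 1000 are assumptions made for this illustration.

```python
from math import pi

def zeta_approx(s, n=1000):
    """Approximate zeta(s) for real s > 0, s != 1.

    Uses  zeta(s) = sum_{k<n} k^(-s) + n^(-s)/2 + n^(1-s)/(s-1) - (dropped integral),
    where the dropped integral term is small for large n.
    """
    head = sum(k ** (-s) for k in range(1, n))
    return head + n ** (-s) / 2 + n ** (1 - s) / (s - 1)

print(zeta_approx(2), pi ** 2 / 6)   # close agreement at s = 2
print(zeta_approx(0.5))              # a value inside the strip 0 < s < 1
```

For s = 2 this can be compared with π²/6. For 0 < s < 1 the original series diverges, so the finite value returned there reflects the extension constructed in Theorem 5.11.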

To finish this section, we shall prove a theorem for the multiplication of two Dirichlet series, which will be needed for our final proof of Dirichlet’s theorem.

Theorem 5.12 (Multiplication theorem for Dirichlet series). Given two Dirichlet series
\[ f(s) = \sum_{n=1}^{\infty} \frac{a(n)}{n^s}, \qquad g(s) = \sum_{n=1}^{\infty} \frac{b(n)}{n^s} \]
that converge absolutely for $\sigma > \sigma_0$, we have for $\sigma > \sigma_0$,
\[ f(s)g(s) = \sum_{n=1}^{\infty} \frac{c(n)}{n^s} \]
where
\[ c(n) = \sum_{jk = n} a(j)b(k) = \sum_{k \mid n} a(k)\, b\!\left(\frac{n}{k}\right). \]

Proof. For $\sigma > \sigma_0$, by assumption the two series $f(s)$ and $g(s)$ converge absolutely. Hence multiplying the series and rearranging the terms,
\[ f(s)g(s) = \sum_{j=1}^{\infty} \frac{a(j)}{j^s} \sum_{k=1}^{\infty} \frac{b(k)}{k^s} = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} \frac{a(j)b(k)}{(jk)^s}. \]
Letting $n = jk$,
\[ f(s)g(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{jk = n} a(j)b(k) = \sum_{n=1}^{\infty} \frac{c(n)}{n^s}. \]
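The coefficient formula c(n) = Σ_{jk=n} a(j) b(k) in Theorem 5.12 is straightforward to compute. The following sketch builds the first N coefficients of a product series; the name `dirichlet_convolution` is introduced here for illustration. As an example it multiplies the series with a(n) = b(n) = 1 by itself, for which c(n) counts the divisors of n.

```python
def dirichlet_convolution(a, b, N):
    """Return a list c with c[n] = sum of a(j) * b(k) over all jk = n, for 1 <= n <= N.

    Index 0 is unused. a and b are functions on the positive integers.
    """
    c = [0] * (N + 1)
    for j in range(1, N + 1):
        for k in range(1, N // j + 1):    # only pairs with j * k <= N
            c[j * k] += a(j) * b(k)
    return c

one = lambda n: 1
c = dirichlet_convolution(one, one, 12)
print(c[1:])   # → [1, 2, 2, 3, 2, 4, 2, 4, 3, 4, 2, 6]
```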

6 Dirichlet L-series

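Definition 6.1. For a Dirichlet character $\chi$ modulo $k$, the Dirichlet L-series is defined to be
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}. \]

Since $|\chi(n)| \leq 1$, for real $\sigma$,
\[ \sum_{n=1}^{\infty} \left| \frac{\chi(n)}{n^{\sigma}} \right| \leq \sum_{n=1}^{\infty} \frac{1}{n^{\sigma}}, \]
so by the comparison test and Theorem 5.2, the series converges absolutely for $s$ with $\sigma > 1$.

Firstly, we consider the series for non-principal characters.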

Theorem 6.2. If χ 6= χ1, then the series for L(s, χ) has abscissa of convergence σc = 0.

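Proof. Since $\chi \neq \chi_1$, by the properties of Dirichlet characters (Theorem 4.13), we have $\sum_{n=1}^{k} \chi(n) = 0$. Hence, by periodicity,

\[
\left|\sum_{n=1}^{y} \chi(n)\right| \le k
\]

for any positive integer $y$, and therefore $\left|\sum_{n=x}^{y} \chi(n)\right| \le 2k$ for all positive integers $x \le y$. Partial summation then bounds the difference of partial sums: for $\sigma > 0$,

\[
\left|\sum_{n=x}^{y} \frac{\chi(n)}{n^s}\right| \le \left(1 + \frac{|s|}{\sigma}\right)\frac{2k}{x^\sigma},
\]

which goes to 0 as $x \to \infty$ if $\sigma > 0$. So the series has abscissa of convergence at most 0 by Theorem 5.5. But if $\sigma < 0$, the terms of the series do not tend to 0, so the series diverges for $\sigma < 0$. By Theorem 5.5 again, the abscissa of convergence is precisely 0.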
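To see the two ingredients of this proof in a concrete case, the following Python sketch uses the non-principal character modulo 4 (my choice of example; the helper names are also mine). It checks that the partial sums of $\chi$ stay bounded, and that the partial sums of the series settle down both at $s = 1$, where the limit is the classical value $\pi/4$, and at $s = 1/2$, where the series is no longer absolutely convergent.

```python
import math


def chi4(n):
    """The non-principal character modulo 4: values 0, 1, 0, -1 on n mod 4."""
    return (0, 1, 0, -1)[n % 4]


def max_abs_character_sum(limit):
    """Largest value of |chi4(1) + ... + chi4(y)| over 1 <= y <= limit."""
    running, worst = 0, 0
    for n in range(1, limit + 1):
        running += chi4(n)
        worst = max(worst, abs(running))
    return worst


def l_partial_sum(s, limit):
    """Partial sum of the series for L(s, chi4) up to n = limit."""
    return sum(chi4(n) / n ** s for n in range(1, limit + 1))


print(max_abs_character_sum(10_000))                             # stays bounded by k = 4
print(l_partial_sum(1.0, 100_000), math.pi / 4)                  # Leibniz series
print(l_partial_sum(0.5, 10_000), l_partial_sum(0.5, 100_000))   # slow but steady convergence
```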

Recall that the Dirichlet characters are completely multiplicative functions. So for $\sigma > 1$, we may apply Corollary 3.3 (the Euler Product) to write the Dirichlet L-series as

\[
L(s, \chi) = \prod_{p \text{ prime}} \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}.
\]
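The Euler product can be compared with the series numerically. The sketch below does this at $s = 2$ for the non-principal character modulo 4; the character, the truncation points and the function names are assumptions made for this illustration. Both truncations should approach the same value, which for this character is known as Catalan's constant, approximately 0.9159656.

```python
def chi4(n):
    """The non-principal character modulo 4."""
    return (0, 1, 0, -1)[n % 4]


def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def l_series(s, terms):
    """Truncated Dirichlet series for L(s, chi4)."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))


def euler_product(s, prime_bound):
    """Truncated Euler product for L(s, chi4) over primes p <= prime_bound."""
    product = 1.0
    for p in primes_up_to(prime_bound):
        product /= 1.0 - chi4(p) / p ** s
    return product


series_value = l_series(2.0, 100_000)
product_value = euler_product(2.0, 10_000)
print(series_value, product_value)
```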

Let us now turn our attention to the L-series for the principal character χ1:

Theorem 6.3. $L(s, \chi_1)$ is an analytic function for $\sigma > 0$, except for a simple pole at $s = 1$ with residue $\varphi(k)/k$.

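Proof. Recalling that $\chi_1(n) = 1$ if $\gcd(n, k) = 1$ and $\chi_1(n) = 0$ otherwise, we have for $\sigma > 1$

\[
\begin{aligned}
L(s, \chi_1) &= \prod_{p \text{ prime}} \left(1 - \frac{\chi_1(p)}{p^s}\right)^{-1} \\
&= \prod_{p \nmid k} \left(1 - \frac{\chi_1(p)}{p^s}\right)^{-1} \prod_{p \mid k} \left(1 - \frac{\chi_1(p)}{p^s}\right)^{-1} \\
&= \prod_{p \nmid k} \left(1 - \frac{1}{p^s}\right)^{-1} \\
&= \prod_{p} \left(1 - \frac{1}{p^s}\right)^{-1} \prod_{p \mid k} \left(1 - \frac{1}{p^s}\right) \\
&= \zeta(s) \prod_{p \mid k} \left(1 - \frac{1}{p^s}\right).
\end{aligned}
\]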

The second factor above has finitely many terms and each term is analytic, so the whole factor is analytic; the identity therefore extends $L(s, \chi_1)$ to $\sigma > 0$. By Theorem 5.11, $\zeta(s)$ has a simple pole at $s = 1$ with residue 1, so $L(s, \chi_1)$ has a simple pole at $s = 1$ with residue

\[
\prod_{p \mid k} \left(1 - \frac{1}{p}\right),
\]

which is equal to $\varphi(k)/k$ by Theorem 2.8.
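The identity $\prod_{p \mid k}(1 - 1/p) = \varphi(k)/k$ used for the residue can be checked directly with exact rational arithmetic. In this small sketch the function names are mine, and $\varphi$ is computed naively by counting residues coprime to $k$.

```python
from fractions import Fraction
from math import gcd


def phi_by_counting(k):
    """Euler's totient: the number of 1 <= n <= k with gcd(n, k) = 1."""
    return sum(1 for n in range(1, k + 1) if gcd(n, k) == 1)


def prime_divisors(k):
    """Distinct prime divisors of k, by trial division."""
    primes, p = [], 2
    while p * p <= k:
        if k % p == 0:
            primes.append(p)
            while k % p == 0:
                k //= p
        p += 1
    if k > 1:
        primes.append(k)
    return primes


def residue_at_one(k):
    """The product of (1 - 1/p) over primes p dividing k, as an exact fraction."""
    result = Fraction(1)
    for p in prime_divisors(k):
        result *= 1 - Fraction(1, p)
    return result


for k in (1, 4, 12, 30, 97):
    print(k, residue_at_one(k), Fraction(phi_by_counting(k), k))
```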


We are now ready to prove the Dirichlet theorem. We follow Euler’s idea for the divergence of the sum of the reciprocals of primes (see proof of (3.5)). For us, this involves the logarithm of the L-series, which is complex-valued in general. We choose the branch such that for $s$ real and $\sigma > 1$, $\log L(s, \chi)$ is real. For $\sigma > 1$, using the Euler product for $L(s, \chi)$,

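\[
\begin{aligned}
\log L(s, \chi) &= \log \prod_p \left(1 - \frac{\chi(p)}{p^s}\right)^{-1} \\
&= -\sum_p \log\left(1 - \frac{\chi(p)}{p^s}\right) \\
&= \sum_p \sum_{n=1}^{\infty} \frac{\chi(p)^n}{n p^{ns}} \qquad \text{since } \left|\frac{\chi(p)}{p^s}\right| < 1 \\
&= \sum_p \frac{\chi(p)}{p^s} + R(s),
\end{aligned}
\tag{$\star$}
\]

where

\[
R(s) = \sum_p \sum_{n=2}^{\infty} \frac{\chi(p^n)}{n p^{ns}}.
\]

Now

\[
\begin{aligned}
|R(s)| &\le \sum_p \sum_{n=2}^{\infty} \frac{1}{n p^{n\sigma}}
\le \frac{1}{2} \sum_p \sum_{n=2}^{\infty} \frac{1}{p^{n\sigma}}
= \frac{1}{2} \sum_p \frac{1}{p^{2\sigma}} \cdot \frac{1}{1 - \frac{1}{p^\sigma}} \\
&\le \frac{1}{2} \cdot \frac{1}{1 - \frac{1}{2^\sigma}} \sum_p \frac{1}{p^{2\sigma}}
\le \frac{1}{2} \cdot \frac{1}{1 - \frac{1}{2^\sigma}}\, \zeta(2\sigma),
\end{aligned}
\]

which remains bounded as $s \to 1^+$, since $\zeta(2\sigma)$ is bounded near $\sigma = 1$.

We return to considering the first sum in ($\star$). We isolate the terms in the first sum where $p \equiv a \pmod{k}$. Since $a$ and $k$ are coprime, we can find $b$ such that $ab \equiv 1 \pmod{k}$. Now multiply ($\star$) by $\chi(b)$ and sum over all characters to get

\[
\begin{aligned}
\sum_\chi \chi(b) \log L(s, \chi) &= \sum_\chi \chi(b) \sum_p \frac{\chi(p)}{p^s} + \sum_\chi \chi(b) R(s) \\
&= \sum_p \frac{1}{p^s} \sum_\chi \chi(bp) + R^*(s) \qquad \text{where } R^*(s) := \sum_\chi \chi(b) R(s) \\
&= \sum_{p \,:\, bp \equiv 1 \ (\mathrm{mod}\ k)} \frac{1}{p^s}\, \varphi(k) + R^*(s) \\
&= \varphi(k) \sum_{p \,:\, p \equiv a \ (\mathrm{mod}\ k)} \frac{1}{p^s} + R^*(s).
\end{aligned}
\]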
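The step that isolates the primes $p \equiv a \pmod{k}$ rests on the orthogonality relation $\sum_\chi \chi(bp) = \varphi(k)$ if $bp \equiv 1 \pmod{k}$ and $0$ otherwise. The Python sketch below checks this numerically. To keep the construction of the characters short it assumes that the modulus is a prime $q$, so that the characters can be built from a primitive root; this restriction and all function names are mine and are not part of the argument in the text.

```python
import cmath


def primitive_root(q):
    """Smallest primitive root modulo a prime q, found by brute force."""
    for g in range(1, q):
        if len({pow(g, m, q) for m in range(q - 1)}) == q - 1:
            return g
    raise ValueError("q must be prime")


def characters(q):
    """All q - 1 Dirichlet characters modulo a prime q, each as a dict residue -> value."""
    g = primitive_root(q)
    index = {pow(g, m, q): m for m in range(q - 1)}  # discrete logarithm table
    chars = []
    for j in range(q - 1):
        chi = {0: 0j}
        for residue, m in index.items():
            chi[residue] = cmath.exp(2j * cmath.pi * j * m / (q - 1))
        chars.append(chi)
    return chars


def character_sum(q, b, p):
    """Sum of chi(b) * chi(p) over all characters chi modulo the prime q."""
    return sum(chi[b % q] * chi[p % q] for chi in characters(q))


q, a = 7, 3
b = pow(a, q - 2, q)  # inverse of a modulo the prime q, by Fermat's little theorem
for p in range(1, q):
    print(p, round(abs(character_sum(q, b, p)), 9))
```

Only the row with $p \equiv a$ should give $\varphi(q) = q - 1$; every other row should vanish up to rounding error.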

In the above, $R^*(s)$ is bounded as $s \to 1^+$ since $R(s)$ is. By Theorem 6.2, each $L(s, \chi)$ with $\chi \neq \chi_1$ is analytic at $s = 1$, so the corresponding summand on the left is bounded as $s \to 1^+$ provided that $L(1, \chi) \neq 0$. The summand where $\chi = \chi_1$ goes to infinity as $s \to 1^+$ by Theorem 6.3. Thus if $L(1, \chi) \neq 0$ for each $\chi \neq \chi_1$, we would not have an “$\infty - \infty$” situation on the left-hand side. Then the left-hand side would go to infinity as $s \to 1^+$, and so would the right-hand side. Since $R^*(s)$ would remain bounded, we would obtain

\[
\sum_{p \,:\, p \equiv a \ (\mathrm{mod}\ k)} \frac{1}{p^s} \to \infty
\]

as $s \to 1^+$, i.e. the series

\[
\sum_{p \,:\, p \equiv a \ (\mathrm{mod}\ k)} \frac{1}{p}
\]

diverges. This would show that there are infinitely many primes $p$ such that $p \equiv a \pmod{k}$, i.e. there are infinitely many primes in the arithmetic progression $(a, a + k, a + 2k, \ldots)$.
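The conclusion can be illustrated, though of course not proved, by direct computation. The following sketch counts the primes up to a bound in each residue class coprime to $k$ and accumulates the sums of reciprocals; the bound, the modulus and the function names are arbitrary choices for the illustration.

```python
from math import gcd


def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def progression_statistics(k, limit):
    """Map each residue a coprime to k to (count, sum of 1/p) over primes p <= limit, p = a mod k."""
    counts = {a: 0 for a in range(k) if gcd(a, k) == 1}
    sums = {a: 0.0 for a in counts}
    for p in primes_up_to(limit):
        a = p % k
        if a in counts:
            counts[a] += 1
            sums[a] += 1.0 / p
    return {a: (counts[a], sums[a]) for a in counts}


for limit in (10 ** 3, 10 ** 4, 10 ** 5):
    print(limit, progression_statistics(4, limit))
```

The counts in the two classes modulo 4 stay close to each other, and the reciprocal sums keep creeping upwards, very slowly, as the bound increases.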

So let us show that $L(1, \chi) \neq 0$ for $\chi \neq \chi_1$: we show that this is the case separately for complex and real characters.

Theorem 6.4. $L(1, \chi) \neq 0$ for complex characters $\chi \neq \chi_1$.

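Proof. Let the characters considered be modulo $k$. Consider

\[
P(\sigma) = \prod_\chi L(\sigma, \chi),
\]

which for $\sigma > 1$ gives

\[
\begin{aligned}
\log P(\sigma) &= \sum_\chi \log L(\sigma, \chi) \\
&= \sum_\chi \sum_p -\log\left(1 - \frac{\chi(p)}{p^\sigma}\right) \qquad \text{using the Euler product for } L(s, \chi) \\
&= \sum_\chi \sum_p \sum_{n=1}^{\infty} \frac{\chi(p)^n}{n p^{n\sigma}} \\
&= \sum_p \sum_{n=1}^{\infty} \frac{1}{n p^{n\sigma}} \sum_\chi \chi(p^n) \\
&= \sum_p \sum_{\substack{n \ge 1 \\ p^n \equiv 1 \ (\mathrm{mod}\ k)}} \frac{\varphi(k)}{n p^{n\sigma}} \qquad \text{by (4.13) (vi)} \\
&\ge 0,
\end{aligned}
\]

so

\[
\liminf_{\sigma \to 1^+} P(\sigma) \ge 1. \tag{$\ddagger$}
\]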

Suppose that there is a complex character $\chi$ such that $L(1, \chi) = 0$. Then $L(1, \bar\chi) = \overline{L(1, \chi)} = 0$, where $\bar\chi$ is the conjugate character ($\chi\bar\chi = \chi_1$), which is not equal to $\chi$ since $\chi$ is not real. Also, $L(s, \chi_1)$ has a simple pole at $s = 1$ by (6.3). So the product $P(\sigma)$ over all the characters tends to zero as $\sigma \to 1^+$, since the one simple pole cannot cancel the two zeros or any other possible zeros. This contradicts (‡). Hence $L(1, \chi) \neq 0$.
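The inequality $P(\sigma) \ge 1$ behind (‡) can be observed numerically. The sketch below is only an illustration under assumptions of my own: the modulus is the prime 5 (so that the characters can be built from a primitive root), $\sigma = 2$, and each L-series is truncated after finitely many terms. The function names are likewise mine.

```python
import cmath


def characters_mod_prime(q):
    """All q - 1 Dirichlet characters modulo a prime q, each as a dict residue -> value."""
    g = next(g for g in range(1, q)
             if len({pow(g, m, q) for m in range(q - 1)}) == q - 1)  # primitive root
    index = {pow(g, m, q): m for m in range(q - 1)}
    chars = []
    for j in range(q - 1):
        chi = {0: 0j}
        for residue, m in index.items():
            chi[residue] = cmath.exp(2j * cmath.pi * j * m / (q - 1))
        chars.append(chi)
    return chars


def truncated_l(chi, q, sigma, terms):
    """Partial sum of the series for L(sigma, chi)."""
    return sum(chi[n % q] / n ** sigma for n in range(1, terms + 1))


def product_of_l_series(q, sigma, terms=20_000):
    """Truncated approximation to P(sigma), the product of L(sigma, chi) over all chi mod q."""
    product = 1 + 0j
    for chi in characters_mod_prime(q):
        product *= truncated_l(chi, q, sigma, terms)
    return product


P = product_of_l_series(5, 2.0)
print(P)  # real up to rounding error, and slightly larger than 1
```

Although the individual values $L(\sigma, \chi)$ are complex, they occur in conjugate pairs, so the product is real, as the computation of $\log P(\sigma)$ in the proof predicts.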

Theorem 6.5. $L(1, \chi) \neq 0$ for real characters $\chi \neq \chi_1$.


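Proof. Let the characters considered be modulo $k$. Consider the function

\[
f(n) = \sum_{d \mid n} \chi(d),
\]

which is multiplicative as it is the Dirichlet convolution of two multiplicative functions, $\chi$ and 1. Write $n = p_1^{\alpha_1} \cdots p_r^{\alpha_r}$ and consider the values of the function at powers of these primes. Set

\[
\psi(i) = f(p_i^{\alpha_i}) = \sum_{m=0}^{\alpha_i} \chi(p_i^m),
\]

so that $f(n) = \psi(1) \cdots \psi(r)$ by multiplicativity.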

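Since $\chi$ is real, $\chi(p_i) \in \{1, -1\}$ whenever $p_i \nmid k$.

If $p_i \mid k$, then

\[
\psi(i) = 1 + \chi(p_i) + \chi(p_i)^2 + \cdots + \chi(p_i)^{\alpha_i} = 1 + 0 + 0^2 + \cdots + 0^{\alpha_i} = 1.
\]

If $p_i \nmid k$ and $\alpha_i = 2l + 1$ is odd, then

\[
\psi(i) \ge 1 + (-1) + (-1)^2 + \cdots + (-1)^{2l+1} = 0.
\]

If $p_i \nmid k$ and $\alpha_i = 2l$ is even, then

\[
\psi(i) \ge 1 + (-1) + (-1)^2 + \cdots + (-1)^{2l} = 1.
\]

Thus $f(n) \ge 0$ for all $n$. If $n = m^2$ is a square, then each $\alpha_i$ is even, so $f(m^2) \ge 1$. Thus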

\[
F(\sigma) := \sum_{n=1}^{\infty} \frac{f(n)}{n^\sigma} \ge \sum_{m=1}^{\infty} \frac{1}{m^{2\sigma}} = \zeta(2\sigma),
\]

so $F(\sigma)$ diverges at $\sigma = 1/2$. Hence the abscissa of convergence $\sigma_c \ge 1/2$. Since $f(n) \ge 0$ we may apply Landau’s theorem (Theorem 5.8) to conclude that $F(s)$ must have a singularity at $s = \sigma_c \ge 1/2$. But for $\sigma > 1$, $L(s, \chi)$ and $\zeta(s)$ converge absolutely, so by Theorem 5.12, we have

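\[
L(s, \chi)\zeta(s) = \sum_{m=1}^{\infty} \frac{\chi(m)}{m^s} \sum_{n=1}^{\infty} \frac{1}{n^s}
= \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{d \mid n} \chi(d)
= \sum_{n=1}^{\infty} \frac{f(n)}{n^s}
= F(s).
\]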

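Both $L(s, \chi)$ and $\zeta(s)$ are analytic for $\sigma > 0$ except for the simple pole at $s = 1$ for $\zeta(s)$. If $L(1, \chi) = 0$, then this zero would cancel the pole of $\zeta(s)$, so $F(s) = L(s, \chi)\zeta(s)$ would be analytic for $\sigma > 0$, and thus could not have a singularity at $s = \sigma_c \ge 1/2$. This is a contradiction, so $L(1, \chi) \neq 0$.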
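The two properties of $f$ used in this proof, namely $f(n) \ge 0$ for all $n$ and $f(m^2) \ge 1$, are easy to check by machine for a specific real character. The sketch below uses the non-principal character modulo 4, which is my choice of example; the divisor sum is computed naively.

```python
def chi4(n):
    """The real non-principal character modulo 4."""
    return (0, 1, 0, -1)[n % 4]


def f(n):
    """f(n) = sum of chi4(d) over the divisors d of n."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)


print([f(n) for n in range(1, 13)])
print(all(f(n) >= 0 for n in range(1, 1001)))       # f is never negative
print(all(f(m * m) >= 1 for m in range(1, 32)))     # f is at least 1 on squares
```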

Having established that $L(1, \chi) \neq 0$ for $\chi \neq \chi_1$, we have now proved Dirichlet’s theorem.


7 Final words

After Dirichlet’s theorem, more generalisations about primes in arithmetic progressions were made. We state some results obtained by later mathematicians:

Theorem 7.1 (Linnik 1944). Suppose that $a$ and $d$ are coprime, and that $1 \le a < d$. Let $p(a, d)$ be the smallest prime in the arithmetic progression $a + nd$. Then there exist positive constants $c$ and $L$, independent of $a$ and $d$, such that

\[
p(a, d) < c\,d^L.
\]

Another natural question to ask is: if each such arithmetic progression contains infinitely many primes, what about consecutive terms in the progression? How many consecutive ones can be prime? As an example, it was found that

\[
56211383760397 + 44546738095860k
\]

is prime for $k = 0, 1, \ldots, 22$. Green and Tao proved in 2004 that such runs can be arbitrarily long, a result that had long been widely believed to be true.
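The example of 23 primes in arithmetic progression can be verified on a computer. The sketch below uses a Miller-Rabin test with the first twelve primes as bases, which is known to be deterministic for numbers far larger than those appearing here. The function name and the constants `START` and `STEP` are my own labels for the two numbers quoted above.

```python
def is_prime(n):
    """Miller-Rabin test; with these bases it is deterministic for n below about 3.3e24."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:          # write n - 1 = d * 2^r with d odd
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False       # a is a witness that n is composite
    return True


START = 56211383760397
STEP = 44546738095860
print(all(is_prime(START + STEP * k) for k in range(23)))
```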

Theorem 7.2 (Green & Tao 2004). The prime numbers contain infinitely many arithmetic progressions of length $k$ for all $k$.

An even more rigid type is an arithmetic progression of consecutive primes, that is, one in which every number in between the primes is composite. For example,

10099697246971424763778665558796984032950932468919

0041803603417758904341703348882159067229719 + 210k

is such a sequence, for $k = 0, 1, \ldots, 9$.

These generalisations show that even though the primes thin out as the numbers grow larger, prime numbers are quite dense in the integers. The famous prime number theorem, proved in 1896, describes the approximate asymptotic distribution of prime numbers. It states that

\[
\pi(x) \sim \frac{x}{\log x},
\]

where $\pi(x)$ is the number of primes not exceeding $x$, and $a(x) \sim b(x)$ means $\lim_{x \to \infty} a(x)/b(x) = 1$.
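To get a feeling for how slowly the ratio in the prime number theorem approaches 1, one can tabulate $\pi(x)$ against $x/\log x$. The following sketch does so with a simple sieve; the function names and the range of $x$ are arbitrary choices for the illustration.

```python
import math


def prime_count(limit):
    """pi(limit): the number of primes not exceeding limit, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sum(sieve)


def pnt_ratio(x):
    """The ratio pi(x) / (x / log x), which tends to 1 as x grows."""
    return prime_count(x) / (x / math.log(x))


for exponent in range(3, 7):
    x = 10 ** exponent
    print(x, prime_count(x), round(pnt_ratio(x), 4))
```

The ratio decreases towards 1, but only very gradually: at $x = 10^6$ it is still above 1.08.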
