
The Green-Tao Theorem on arithmetic progressions within the primes

Thomas Bloom

November 7, 2010

Contents

1 Introduction
1.1 Heuristics
1.2 Arithmetic Progressions
1.3 Structure of the Proof

2 Arithmetic Progressions
2.1 How to count arithmetic progressions
2.2 Szemeredi Revisited
2.3 Pseudorandomness

3 Uniformity Norms and the Generalised von Neumann Theorem
3.1 The Gowers Uniformity Norm
3.2 The Generalised von Neumann Theorem
3.3 Dual Norms

4 Decomposition Theorem
4.1 The Green-Tao Proof
4.2 The Gowers-Hahn-Banach Proof
4.3 The Relative Szemeredi Theorem

5 Progressions in the Primes
5.1 Counting the Primes and the W-trick
5.2 Pseudorandom Majorant
5.3 The Green-Tao Theorem

6 Further Results
6.1 Extensions of the Green-Tao Theorem
6.2 Asymptotics for P(k,N)
6.3 Explicit Bounds

A Proof of the Decomposition Theorem

B Estimates for Λ_R
B.1 Euler Product for independent linear forms
B.2 Euler product for simple linear forms
B.3 Pseudorandomness of ν

C Fourier transform

D The GI and MN Conjectures

Standard Notation

The following notation and definitions are standard, and will be used without comment throughout this dissertation.

A k-term arithmetic progression (or just a k-progression) is a set of the form {a + nb : 0 ≤ n ≤ k − 1} for some a, b ∈ N. We exclude the degenerate case where b = 0.

We say that f = O(g) if there exists a constant C such that for all sufficiently large x, |f(x)| ≤ Cg(x).

f = o(1) if $\lim_{x\to\infty} f(x) = 0$.

f ∼ g if f = (1 + o(1))g.

[n] := {1, 2, . . . , n − 1, n}.

φ(n) := #{m ∈ [n] : (m, n) = 1}.

$$\mu(n) := \begin{cases} 1 & \text{if } n = 1;\\ (-1)^k & \text{if } n \text{ is square-free and has } k \text{ prime divisors};\\ 0 & \text{otherwise.} \end{cases}$$

Specific notation and conventions

We shall often be looking at the arithmetic mean of a function over a given set. For convenience, we denote this using the expectation notation,

$$\mathbb{E}(f(x) : x \in X) = \mathbb{E}_{x\in X} f(x) := \frac{1}{|X|}\sum_{x\in X} f(x).$$

Also, since we shall be trying to count k-term progressions of primes, it is convenient to introduce the following function:

$$P(k,N) := \#\{k\text{-term arithmetic progressions of primes in } [N]\}.$$

In many cases, we shall be taking an average over a hypercube; that is, points of the form $(\omega_1, \ldots, \omega_n)$ where $\omega_i = 0$ or $1$ for all $1 \le i \le n$. We denote the n-dimensional hypercube by $C_n := \{0,1\}^n$. We denote the hypercube with the origin removed by $C'_n := \{0,1\}^n \setminus \{(0,\ldots,0)\}$. Where we are dealing with such points, of the form $\omega = (\omega_1, \ldots, \omega_n)$ and $h = (h_1, \ldots, h_n)$, we define the scalar product to be

$$\omega \cdot h := \omega_1 h_1 + \cdots + \omega_n h_n.$$

We will be working mainly over the ring of integers modulo N, denoted by $\mathbb{Z}_N := \mathbb{Z}/N\mathbb{Z}$, and since we are letting N → ∞ we may always assume that N is a prime, so that $\mathbb{Z}_N$ is a field. We shall often be considering the space of functions $f : \mathbb{Z}_N \to \mathbb{R}$, which we denote by $\mathbb{R}^N$. We give this the inner product

$$\langle f, g\rangle := \mathbb{E}_{x\in\mathbb{Z}_N}\big(f(x)g(x)\big),$$


and where convenient the $L^p$ norms,

$$\|f\|_p := \big(\mathbb{E}_{x\in\mathbb{Z}_N}|f(x)|^p\big)^{1/p}$$

for 1 ≤ p < ∞, and

$$\|f\|_\infty := \sup_{x\in\mathbb{Z}_N}|f(x)|.$$

We shall denote dependence on constants by subscripts. For example, $O_{k,\delta}$ indicates that the constant implicit in the O notation depends on the constants k and δ. Similarly, $o_k$ indicates that the rate of decay depends on k. Since in most cases there is only one variable, I shall omit these subscripts for clarity, only making clear which variables the constants depend on when this is important or not clear from context.

In almost all of this dissertation, the only variable is N, and hence, for instance, f = o(1) means that f tends to zero as N → ∞. The only other variable parameter is $R := \sqrt{N}$, and since it is a function of N we may still take the o(1) errors to decay as N → ∞.

Whenever we use the variable p, it ranges over primes. For example, $\sum_{p\le x} p$ denotes the sum of all primes less than or equal to x.

Acknowledgements

I would like to thank my supervisor for his encouragement, advice and a careful reading of the first draft. I would also like to thank several of my fellow undergraduates for careful proofreading and comments on clarity and structure.

Chapter 1

Introduction

Small arithmetic progressions within the primes are easy to come by. It is trivial to find progressions with one or two terms, and 3, 5, 7 gives a 3-term arithmetic progression. A moment’s thought yields the 5-term arithmetic progression 5, 11, 17, 23, 29. The problem quickly becomes a lot more difficult: the first 6-term arithmetic progression is 7, 37, 67, 97, 127, 157, and the current record holder¹ has 26 terms:

$$43142746595714191 + 5283234035979900\,n \quad \text{for } n = 0, \ldots, 25.$$
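The progressions quoted above are easy to check by machine. The following minimal Python sketch is our own illustration, not part of the original argument, and its function names are hypothetical. It verifies each term with the Miller-Rabin test, relying on the published result that testing the twelve prime bases up to 37 makes Miller-Rabin deterministic for all n below about 3 × 10^23, far above the numbers involved here.

```python
def is_prime(n: int) -> bool:
    """Miller-Rabin with bases 2, ..., 37 (deterministic for n below about 3 * 10**23)."""
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True


def progression(a: int, b: int, k: int) -> list:
    """The k-term arithmetic progression a, a + b, ..., a + (k - 1) b."""
    return [a + n * b for n in range(k)]


# Number of terms k -> (first term a, common difference b), as quoted in the text.
EXAMPLES = {
    3: (3, 2),
    5: (5, 6),
    6: (7, 30),
    26: (43142746595714191, 5283234035979900),
}

if __name__ == "__main__":
    for k, (a, b) in EXAMPLES.items():
        terms = progression(a, b, k)
        print(k, terms[-1], all(is_prime(t) for t in terms))
```

For instance, the next term 157 + 30 = 187 = 11 · 17 is composite, so the 6-term progression cannot be extended upwards.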

A natural problem to pose is whether we can find such progressions within the primes of any given length. It is easy to prove that there can be no infinite arithmetic progression within the primes (if the first term is a ≥ 2 and the common difference is b, then the term a + ab = a(1 + b) is composite), so the best we can hope for is the following recent theorem of Ben Green and Terence Tao:

Theorem 1.1 (Green-Tao, 2008 [10]). There are arbitrarily long arithmetic progressions within the primes.

In fact, they prove something much stronger, and give an increasing function of N as a lower bound for how many such progressions are in the first N integers. It follows that there are in fact infinitely many arithmetic progressions of primes of any finite length. In this dissertation we describe a proof of this theorem, motivating and making clear the key steps and insights needed as we go along. It is a synthesis of methods and ideas from [10], [7] and [3], including some simplifications and new expository remarks. It is the first explicit description of the entire proof which includes the simplifications made since [10].

In this introductory section we present the background to the problem, including heuristics, related conjectures, and previous partial results which the Green-Tao theorem builds upon. We conclude by giving an overview of the structure of the proof. Chapters 2 to 5 focus on the different components of the proof, which are then brought together to prove (a stronger form of) Theorem 1.1 at the end of Chapter 5. Chapter 6 gives some extensions and related results which have since been obtained.

¹ Found by Benoît Perichon using the PrimeGrid software by Geoff Reynolds and Jaroslaw Wroblewski, April 2010.


1.1 Heuristics

The prime number theorem states that if π(x) is the number of primes less than or equal to x, then

$$\pi(x) \sim \frac{x}{\log x}.$$

In probabilistic terms, this suggests that if we select an integer from [1, x] uniformly at random, then it is prime with probability roughly 1/log x. This model fails as a method of proving statements about the primes: if the primes were truly ‘random’ then we would expect roughly the same number of even and odd primes. The failure is because the model only considers the density of the primes, and not their arithmetical properties. This model is surprisingly useful, however, in formulating conjectures about how we expect the primes to behave. By including some information about their arithmetic properties (which often only changes the original conjecture by a constant factor) these heuristics can in some cases be converted into proven theorems.

For example, consider the following problem: how many primes less than or equal to x are congruent to a modulo b? If a and b have a common divisor greater than 1, the answer is trivially either 0 or 1, so suppose a and b are coprime. Let us select an integer n ≤ x uniformly at random. The probability that n is prime should be roughly 1/log x. Every prime larger than b lies in one of the φ(b) congruence classes modulo b that are coprime to b, and with no reason to favour any of these classes we expect a prime to lie in each of them with probability 1/φ(b). Thus the probability that n is prime and congruent to a modulo b should be roughly 1/(φ(b) log x), and this leads us to conjecture that

$$\pi_{a,b}(x) \sim \frac{x}{\varphi(b)\log x},$$

where $\pi_{a,b}(x)$ is the number of primes less than or equal to x which are congruent to a modulo b. This conjecture turns out to be correct, and is known as the Prime Number Theorem for Arithmetic Progressions.
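Both asymptotic statements can be compared with actual counts. The sketch below uses only the Python standard library, and its helper names are hypothetical. It sieves the primes up to x = 10^6, compares π(x) with x/log x, and compares π_{a,10}(x) with π(x)/φ(10) for the four classes a coprime to 10. The second set of ratios is very close to 1, while the first is noticeably above 1, reflecting the slow convergence in the prime number theorem.

```python
import math


def prime_sieve(limit: int) -> list:
    """Sieve of Eratosthenes: entry n is True exactly when n is prime (0 <= n <= limit)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return flags


def phi(b: int) -> int:
    """Euler's totient function, computed directly from the definition."""
    return sum(1 for m in range(1, b + 1) if math.gcd(m, b) == 1)


def pi(x: int, flags: list) -> int:
    """pi(x): the number of primes <= x."""
    return sum(flags[: x + 1])


def pi_mod(x: int, a: int, b: int, flags: list) -> int:
    """pi_{a,b}(x): the number of primes <= x congruent to a modulo b."""
    return sum(1 for n in range(a % b, x + 1, b) if flags[n])


if __name__ == "__main__":
    x, b = 10**6, 10
    flags = prime_sieve(x)
    total = pi(x, flags)
    print("pi(x) =", total, "ratio to x/log x:", total / (x / math.log(x)))
    for a in range(b):
        if math.gcd(a, b) == 1:
            print(a, pi_mod(x, a, b, flags) / (total / phi(b)))
```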

Encouraged by the success of the probabilistic model in counting primes within a given arithmetic progression, let us try it on our problem of counting arithmetic progressions within the primes. Fix a, b ≥ 1 with a + (k − 1)b ≤ N. Making the further assumption that the events of being prime are roughly independent, the model predicts

$$\mathbb{P}(a, a+b, \ldots, a+(k-1)b \text{ are all prime}) \approx \mathbb{P}(a \text{ is prime}) \cdots \mathbb{P}(a+(k-1)b \text{ is prime}) \approx \frac{1}{\log^k N}.$$

By linearity of expectation, the expected number of progressions consisting entirely of primes is the sum of these probabilities over all admissible pairs (a, b). Since there are (within a constant factor of) N² arithmetic progressions of length k in [1, N], this leads us to conjecture that, up to a constant factor,

$$P(k,N) \approx \frac{N^2}{\log^k N}.$$
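The heuristic can be tested for small N by brute force. The following sketch (hypothetical function names; a direct search rather than an efficient algorithm) computes P(k, N) exactly, counting each progression with common difference b ≥ 1 once, and prints the ratio P(k, N) log^k N / N², which the heuristic predicts should stay within constant bounds as N grows.

```python
import math


def prime_sieve(limit: int) -> list:
    """Sieve of Eratosthenes: entry n is True exactly when n is prime (0 <= n <= limit)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return flags


def count_prime_progressions(k: int, N: int) -> int:
    """P(k, N): the number of k-term progressions of primes in [N] (k >= 2, difference b >= 1)."""
    flags = prime_sieve(N)
    count = 0
    for a in range(2, N + 1):
        if not flags[a]:
            continue
        # The last term a + (k - 1) b must not exceed N.
        for b in range(1, (N - a) // (k - 1) + 1):
            if all(flags[a + i * b] for i in range(1, k)):
                count += 1
    return count


if __name__ == "__main__":
    k = 3
    for N in (500, 1000, 2000, 4000):
        P = count_prime_progressions(k, N)
        print(N, P, P * math.log(N) ** k / N**2)
```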

In particular, since the right hand side is unbounded as N tends to infinity, this heuristic predicts infinitely many k-term progressions within the primes. In fact, we shall prove something very similar to this prediction, giving a lower bound which only differs from the above heuristic by a constant factor. That is, we prove the following theorem.

Theorem 1.2 (Green-Tao). For any k ≥ 3 there exists a constant $c_k > 0$ such that, for all sufficiently large N,

$$P(k,N) \ge c_k \frac{N^2}{\log^k N}.$$

It is also possible to give an upper bound of this form, so P(k,N) is within a constant factor of the heuristic answer. The correct asymptotic seems to be similar: the heuristic above multiplied by some constant depending only on k, which reflects the arithmetical information about the primes which the probabilistic model does not include.

Conjecture 1.1. For any k ≥ 3 there exists a constant $C_k > 0$ depending only on k such that

$$P(k,N) \sim C_k \frac{N^2}{\log^k N}.$$

This has been verified for 3 ≤ k ≤ 6; see Chapter 6 for details. In fact, this is only a special case of a more general conjecture obtained by Hardy and Littlewood using similar heuristics. A linear form is a function in d variables of the shape

$$a_0 + a_1 n_1 + a_2 n_2 + \cdots + a_d n_d,$$

where $a_i \in \mathbb{Q}$.

Conjecture 1.2 (Hardy-Littlewood Prime Tuples Conjecture [14]). Let $\psi_1, \ldots, \psi_k$ be linear forms in d variables. Then for some constant C dependent only on the linear forms,

$$\#\{n = (n_1, \ldots, n_d) \in [0,N]^d : \psi_1(n), \ldots, \psi_k(n) \text{ prime}\} \sim C\,\frac{N^d}{\log^k N}.$$

An affirmative answer to this conjecture would not only give an asymptotic formula for P(k,N), but also settle the long-standing Twin Primes Conjecture and Goldbach Conjecture. See [7] for more details. The Hardy-Littlewood conjecture is still unproven, although by extending the methods used in [10], Green and Tao have proven it for a significant class of linear forms in [7].
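As a small numerical illustration, the simplest instance of Conjecture 1.2 beyond the prime number theorem takes d = 1 and the two forms ψ1(n) = n and ψ2(n) = n + 2, so that the left-hand side counts twin primes. The Hardy-Littlewood heuristic predicts the count 2C₂N/log²N, where C₂ = ∏_{p≥3}(1 − 1/(p − 1)²) ≈ 0.66016 is the twin prime constant. The sketch below (hypothetical helper names, standard library only) counts twin primes and prints the ratio to this prediction; if the conjecture holds, the ratio tends to 1, but only slowly.

```python
import math


def prime_sieve(limit: int) -> list:
    """Sieve of Eratosthenes: entry n is True exactly when n is prime (0 <= n <= limit)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return flags


def twin_prime_count(N: int) -> int:
    """#{0 <= n <= N : n and n + 2 both prime}, i.e. psi_1(n) = n, psi_2(n) = n + 2."""
    flags = prime_sieve(N + 2)
    return sum(1 for n in range(N + 1) if flags[n] and flags[n + 2])


def twin_prime_constant(limit: int) -> float:
    """Truncated product of (1 - 1/(p - 1)**2) over primes 3 <= p <= limit."""
    flags = prime_sieve(limit)
    product = 1.0
    for p in range(3, limit + 1):
        if flags[p]:
            product *= 1.0 - 1.0 / (p - 1) ** 2
    return product


if __name__ == "__main__":
    C2 = twin_prime_constant(10**5)
    for N in (10**4, 10**5, 10**6):
        actual = twin_prime_count(N)
        print(N, actual, actual / (2 * C2 * N / math.log(N) ** 2))
```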


1.2 Arithmetic Progressions

The advances which led to the Green-Tao theorem are more about the structure of arithmetic progressions than about the nature of the primes themselves. As Ben Green puts it in [5], they “lie not in our understanding of the primes but rather in what we can say about arithmetic progressions.”

The problem of finding arithmetic progressions within sets began with a 1936 paper [1] of Erdos and Turan, in which they introduce the function $r_k(n)$, defined to be the size of the largest subset of {1, . . . , n} with no k-term arithmetic progression. The problem of locating k-term arithmetic progressions within sufficiently large sets is equivalent to showing that $r_k(n)$ grows sufficiently slowly. In particular, they conjectured that $r_k(n) = o(n)$ for any k. If we define the density of a set of integers A to be $\liminf_{N\to\infty} \frac{|A\cap\{1,\ldots,N\}|}{N}$, then this is equivalent to the statement that any set of integers with positive density contains a k-term progression for any k. In fact, Erdos later made the stronger conjecture that

Conjecture 1.3 (Erdos). If

$$\sum_{n\in A}\frac{1}{n} = \infty$$

then A contains arbitrarily long arithmetic progressions.

This is essentially equivalent to showing that $r_k(n) = O\!\left(\frac{n}{\log n}\right)$ for any k. Note that this is stronger than $r_k(n) = o(n)$, since it gives an explicit upper bound on the rate of decay of $r_k(n)/n$. The Green-Tao theorem would be an immediate corollary of this conjecture, since $\sum_p \frac{1}{p}$ diverges.
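The divergence of $\sum_p 1/p$ is very slow. Mertens' second theorem gives $\sum_{p\le x} 1/p = \log\log x + M + o(1)$, where M ≈ 0.2615 is the Meissel-Mertens constant, and the sketch below (hypothetical names, standard library only) checks this numerically.

```python
import math

MEISSEL_MERTENS = 0.2614972128476428


def prime_sieve(limit: int) -> list:
    """Sieve of Eratosthenes: entry n is True exactly when n is prime (0 <= n <= limit)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return flags


def prime_reciprocal_sum(x: int) -> float:
    """The partial sum of 1/p over primes p <= x."""
    flags = prime_sieve(x)
    return sum(1.0 / p for p in range(2, x + 1) if flags[p])


if __name__ == "__main__":
    for x in (10**2, 10**4, 10**6):
        print(x, prime_reciprocal_sum(x), math.log(math.log(x)) + MEISSEL_MERTENS)
```

Even at x = 10^6 the partial sum is still below 3, which shows how sparse the primes are compared with any set of positive density.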

The weaker conjecture that $r_k(n) = o(n)$ was proven for the first non-trivial case k = 3 by Roth in 1953, giving the following theorem.

Theorem 1.3 (Roth, 1953 [18]). Any subset of N with positive density contains a 3-term arithmetic progression.

The case k = 4 was proven by Szemeredi in 1969, and in 1975 he managed to extend this to arbitrary k using complicated combinatorial arguments.

Theorem 1.4 (Szemeredi, 1975 [19]). Any subset of N with positive density contains arbitrarily long arithmetic progressions.

In fact, we can strengthen Szemeredi’s theorem to show that we can find not just one, but infinitely many progressions of any finite length, using a simple combinatorial argument first noted by Varnavides.

Corollary 1.1 (Varnavides, 1959 [24]). Let A ⊆ N and k ≥ 2. If there exists δ > 0 such that |A ∩ {1, . . . , N}| ≥ δN for all sufficiently large N, then there exists a constant $c_{k,\delta} > 0$ such that, for sufficiently large N, there are at least $c_{k,\delta}N^2$ k-term arithmetic progressions in A ∩ {1, . . . , N}.


The prime number theorem implies the primes have density zero, and hence Szemeredi’s theorem cannot be applied directly. It will, however, be invoked at a crucial step in proving the Green-Tao theorem.

Szemeredi’s theorem has been reproven several times since 1975 using methods from a surprisingly diverse array of mathematical fields: first by Furstenberg in 1977 using ergodic theory, and then twice by Gowers, using Fourier analysis and hypergraph regularity. More recently, in 2009 an alternative combinatorial proof was found by the internet Polymath project. For a detailed survey of the different proofs see [21]. It was insights gained from studying the common features of these different proofs which led to the methods used by Green and Tao.

The first progress on the problem of progressions within the primes came from van der Corput in 1939, who settled the case k = 3 using the circle method.

Theorem 1.5 (van der Corput, 1939 [23]). There are infinitely many arithmetic progressions consisting of three primes.

After this theorem, although significant results were obtained for sets of positive density as outlined above, no further results were obtained for the primes until 1981, when Heath-Brown showed, using methods similar to van der Corput’s, the following partial result for k = 4.

Theorem 1.6 (Heath-Brown, 1981 [15]). There are infinitely many arithmetic progressions consisting of three primes and one almost-prime (that is, a number with at most two prime factors, counted with multiplicity).

1.3 Structure of the Proof

The outline of the proof given here is new, although of course all the ideas are implicit in the original approach of Green and Tao. In [10], however, they did not make the transference principle explicit; it is discussed in more detail in, for example, [3].

The fundamental insight behind Green and Tao’s work was that, heuristically, a large random subset of the integers is very similar to the integers themselves, so conclusions which hold for the latter should also hold for the former. In other words, there should be a kind of transference principle which would allow results which hold for the integers to hold for sufficiently random subsets.

Let us call such subsets pseudorandom sets. Applying the transference principle to Szemeredi’s theorem, we may hope for the following to hold.

Conjecture 1.4 (Relative Szemeredi Theorem). Let X ⊂ N be sufficiently pseudorandom. Then any subset of X with positive density inside X has arbitrarily long arithmetic progressions.

In particular, although the primes have zero density within N, we may hope to find some pseudorandom set X ⊂ N in which the primes have positive density, and deduce that the primes contain arbitrarily long arithmetic progressions. In [10], Green and Tao do exactly this, and their proof can be divided into two distinct parts.

The first is proving the Relative Szemeredi theorem, that is, showing that the kind of structure reflected in Szemeredi’s theorem is amenable to the transference principle mentioned above. This is accomplished using the machinery of Gowers uniformity norms, first introduced by Gowers to prove Szemeredi’s theorem. In particular, these norms induce a notion of distance between subsets of N with the following properties.

The first is that this distance preserves the structure of containing arithmetic progressions, in that sets which are close have roughly the same number of arithmetic progressions. This is formalised as the Generalised von Neumann theorem. The mathematics needed here relies on the Gowers uniformity norms, and is similar to the kind of regularity arguments used in the hypergraph proof of Szemeredi’s theorem by Gowers. This, along with a discussion of the uniformity norms, is the subject of Chapter 3.

The second is the transference principle mentioned above: a set which is dense inside some pseudorandom subset of the integers is close to a set which is dense within the integers. This is formalised as the Decomposition theorem. In the original proof, Green and Tao used finitary ergodic theory to prove this, inspired by Furstenberg’s proof of Szemeredi’s theorem using ergodic theory. In this dissertation, we present a simpler proof discovered by Gowers using functional analysis. This is the subject of Chapter 4.

The third component is Szemeredi’s theorem, which shows that the larger set obtained from the decomposition contains many arithmetic progressions. We then invoke the Generalised von Neumann theorem to show that the original set also contains many arithmetic progressions. This finishes the proof of the Relative Szemeredi theorem.

With this in place, the second part of the proof of the Green-Tao theorem is showing that the hypothesis of the Relative Szemeredi theorem is met: the primes have positive density inside some pseudorandom set. For this, we will use a weighted version of the almost-primes, numbers with few prime factors. We will discuss this part of the proof in Chapter 5. Showing that the almost-primes are sufficiently pseudorandom requires techniques from traditional analytic number theory, and Green and Tao were able to use arguments and results already established by Goldston and Yıldırım in their work on small gaps between the primes [2]. This part of the argument has since been simplified; we have incorporated these simplifications into Chapter 5.

Chapter 2

Arithmetic Progressions

In this chapter we discuss arithmetic progressions and Szemeredi’s theorem in more detail, and formulate the precise definition of pseudorandomness which we shall require. The exposition in this chapter is new, but the ideas it discusses are all present in [10] and earlier work.

2.1 How to count arithmetic progressions

In many problems dealing with the existence of certain structures in the natural numbers, it is easier to try to solve the apparently more difficult problem of counting how many such structures we may expect to find in any finite subset of the natural numbers. Existence then follows by showing that this count is non-zero.

Another simplification that can be made is to consider functions instead of sets. We may pass from considering a subset A ⊆ N to its characteristic function $1_A : \mathbb{N} \to \{0,1\}$ using the following important observation:

$$1_A(x)1_A(x+r)\cdots 1_A(x+(k-1)r) = \begin{cases} 1 & \text{if } x, x+r, \ldots, x+(k-1)r \in A;\\ 0 & \text{otherwise.} \end{cases}$$

Hence we may count all arithmetic progressions within A ∩ $\mathbb{Z}_N$ by the sum

$$\sum_{x,r\in\mathbb{Z}_N} 1_A(x)1_A(x+r)\cdots 1_A(x+(k-1)r).$$

Note that we have switched from considering arithmetic progressions within {1, . . . , N} to those within $\mathbb{Z}_N$. This is to avoid the problem that, for instance, x, r ∈ {1, . . . , N} does not guarantee that x + (k − 1)r ∈ {1, . . . , N}. We discuss this issue further below.

Statements about the existence of arithmetic progressions then reduce to the above sum being non-zero. In fact, by simple averaging arguments such as those used by Varnavides to prove Corollary 1.1, as soon as every sufficiently dense set contains at least one progression, every such set contains at least a constant multiple of N² of them for sufficiently large N. Hence we in fact consider the above sum weighted by $1/N^2$. This leads us to the expectation notation, and motivates us to make the following definition:

Definition 2.1. Let $f_0, \ldots, f_{k-1} : \mathbb{Z}_N \to \mathbb{R}$. The normalised count of k-term arithmetic progressions in $f_0, \ldots, f_{k-1}$ is defined by

$$\Upsilon_k(f_0, \ldots, f_{k-1}) := \mathbb{E}\big(f_0(x)f_1(x+r)\cdots f_{k-1}(x+(k-1)r) : x, r \in \mathbb{Z}_N\big).$$

The normalised count of k-term arithmetic progressions in f is

$$\Upsilon_k(f) := \Upsilon_k(f, \ldots, f).$$

Remark 2.1. The use of $\Upsilon_k$ was not present in [10], although they repeatedly use the expectation it is shorthand for. Conventionally this expectation is denoted by Λ, but in this context this would create confusion with the von Mangoldt function for the primes.

The discussion above shows that, for A ⊆ $\mathbb{Z}_N$, there are $N^2\Upsilon_k(1_A)$ k-term arithmetic progressions in A, counted as pairs (x, r) ∈ $\mathbb{Z}_N^2$.

There are two important things to observe about the way we are counting arithmetic progressions. The first is that we include the degenerate case when r = 0. This will not be a problem, as such degenerate cases will contribute at most 1/N to $\Upsilon_k$, which we shall only be estimating up to o(1) errors.

The second potential problem is the wraparound issue noted above: we are counting arithmetic progressions in $\mathbb{Z}_N$, rather than {1, . . . , N}. For instance, when N = 5, we would include {1, 4, 2} as a 3-term arithmetic progression. This will happen if and only if, for some 1 ≤ i ≤ k − 1, the term a + ir is larger than N, for then it would have to wrap around $\mathbb{Z}_N$. One crude way to avoid this, which will be sufficient, is to restrict a and r to being less than N/k. In the case of the primes, we will ensure this by incorporating some small factor in the definition of our counting function f to ensure such wraparound arithmetic progressions are not counted.
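Definition 2.1 and the two caveats above can be made concrete on a small example. The sketch below (hypothetical helper names; exact rational arithmetic via Python's fractions module; elements of $\mathbb{Z}_N$ represented by 0, …, N − 1) computes $\Upsilon_k$ by brute force over all N² pairs (x, r), lists the pairs counted by $N^2\Upsilon_k(1_A)$, and applies the crude restriction x, r < N/k. For A = {1, 2, 4} ⊆ $\mathbb{Z}_5$ it finds five pairs: three degenerate ones with r = 0 and the two wraparound progressions 1, 4, 2 and 2, 4, 1. Only a degenerate pair survives the restriction.

```python
from fractions import Fraction
from itertools import product


def upsilon(fs, N):
    """Upsilon_k(f_0, ..., f_{k-1}) from Definition 2.1, computed exactly as the
    average over x, r in Z_N of f_0(x) f_1(x + r) ... f_{k-1}(x + (k - 1) r)."""
    total = Fraction(0)
    for x, r in product(range(N), repeat=2):
        term = Fraction(1)
        for i, f in enumerate(fs):
            term *= f((x + i * r) % N)
        total += term
    return total / (N * N)


def indicator(A):
    """The characteristic function 1_A of a subset A of Z_N."""
    return lambda x: 1 if x in A else 0


def progression_pairs(A, N, k, avoid_wraparound=False):
    """The pairs (x, r) with x, x + r, ..., x + (k - 1) r all in A (mod N).
    With avoid_wraparound=True, keep only x, r < N / k, so that no term wraps."""
    bound = N / k if avoid_wraparound else N
    return [
        (x, r)
        for x, r in product(range(N), repeat=2)
        if x < bound and r < bound and all((x + i * r) % N in A for i in range(k))
    ]


if __name__ == "__main__":
    A, N, k = {1, 2, 4}, 5, 3
    print(N * N * upsilon([indicator(A)] * k, N))
    print(progression_pairs(A, N, k))
    print(progression_pairs(A, N, k, avoid_wraparound=True))
```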

2.2 Szemeredi Revisited

By considering characteristic functions instead of sets, using a Varnavides argument, and using the wraparound trick mentioned above to pass from considering {1, . . . , N} to $\mathbb{Z}_N$, we may rewrite Theorem 1.4 as follows:

Theorem 2.1. For any k ≥ 1 and δ > 0 there exists a constant $c_{k,\delta} > 0$ such that the following holds. Let $f : \mathbb{Z}_N \to \{0,1\}$ be such that, for sufficiently large N, the density $\mathbb{E}f \ge \delta$. Then for sufficiently large N,

$$\Upsilon_k(f) \ge c_{k,\delta}.$$

An important consequence of the approach of Gowers was the realisation that the function f need not take values in {0, 1}: it is sufficient that it take values in [0, 1]. This leads us to the following formulation of Szemeredi’s theorem:

Theorem 2.2. For any k ≥ 1 and δ > 0 there exists a constant $c_{k,\delta} > 0$ such that the following holds. Let $f : \mathbb{Z}_N \to \mathbb{R}$ be such that 0 ≤ f(x) ≤ 1 for all x ∈ $\mathbb{Z}_N$ and, for sufficiently large N, the density $\mathbb{E}f \ge \delta$. Then for sufficiently large N,

$$\Upsilon_k(f) \ge c_{k,\delta}.$$

As explained in the introduction, this theorem solves the problem adequately for sufficiently dense sets of integers, but cannot be applied to the primes, since they have zero density. In particular, if we let $1_P$ be the characteristic function of the primes, then the prime number theorem implies that $\mathbb{E}1_P \sim \frac{1}{\log N} \to 0$ as N → ∞. Thus the hypothesis $\mathbb{E}f \ge \delta > 0$ of Theorem 2.2 is not satisfied.

We can avoid this and increase the density of our prime counting function by weighting it with a log N factor, that is, we instead consider the function $f := \log N \cdot 1_P$. This now satisfies the density hypothesis, but is no longer bounded above by 1. The Relative Szemeredi theorem allows us to weaken this restriction, requiring only that f is bounded above by some sufficiently pseudorandom function. In particular, as an analogue to Theorem 2.2, we have the following precise version of Conjecture 1.4.

Conjecture 2.1 (Relative Szemeredi Theorem). Let $\nu : \mathbb{Z}_N \to \mathbb{R}$ be k-pseudorandom, and let $f : \mathbb{Z}_N \to \mathbb{R}$ be such that 0 ≤ f ≤ ν. If there exists a constant δ > 0 such that, for sufficiently large N, the density $\mathbb{E}f \ge \delta$, then there exists a constant $c'_{k,\delta} > 0$ such that, for sufficiently large N,

$$\Upsilon_k(f) \ge c'_{k,\delta}.$$

To prove this theorem, we use the strategy outlined in the introduction. It will be proven in Chapter 4 as Theorem 4.4.
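The effect of the log N weight on the prime counting function is already visible for modest N: the density of $1_P$ decays, while that of $\log N \cdot 1_P$ stays close to 1 (approaching it slowly, since $\mathbb{E}1_P$ is only asymptotic to 1/log N). The sketch below uses hypothetical helper names and averages over the representatives 0, …, N − 1 of $\mathbb{Z}_N$. N is not taken to be prime here, which makes no difference to these averages.

```python
import math


def prime_sieve(limit: int) -> list:
    """Sieve of Eratosthenes: entry n is True exactly when n is prime (0 <= n <= limit)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return flags


def prime_densities(N: int):
    """Return (E 1_P, E (log N) 1_P), averaging over x = 0, ..., N - 1."""
    flags = prime_sieve(N - 1)
    density = sum(flags) / N
    return density, math.log(N) * density


if __name__ == "__main__":
    for N in (10**3, 10**4, 10**5, 10**6):
        print(N, *prime_densities(N))
```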

2.3 Pseudorandomness

This section gives some original remarks on the notion of pseudorandomness from [10], and presents a clearer classification of the pseudorandomness condition into two components.

We first explain what kind of pseudorandomness we will need. Recall that ν is a function from $\mathbb{Z}_N$ to $\mathbb{R}$, and should be thought of as a weighted indicator function for a subset of the integers which is sufficiently pseudorandom for a transference principle to hold. The precise conditions given below are determined by what we require in the technical theorems to come later, but they reflect the following principles:

1. ν should behave like the constant function 1, since the pseudorandom set we transfer to should behave like the integers.

2. The values ν(a) and ν(b) should behave independently, for distinct a and b, since the events that distinct elements belong to a random set are independent.

We divide the definition of pseudorandom below into two parts, which correspond to the two components of the Relative Szemeredi theorem: the Generalised von Neumann theorem and the Decomposition theorem. In [10], Green and Tao also divide up the definition into two parts, though they do it differently, into a linear forms condition and a correlation condition. In that form, however, it is not clear that there is a distinction between the types of pseudorandomness which the two components require.

The first is the pseudorandomness required to prove the Generalised von Neumann theorem.

Definition 2.2 (Linear Pseudorandomness). We say that ν is linearly k-pseudorandom if, whenever we have a system of $m \le k2^{k-1}$ linear forms in $t \le 3k-4$ variables,

$$\psi_i(x) := \sum_{j=1}^{t} L_{ij}x_j, \qquad 1 \le i \le m,$$

such that none of the t-tuples $(L_{ij})_{1\le j\le t} \in \mathbb{Q}^t$ are zero, none is a rational multiple of another, and moreover for each i, j the coefficient $L_{ij} \in \mathbb{Q}$ has numerator and denominator bounded by k in absolute value, then

$$\mathbb{E}_{x\in\mathbb{Z}_N^t}\big(\nu(\psi_1(x))\cdots\nu(\psi_m(x))\big) = 1 + o_k(1).$$

Remark 2.2. All the linear forms in this condition are assumed to be homogeneous, that is, to have zero constant term, so that ψ(0) = 0. In particular, taking the single form ψ(x) = x, we have the measure condition,

$$\mathbb{E}(\nu) = 1 + o(1),$$

which agrees with our first principle. Linear pseudorandomness should be viewed as a kind of independence between $\nu(\psi_1), \ldots, \nu(\psi_m)$, in accordance with our second principle. This is a very strong condition, since it gives us a great deal of control over a large class of linear forms, in particular the k linear forms in 2 variables which give a k-term arithmetic progression:

$$\psi_1(x_1,x_2) := x_1,\quad \psi_2(x_1,x_2) := x_1 + x_2,\quad \ldots,\quad \psi_k(x_1,x_2) := x_1 + (k-1)x_2.$$

From the linear pseudorandomness condition with these linear forms we get

$$\Upsilon_k(\nu) = 1 + o(1),$$

and a lot of arithmetic progressions counted by our pseudorandom function. The power of the transference principle is that we don’t lose too many of these when passing to suitable f ≤ ν.
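For a concrete weight other than the constant function, one can check the single instance $\Upsilon_3(\nu) = 1 + o(1)$ numerically. The example below is hypothetical and not taken from the text: ν = 2·1_Q on $\mathbb{Z}_p$, where Q is the set of non-zero quadratic residues modulo a prime p, a classical set with random-like behaviour. The sketch only tests the three forms x, x + r, x + 2r, not the full family of systems required by Definition 2.2. Character-sum identities for the Legendre symbol show that the discrepancy from 1 is O(1/p) for this particular ν, so a small tolerance suffices for p = 1009.

```python
def quadratic_residue_weight(p: int) -> list:
    """nu = 2 * 1_Q on Z_p, where Q is the set of non-zero squares modulo the odd prime p."""
    squares = {x * x % p for x in range(1, p)}
    return [2 if y in squares else 0 for y in range(p)]


def upsilon3(nu: list) -> float:
    """Upsilon_3(nu) = E_{x, r in Z_p} nu(x) nu(x + r) nu(x + 2r)."""
    p = len(nu)
    total = 0
    for x in range(p):
        if nu[x] == 0:
            continue  # every term with this x vanishes
        for r in range(p):
            total += nu[x] * nu[(x + r) % p] * nu[(x + 2 * r) % p]
    return total / (p * p)


if __name__ == "__main__":
    nu = quadratic_residue_weight(1009)
    print("E(nu) =", sum(nu) / len(nu), " Upsilon_3(nu) =", upsilon3(nu))
```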

The next condition is required for the Decomposition theorem to hold.

Definition 2.3 (Simple Pseudorandomness). We say that ν is simply k-pseudorandom if, whenever we have $m \le 2^{k-1}$ simple linear forms $\psi_i$ in $t \le k$ variables, that is, ones of the shape

$$\psi_i(x) := \sum_{j=1}^{t}\omega_{ij}x_j + b_i,$$

where $\omega_{ij} \in \{0,1\}$, such that the linear parts $\omega_i = (\omega_{i1}, \ldots, \omega_{it})$ are not zero or rational multiples of each other, then

$$\mathbb{E}_{x\in\mathbb{Z}_N^t}\big(\nu(\psi_1(x))\cdots\nu(\psi_m(x))\big) = 1 + o(1).$$

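Furthermore, there exists a weight function $\tau = \tau_m : \mathbb{Z}_N \to \mathbb{R}^+$ such that $\mathbb{E}(\tau^q) = O_{m,q}(1)$ for all 1 ≤ q < ∞, and for all $h_1, \ldots, h_m \in \mathbb{Z}_N$ we have the upper bound

$$\mathbb{E}_{x\in\mathbb{Z}_N}\big(\nu(x+h_1)\cdots\nu(x+h_m)\big) \le \sum_{1\le i<j\le m}\tau(h_i - h_j).$$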

Remark 2.3. We cannot apply the first part to give an asymptotic for the second part, since the linear parts of the forms $x + h_i$ are all the same. We also observe that we cannot control these expressions using linear pseudorandomness, since the forms are non-homogeneous.

We now make the following umbrella definition, which is required for the Relative Szemeredi theorem.

Definition 2.4 (Pseudorandomness). We say that ν is k-pseudorandom if it is both linearly k-pseudorandom and simply k-pseudorandom.

The constant function 1 is the easiest example of a pseudorandom function. In fact, it is also an important one, since the space of pseudorandom functions is star-shaped around 1, as the following easily verified lemma shows.

Lemma 2.1. If ν is linearly pseudorandom, then so is λν + (1 − λ) for any λ ∈ (0, 1);similarly for simple pseudorandomness.

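This lemma will be important in several places, since it will allow us to pass from bounds of the form f ≤ ν + 1 to ones of the form f ≤ ν, losing only a constant factor but preserving pseudorandomness: if f ≤ ν + 1, then f/2 ≤ ν/2 + 1/2, and the right-hand side is pseudorandom by the lemma with λ = 1/2.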
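For the linear part of Lemma 2.1, one way to carry out the verification is to expand the product, as in the following sketch. It uses two facts: any subsystem of a system of forms satisfying the hypotheses of Definition 2.2 again satisfies them (for I = ∅ the product is empty and its expectation is exactly 1), and the number of forms m is bounded in terms of k, so the 2^m error terms o(1) still sum to o(1).

$$
\begin{aligned}
\mathbb{E}_{x\in\mathbb{Z}_N^t}\prod_{i=1}^{m}\bigl(\lambda\nu(\psi_i(x)) + 1 - \lambda\bigr)
&= \sum_{I\subseteq[m]} \lambda^{|I|}(1-\lambda)^{m-|I|}\,
   \mathbb{E}_{x\in\mathbb{Z}_N^t}\prod_{i\in I}\nu(\psi_i(x)) \\
&= \sum_{I\subseteq[m]} \lambda^{|I|}(1-\lambda)^{m-|I|}\bigl(1 + o(1)\bigr) \\
&= \bigl(\lambda + (1-\lambda)\bigr)^{m} + o(1) = 1 + o(1).
\end{aligned}
$$

The first part of Definition 2.3 is handled by the same expansion; this sketch does not cover the weight-function bound.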

Remark 2.4. It is believed that these conditions are stronger than necessary. Weakening the strength of the pseudorandomness necessary (particularly the simple pseudorandomness required for the Decomposition theorem) is one goal of current research in this area.

Chapter 3

Uniformity Norms and the Generalised von Neumann Theorem

In this chapter we introduce the Gowers uniformity norm, which will play a central role in the proof. We also prove the first component of the Relative Szemeredi theorem, the Generalised von Neumann theorem. Once again, the substantial ideas in this chapter are all present in [10], though the exposition in the first section contains some new ideas.

3.1 The Gowers Uniformity Norm

Recall that our strategy for proving the Relative Szemeredi theorem is to show that a set dense in a pseudorandom set is ‘close’ in some sense to one dense in the natural numbers. In terms of the functional approach in the previous chapter, we seek some metric d on $\mathbb{R}^N$ such that:

1. If d(f, g) is small then f and g count a similar number of k-term arithmetic progressions,and

2. If f ≤ ν for some pseudorandom ν then there exists a bounded g such that d(f, g) is small.

The easiest way to obtain a metric is to induce it from some norm on the space. We now give the definition of the required norm as in [10]. First, however, we give some original remarks to help motivate the definition.

We seek to decompose a function f ≤ ν, where ν is a pseudorandom function, as f = g + h, where g is bounded and h is ‘small’, in the sense that $\Upsilon_k(g+h)$ is well approximated by $\Upsilon_k(g)$. We now need to specify what is meant by ‘small’.

Our initial approach might be to use $\Upsilon_k$, the normalised count of arithmetic progressions, directly; that is, we hope to achieve a decomposition where $\Upsilon_k(h)$ is small. There are two problems with this approach.


The first is that we hope to approximate Υk(g + h) by Υk(g), and so we need that

$$\Upsilon_k(g + h) = \Upsilon_k(g) + \sum_{\emptyset \neq I \subseteq [k]} \Upsilon_k(f_1, \dots, f_k) = \Upsilon_k(g) + \text{negligible terms},$$

where $f_i = h$ if $i \in I$ and $f_i = g$ otherwise. In other words, we need not only $\Upsilon_k(h)$ small, but also $\Upsilon_k(f_1, \dots, f_k)$ small whenever some $f_i = h$. It would be sufficient to prove an inequality such as

$$\Upsilon_k(f_1, \dots, f_k) = O_k\Big(\inf_{1 \le i \le k} \Upsilon_k(f_i)\Big).$$

This cannot hold, however, as shown by the following counterexample. Define $f_1(n) = 1$ if $n = 0$ and $f_1(n) = 0$ otherwise, and let $f_i \equiv 1$ for all $i \ge 2$. Then $\inf_i \Upsilon_k(f_i) = \Upsilon_k(f_1) = 1/N^2$, whereas $\Upsilon_k(f_1, \dots, f_k) = 1/N$ for all $N$.
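The counterexample is easy to confirm by brute force. The sketch below assumes the normalisation $\Upsilon_k(f_1, \dots, f_k) = \mathbb{E}(f_1(x) f_2(x + r) \cdots f_k(x + (k-1)r) : x, r \in \mathbb{Z}_N)$ and uses exact rational arithmetic; the helper name `upsilon` and the choice $N = 7$, $k = 3$ are illustrative only.

```python
from fractions import Fraction
from itertools import product

def upsilon(fs, N):
    """Exact value of E(f_1(x) f_2(x+r) ... f_k(x+(k-1)r) : x, r in Z_N)."""
    total = 0
    for x, r in product(range(N), repeat=2):
        term = 1
        for j, f in enumerate(fs):
            term *= f((x + j * r) % N)
        total += term
    return Fraction(total, N * N)

N, k = 7, 3
delta0 = lambda n: 1 if n == 0 else 0   # f_1: indicator of {0}
one = lambda n: 1                        # f_i for i >= 2

print(upsilon([delta0] * k, N))                 # Upsilon_k(f_1) = 1/N^2: prints 1/49
print(upsilon([one] * k, N))                    # Upsilon_k(1) = 1: prints 1
print(upsilon([delta0] + [one] * (k - 1), N))   # mixed term = 1/N: prints 1/7
```

Only $x = r = 0$ survives in $\Upsilon_k(f_1)$, whereas the mixed term only forces $x = 0$ and leaves $r$ free, which is where the extra factor of $N$ comes from.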

The second problem with using $\Upsilon_k$ directly is that it is not a norm on $\mathbb{R}^N$, for the trivial reason that it can be negative. We may avoid this by taking the absolute value, but $|\Upsilon_k|$ is still not a norm: it is homogeneous of degree $k$ rather than $1$, and, more seriously, it can vanish on nonzero functions. For example, take $N = 3$, $k = 3$, and $f : \mathbb{Z}_3 \to \mathbb{R}$ defined by $f(0) = 0$, $f(1) = 1$ and $f(2) = -1$. A simple calculation shows $\Upsilon_3(f) = 0$ (indeed $\Upsilon_k(f) = 0$ for every odd $k \ge 3$), although $f \neq 0$. This is a problem for the existence of a decomposition, since the analytic machinery we hope to use to find such a decomposition relies on $h$ being small in some norm on $\mathbb{R}^N$.
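A quick exact computation of this example, assuming the same normalisation of $\Upsilon_k$ with all $k$ functions equal to $f$. It also shows that the vanishing depends on the parity of $k$: every progression with $r \neq 0$ passes through $0$, so only the $r = 0$ terms survive, and they contribute $1 + (-1)^k$. The function name `upsilon_single` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def upsilon_single(f, N, k):
    """Upsilon_k(f) = E(prod_{j<k} f(x + j r) : x, r in Z_N), computed exactly."""
    total = 0
    for x, r in product(range(N), repeat=2):
        term = 1
        for j in range(k):
            term *= f[(x + j * r) % N]
        total += term
    return Fraction(total, N * N)

f = [0, 1, -1]   # f(0) = 0, f(1) = 1, f(2) = -1 on Z_3

for k in (3, 4, 5):
    print(k, upsilon_single(f, 3, k))
# k = 3 and k = 5 give 0 although f is nonzero; k = 4 gives 2/9
```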

Hence we should not demand that $h$ be small in terms of $\Upsilon_k$, but rather in some other norm on $\mathbb{R}^N$. To discover what this should look like, focus on the first of the problems above: bounding $\Upsilon_k(f_1, \dots, f_k)$. The most common tool in bounding expectations is the Cauchy-Schwarz inequality,

$$\mathbb{E}(XY)^2 \le \mathbb{E}(X^2)\,\mathbb{E}(Y^2).$$

We need to bound $\Upsilon_k$ in terms of something involving only $h$, and so we must remove $k - 1$ functions. For concreteness, let us temporarily fix $k = 3$. After applying the Cauchy-Schwarz inequality twice, we may bound the $\Upsilon_3(f_1, f_2, f_3)$ term,

$$\mathbb{E}\big(f_1(x) f_2(x + r) f_3(x + r + r) : x, r \in \mathbb{Z}_N\big),$$

with a product where each term has the shape

$$\mathbb{E}\big(f(x) f(x + h_1) f(x + h_2) f(x + h_1 + h_2) : x, h_1, h_2 \in \mathbb{Z}_N\big).$$

This is similar to the shape of $\Upsilon_3$, but with the sum $r + r$ replaced by $h_1 + h_2$ for independent variables $h_1, h_2$. This suggests that if $h$ is small with respect to this expectation, then we can use the Cauchy-Schwarz inequality to show that $\Upsilon_3(f_1, f_2, f_3)$ is small whenever some $f_i = h$, and hence that $\Upsilon_3(g + h) \approx \Upsilon_3(g)$.
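For $k = 3$ and $\nu \equiv 1$, the Cauchy-Schwarz argument sketched above is known to give $|\Upsilon_3(f_1, f_2, f_3)| \le \min_i \|f_i\|_{U^2}$ whenever each $|f_i| \le 1$ and $N$ is odd. Here $\|f\|_{U^2}$ is the fourth root of the expectation just displayed. The following numerical check on random bounded functions is a sanity test of that inequality under those assumptions, not a proof. All names in it are illustrative.

```python
import random
from itertools import product

def upsilon3(f1, f2, f3, N):
    """E(f1(x) f2(x+r) f3(x+2r) : x, r in Z_N)."""
    return sum(f1[x] * f2[(x + r) % N] * f3[(x + 2 * r) % N]
               for x, r in product(range(N), repeat=2)) / N ** 2

def u2_norm(f, N):
    """Fourth root of E(f(x) f(x+h1) f(x+h2) f(x+h1+h2) : x, h1, h2 in Z_N)."""
    s = sum(f[x] * f[(x + a) % N] * f[(x + b) % N] * f[(x + a + b) % N]
            for x, a, b in product(range(N), repeat=3)) / N ** 3
    return max(s, 0.0) ** 0.25   # s >= 0 exactly; max() only guards rounding

random.seed(0)
N = 11  # odd, so r -> 2r is a bijection on Z_N
for trial in range(20):
    fs = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(3)]
    lhs = abs(upsilon3(*fs, N))
    assert all(lhs <= u2_norm(f, N) + 1e-12 for f in fs)
print("bound held in all trials")
```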

Motivated by this, we make the following definition.

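Definition 3.1. For any $d \ge 1$, the Gowers $d$-uniformity norm of a function $f : \mathbb{Z}_N \to \mathbb{R}$ is defined as

$$\|f\|_{U^d} := \mathbb{E}\Big(\prod_{\omega \in C^d} f(x + \omega \cdot h) : x \in \mathbb{Z}_N,\ h \in \mathbb{Z}_N^d\Big)^{1/2^d},$$

where $C^d = \{0, 1\}^d$ is the discrete $d$-dimensional cube.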
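Definition 3.1 can be evaluated directly for small $N$. The sketch below is a brute-force implementation, assuming $f$ is given as a list of real values indexed by $\mathbb{Z}_N$. Its cost grows like $N^{d+1} 2^d$, so it is only meant for toy examples.

```python
from itertools import product

def gowers_norm(f, d):
    """Brute-force ||f||_{U^d} for a real-valued f on Z_N, given as a list.

    Averages prod_{omega in {0,1}^d} f(x + omega . h) over x in Z_N and
    h in Z_N^d, then takes the 2^d-th root. The average is non-negative in
    exact arithmetic; max(..., 0.0) only guards against rounding.
    """
    N = len(f)
    cube = list(product((0, 1), repeat=d))
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1.0
            for omega in cube:
                shift = sum(w * hi for w, hi in zip(omega, h))
                term *= f[(x + shift) % N]
            total += term
    return max(total / N ** (d + 1), 0.0) ** (1.0 / 2 ** d)
```

For instance, `gowers_norm([1.0] * 5, 2)` returns $1$. For $d = 1$ the average is $(\mathbb{E}f)^2$, so `gowers_norm(f, 1)` equals $|\mathbb{E}f|$. This is why $\|\cdot\|_{U^1}$ is only a seminorm.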


Note that, in the $k = 3$ case, $\|f\|_{U^2}^4$ is exactly the expectation we obtained above. In general, we will use the $U^{k-1}$ norm to deal with progressions of length $k$.

It is easy to verify that $\|\cdot\|_{U^d}$ is a seminorm for $d \ge 1$. That it is also a genuine norm when $d \ge 2$ follows from the easily verified fact that

$$\|f\|_{U^2} = \|\hat{f}\|_{\ell^4},$$

where $\hat{f}$ is the Fourier transform of $f$, and the less obvious monotonicity property

$$\|f\|_{U^{d-1}} \le \|f\|_{U^d}.$$
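Both facts can be checked numerically on a small example. The sketch assumes the Fourier normalisation $\hat f(\xi) = \mathbb{E}_x f(x) e^{-2\pi i x\xi/N}$, under which $\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4$ holds exactly. With a different normalisation the identity picks up a power of $N$. The brute-force norm is repeated so that the block stands alone.

```python
import cmath
import random
from itertools import product

def gowers_norm(f, d):
    """Brute-force ||f||_{U^d} on Z_N (f given as a list of reals)."""
    N = len(f)
    cube = list(product((0, 1), repeat=d))
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1.0
            for omega in cube:
                term *= f[(x + sum(w * hi for w, hi in zip(omega, h))) % N]
            total += term
    return max(total / N ** (d + 1), 0.0) ** (1.0 / 2 ** d)

def fourier(f):
    """hat f(xi) = E_x f(x) exp(-2 pi i x xi / N)  (normalisation assumed here)."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
            for xi in range(N)]

random.seed(1)
f = [random.uniform(-1, 1) for _ in range(7)]
u1, u2, u3 = (gowers_norm(f, d) for d in (1, 2, 3))
l4 = sum(abs(c) ** 4 for c in fourier(f)) ** 0.25

print(abs(u2 - l4) < 1e-9)   # ||f||_{U^2} = ||hat f||_{l^4}: True
print(u1 <= u2 <= u3)        # monotonicity: True
```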

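Recalling that we are seeking an inequality of the shape

$$\#\{k\text{-progressions counted by } f\} \le \|f\|_{U^{k-1}},$$

the monotonicity property agrees with the intuition that controlling progressions of length $k + 1$ should require at least as much uniformity as controlling progressions of length $k$. A function whose $U^k$ norm is small enough to make its count of $(k+1)$-progressions negligible also has small $U^{k-1}$ norm, and hence a negligible count of $k$-progressions.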

With this definition in place, we can restate our strategy for proving the Relative Szemeredi theorem. We need to show that the $U^{k-1}$ norm has the following properties.

1. If $\|h\|_{U^{k-1}}$ is small then (for suitable $g$) $\Upsilon_k(g) \approx \Upsilon_k(g + h)$.

2. If $f \le \nu$ then $f = g + h$ where $g$ is bounded and $\|h\|_{U^{k-1}}$ is small.

The first is the Generalised von Neumann Theorem, which occupies the next section. The second is the crucial Decomposition Theorem, which we discuss in the next chapter.

We need to control the count of arithmetic progressions over functions with small Gowers uniformity norm. Since we shall often be referring to functions with small Gowers uniformity norms, it is convenient to make the following definition.

Definition 3.2. We say that $f$ is $\eta$-uniform if $\|f\|_{U^d} \le \eta$, and more generally say that $f$ is uniform if $\|f\|_{U^d}$ is small.

For a more in-depth discussion of the Gowers uniformity norms, including a proof of the monotonicity property mentioned above, see (for example) Appendix B of [7].

3.2 The Generalised von Neumann Theorem

We come now to the first component needed to prove the Relative Szemeredi theorem. A specialised form of this theorem, when $\nu \equiv 1$, was first used by Gowers in his proof of Szemeredi's theorem. The fact that it could be generalised to linearly pseudorandom $\nu$ was first noticed by Green and Tao in [10]. Indeed, the linearly pseudorandom condition which we require $\nu$ to satisfy was chosen with the proof of this theorem in mind.

The proof is long and technical, and can be found in [10]. The idea is to repeatedly apply the Cauchy-Schwarz inequality as outlined above until we are at a stage where we can apply the pseudorandomness condition.


Theorem 3.1 (Generalised von Neumann Theorem). Let $\nu$ be linearly pseudorandom, and let $f_0, \dots, f_{k-1}$ obey the bounds $|f_i(x)| \le \nu(x)$ for all $x$. Then

$$\Upsilon_k(f_0, \dots, f_{k-1}) = O_k\Big(\inf_{0 \le i \le k-1} \|f_i\|_{U^{k-1}}\Big) + o_k(1).$$

Remark 3.1. Using Theorem 2.1, and rescaling the $f_i$ where necessary, we may in fact weaken the conditions to $|f_i| \le \nu + 2$. This fact will be needed when we apply this to prove the Relative Szemeredi theorem, since we will need to apply it to $h = f - g$ where $0 \le f \le \nu$ and $0 \le g \le 2$.

Proof. Omitted. See [10], Section 3.

In particular, if $h$ is uniform, then $\Upsilon_k(g + h)$ is approximately $\Upsilon_k(g)$. This is the key step in the proof of the Relative Szemeredi theorem, so we formulate it precisely as follows.

Corollary 3.1. If $f = g + h$ where $|g|, |h| \le \nu$ for some linearly pseudorandom $\nu$, and $h$ is $\eta$-uniform, then

$$\Upsilon_k(f) = \Upsilon_k(g) + O_k(\eta) + o_k(1).$$

Proof. Expanding out the expectation notation, we see that

$$\Upsilon_k(g + h) = \Upsilon_k(g) + \sum_{\emptyset \neq I \subseteq [k]} \Upsilon_k(f_1, \dots, f_k),$$

where $f_i = h$ if $i \in I$ and $f_i = g$ otherwise. We then apply Theorem 3.1 and the condition that $h$ is $\eta$-uniform to show that each of the $2^k - 1$ terms in the sum is bounded by $O_k(\eta) + o_k(1)$.

3.3 Dual Norms

This section closely follows the first part of Section 6 in [10]. Because dual norms are also useful in the new approach to the Decomposition theorem in the next chapter, we define the dual norm in generality.

In general, whenever we have a norm $\|\cdot\|$ on $\mathbb{R}^N$ we may define the dual norm as follows:

$$\|f\|^* := \sup\{|\langle f, g\rangle| : \|g\| \le 1\}.$$

It is easy to check that this defines a seminorm on $\mathbb{R}^N$, and for the norms we shall be dealing with it will also be a norm. The use of this definition lies in the inequality

$$\langle f, g\rangle \le \|f\|\,\|g\|^*.$$

In particular, whenever $\|g\|^*$ is small and $g$ correlates with $f$ to a large degree, $\|f\|$ must be large. That is, smallness of the dual norm prevents the norm of correlated functions from being small. In the case of the $U^d$ norms, we say that $g$ is anti-uniform if it has small dual $U^d$ norm, and so anti-uniformity is an obstruction to uniformity.

Closely linked to the introduction of dual norms, we also introduce the concept of dual functions, at least with respect to the $U^d$ norms. In the following definition, and throughout the rest of this dissertation, we shall fix $d = k - 1$, recalling that $k$ is to be taken as a fixed quantity. The dual function of $f$ is defined as

$$Df(x) := \mathbb{E}\Big(\prod_{\omega \in C^{k-1} \setminus \{0\}} f(x + \omega \cdot h) : h \in \mathbb{Z}_N^{k-1}\Big).$$

The use of this lies in the following lemma. This will be useful later, when we shall apply it to deduce that sufficiently uniform functions do not correlate much with their dual functions.

Lemma 3.1.

$$\langle f, Df\rangle = \|f\|_{U^{k-1}}^{2^{k-1}}.$$

Proof. Expand out both sides using their definitions.
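A numerical check of Lemma 3.1 for $k = 3$, assuming the normalised inner product $\langle f, g\rangle = \mathbb{E}_x f(x)g(x)$. With an unnormalised sum the two sides would differ by a factor of $N$. The helpers `dual_function` and `gowers_power` are illustrative names.

```python
import random
from itertools import product

def dual_function(f, d):
    """Df(x) = E(prod over omega in {0,1}^d, omega != 0, of f(x + omega . h) : h in Z_N^d)."""
    N = len(f)
    cube = [w for w in product((0, 1), repeat=d) if any(w)]  # C^d minus the zero vertex
    D = []
    for x in range(N):
        total = 0.0
        for h in product(range(N), repeat=d):
            term = 1.0
            for omega in cube:
                term *= f[(x + sum(w * hi for w, hi in zip(omega, h))) % N]
            total += term
        D.append(total / N ** d)
    return D

def gowers_power(f, d):
    """||f||_{U^d}^{2^d}, i.e. the full cube average with no root taken."""
    N = len(f)
    cube = list(product((0, 1), repeat=d))
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1.0
            for omega in cube:
                term *= f[(x + sum(w * hi for w, hi in zip(omega, h))) % N]
            total += term
    return total / N ** (d + 1)

random.seed(2)
k = 3                                                # so d = k - 1 = 2
f = [random.uniform(-1, 1) for _ in range(9)]
Df = dual_function(f, k - 1)
inner = sum(a * b for a, b in zip(f, Df)) / len(f)   # <f, g> = E_x f(x) g(x)
print(abs(inner - gowers_power(f, k - 1)) < 1e-12)   # True
```

Multiplying $Df(x)$ by $f(x)$ restores the missing $\omega = 0$ vertex, and averaging over $x$ then yields the full cube average, which is exactly the expansion the proof refers to.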

Chapter 4

Decomposition Theorem

The goal of this chapter is to prove the following.

Theorem 4.1 (Decomposition Theorem). Let $\nu$ be simply pseudorandom, and $\eta$ some parameter such that $0 < \eta < 1$.

Suppose $N$ is sufficiently large, depending on $\eta$. Then every function $0 \le f \le \nu$ can be decomposed as $f = g + h$, where $0 \le g \le 2$ and $h$ is $\eta$-uniform.

This is the final, and most crucial, part of the proof of the Relative Szemeredi theorem, and hence of the entire Green-Tao theorem. It is presented as a decomposition, which allows us to decompose $f$ into a bounded part (to which we may apply Szemeredi's theorem) and a uniform part, whose contribution is negligible by the Generalised von Neumann theorem.

It is, however, better viewed as a transference theorem: it allows us to transfer properties of the integers to pseudorandom subsets of the integers. In this case, the desired property is that dense subsets contain arbitrarily long arithmetic progressions. The relationship between decomposition theorems and transference theorems holds in quite general terms, and is discussed in detail in [3].

4.1 The Green-Tao Proof

The original proof used by Green and Tao in [10] is quite different from the one presented below, and relies on a finitary ergodic theory inspired by Furstenberg's proof of Szemeredi's theorem. We briefly sketch their approach here before presenting the simpler proof by Gowers.

Their proof constructs the decomposition in stages. They begin by looking at the decomposition $f = \mathbb{E}(f) + (f - \mathbb{E}(f))$. It follows from the pseudorandomness of $\nu$ that $\mathbb{E}(f)$ is bounded, so the remaining problem is to show that $f - \mathbb{E}(f)$ is sufficiently uniform.

Of course, there is no guarantee that it will be. Instead, they use the machinery of conditional expectations over $\sigma$-algebras to increase the uniformity as follows. By using dual functions as obstructions to uniformity, if $f - \mathbb{E}(f)$ is not sufficiently uniform, sets can be added to create an expanded $\sigma$-algebra $\mathcal{B}$. These new sets are chosen so that the conditional expectation $\mathbb{E}(f \mid \mathcal{B})$ absorbs the impact of the dual functions which were obstructing the uniformity. In particular, the difference $f - \mathbb{E}(f \mid \mathcal{B})$ lacks these obstructions, and is more uniform. Furthermore, it follows from the pseudorandomness of $\nu$ and the fact that $f \le \nu$ that $\mathbb{E}(f \mid \mathcal{B})$ remains bounded.

They continue in this fashion, keeping $\mathbb{E}(f \mid \mathcal{B})$ bounded at each stage, while increasing the uniformity of $f - \mathbb{E}(f \mid \mathcal{B})$. There is no guarantee, however, that this process will terminate: while the approximations become more uniform at each stage, they may never become sufficiently uniform.

Green and Tao show that this process must terminate using an energy increment argument of the kind used in several approaches to Szemeredi's theorem. This argument uses the fact that at each stage in their construction, the pseudorandomness of $\nu$ ensures that $\mathbb{E}(f \mid \mathcal{B})$ remains bounded. The energy (that is, the $L^2$-norm) of $\mathbb{E}(f \mid \mathcal{B})$ increases by at least a fixed amount at each stage. Since it is bounded above, there must be a stage at which the energy cannot increase, and hence no further approximations can be made and the process must terminate.

If the process has terminated, however, it must mean that the approximation at that stagewas sufficiently uniform, and so the decomposition at this stage meets our requirements.

4.2 The Gowers-Hahn-Banach Proof

The simpler proof outlined in this section takes a very different approach. Rather than constructing a decomposition explicitly, it uses the Hahn-Banach theorem to derive a contradiction if no decomposition exists. This approach was independently discovered by Gowers [3] and by Reingold, Trevisan, Tulsiani and Vadhan [17].

The proof we give here follows the outline given in [3]. Some parts of the argument have been simplified, since we do not require the generality given by Gowers, and the presentation of the argument given here is new.

We begin by stating the version of the Hahn-Banach theorem¹ that we will use.

Theorem 4.2 (Hahn-Banach theorem). Let $K_1$ and $K_2$ be closed convex subsets of $\mathbb{R}^N$, each containing $0$, and suppose that $f \in \mathbb{R}^N$ cannot be written as a convex combination $c_1 f_1 + c_2 f_2$ with $f_i \in K_i$. Then there exists $\phi \in \mathbb{R}^N$ such that $\langle f, \phi\rangle > 1$ and $\langle g, \phi\rangle \le 1$ for every $g \in K_1 \cup K_2$.

With this theorem available to us, the strategy should be fairly obvious. Recall that we need a decomposition $f = g + h$ where $g$ is bounded and $h$ is uniform. We suppose that no such decomposition exists and use Theorem 4.2 to derive a contradiction. Roughly, this will be as follows: $\langle f, \phi\rangle$ will be large, but $\langle \nu, \phi\rangle$ will be small, contradicting the fact that $f \le \nu$. We hope to say that $\langle \nu, \phi\rangle$ is small because it is the sum of $\langle 1, \phi\rangle$ and $\langle \nu - 1, \phi\rangle$. The first term is bounded, and the second is $o(1)$ since $\nu - 1$ is uniform and $\phi$ is anti-uniform. There are, however, significant technical difficulties to be overcome before we can put this into action. First, we need the following simple consequence of pseudorandomness.

Lemma 4.1 (Uniformity of $\nu - 1$). If $\nu : \mathbb{Z}_N \to \mathbb{R}$ is simply $k$-pseudorandom, then

$$\|\nu - 1\|_{U^{k-1}} = o(1).$$

¹ This is quite different from the Hahn-Banach theorem as it is usually stated; for a derivation of the version stated, see [3].

Proof sketch. Expand out the definitions and use the binomial theorem.
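One way to carry out this expansion is shown below. It assumes that simple pseudorandomness supplies the estimate $\mathbb{E}(\prod_{\omega \in A} \nu(x + \omega \cdot h) : x, h) = 1 + o(1)$ for every $A \subseteq C^{k-1}$ (the case $A = \emptyset$ being trivial).

```latex
\begin{aligned}
\|\nu-1\|_{U^{k-1}}^{2^{k-1}}
  &= \mathbb{E}\Big(\prod_{\omega\in C^{k-1}}\big(\nu(x+\omega\cdot h)-1\big) : x\in\mathbb{Z}_N,\ h\in\mathbb{Z}_N^{k-1}\Big) \\
  &= \sum_{A\subseteq C^{k-1}} (-1)^{2^{k-1}-|A|}\,
     \mathbb{E}\Big(\prod_{\omega\in A}\nu(x+\omega\cdot h) : x\in\mathbb{Z}_N,\ h\in\mathbb{Z}_N^{k-1}\Big) \\
  &= \sum_{A\subseteq C^{k-1}} (-1)^{2^{k-1}-|A|}\,\big(1+o(1)\big)
   = (1-1)^{2^{k-1}} + o(1) = o(1).
\end{aligned}
```

The sum has $2^{2^{k-1}}$ terms, a number depending only on $k$, so the individual $o(1)$ errors combine to $o(1)$. Taking the $2^{k-1}$-th root gives the lemma.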

Now let us try to prove Theorem 4.1 using only this lemma and the Hahn-Banach theorem. In terms of the latter, we have two closed convex subsets of $\mathbb{R}^N$: the non-negative functions bounded by $2$, and the functions which are $\eta$-uniform. If the decomposition does not hold, then by Theorem 4.2 we can find some function $\phi$ such that

1. $\langle f, \phi\rangle > 1$,

2. $\langle g, \phi\rangle \le 1$ for every $g$ such that $0 \le g \le 2$, and

3. $\langle h, \phi\rangle \le 1$ for every $h$ such that $\|h\|_{U^{k-1}} \le \eta$.

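In particular, by setting $g$ to be the function with $g(x) = 2$ whenever $\phi(x) \ge 0$ and $g(x) = 0$ otherwise, we deduce that $\langle 1, \phi_+\rangle \le \frac{1}{2}$. Here $\phi_+$ is the positive part of $\phi$, defined by $\phi_+(x) := \phi(x)$ when $\phi(x) \ge 0$ and $\phi_+(x) := 0$ otherwise. We have the following chain of inequalities:

$$1 < \langle f, \phi\rangle \le \langle f, \phi_+\rangle \le \langle \nu, \phi_+\rangle = \langle 1, \phi_+\rangle + \langle \nu - 1, \phi_+\rangle \le \frac{1}{2} + \|\phi_+\|^*_{U^{k-1}}\,\|\nu - 1\|_{U^{k-1}}.$$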
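The choice of $g$ above is the extremal one. Over all $g$ with $0 \le g \le 2$, the inner product $\langle g, \phi\rangle$ is maximised pointwise by putting $g = 2$ where $\phi \ge 0$ and $g = 0$ elsewhere, which gives $2\langle 1, \phi_+\rangle$. The block below is a small randomised check, assuming the normalised inner product $\langle f, g\rangle = \mathbb{E}_x f(x)g(x)$.

```python
import random

random.seed(3)
N = 50
phi = [random.uniform(-1, 1) for _ in range(N)]

def inner(f, g):
    """Normalised inner product <f, g> = E_x f(x) g(x)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

phi_plus = [max(p, 0.0) for p in phi]             # positive part of phi
g_star = [2.0 if p >= 0 else 0.0 for p in phi]    # the extremal g used above

# <g*, phi> equals 2 <1, phi_+> ...
assert abs(inner(g_star, phi) - 2 * inner([1.0] * N, phi_plus)) < 1e-12
# ... and no other g with 0 <= g <= 2 does better
for _ in range(1000):
    g = [random.uniform(0, 2) for _ in range(N)]
    assert inner(g, phi) <= inner(g_star, phi) + 1e-12
print("g* maximises <g, phi> over 0 <= g <= 2")
```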

Using Lemma 4.1, to obtain a contradiction for $N$ sufficiently large it suffices to show that $\phi_+$ is anti-uniform. The problem is that condition 3 only gives us the bound $\|\phi\|^*_{U^{k-1}} \le \eta^{-1}$, and this is not strong enough. The difficulty lies in passing from $\phi$ to $\phi_+$. This step is necessary because we can only deduce $\langle f, \phi\rangle \le \langle \nu, \phi\rangle$ from $f \le \nu$ if $\phi$ is non-negative. If some stronger version of Theorem 4.2 were available that guaranteed $\phi \ge 0$, then the simple argument given above would be sufficient. In particular, instead of simple pseudorandomness, all we would need is the weaker condition $\|\nu - 1\|_{U^{k-1}} = o(1)$.

To fix this argument, we will show that $\phi_+$ can be approximated by a function that is anti-uniform. This is technically messy, and we leave the details to Appendix A. It is in proving this approximation, however, that most of the simple pseudorandomness condition is required. It gives the following theorem.

Theorem 4.3 (Approximation with an anti-uniform function). Condition (3) above implies that there exists a function $\psi$ such that $\|\psi - \phi_+\|_\infty \le 1/8$ and $\|\psi\|^*_{U^{k-1}} \le A$ for some $A$ depending only on $\eta$.

Using this theorem, we may adapt the chain of inequalities above to use this approximation to $\phi_+$, and obtain the inequality

$$1 < \langle f, \phi\rangle \le \frac{3}{4} + o(1) + A\,\|\nu - 1\|_{U^{k-1}}.$$

Since $A$ is fixed and $\|\nu - 1\|_{U^{k-1}}$ is $o(1)$, we have a contradiction for $N$ large enough, which proves Theorem 4.1. Once again, the details are technical and left to Appendix A.
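For orientation, one way the constant $3/4$ can arise is sketched below. The derivation assumes $\nu \ge 0$ and $\mathbb{E}(\nu) = 1 + o(1)$, as pseudorandomness provides. It uses $\|\psi - \phi_+\|_\infty \le 1/8$, $\|\psi\|^*_{U^{k-1}} \le A$ and $\langle 1, \phi_+\rangle \le 1/2$, and may differ in detail from the argument in Appendix A.

```latex
\begin{aligned}
1 < \langle f,\phi\rangle \le \langle \nu,\phi_+\rangle
  &= \langle \nu,\psi\rangle + \langle \nu,\phi_+-\psi\rangle \\
  &\le \langle 1,\psi\rangle + \langle \nu-1,\psi\rangle + \tfrac18\,\mathbb{E}(\nu) \\
  &\le \big(\langle 1,\phi_+\rangle + \tfrac18\big) + A\,\|\nu-1\|_{U^{k-1}} + \tfrac18\,\big(1+o(1)\big) \\
  &\le \tfrac34 + o(1) + A\,\|\nu-1\|_{U^{k-1}}.
\end{aligned}
```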


4.3 The Relative Szemeredi Theorem

We now have all we need to prove the main component of the Green-Tao theorem. The proof below fills in the sketch given in [10], making some minor changes since our Decomposition theorem differs from the form in which it is given there.

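Theorem 4.4 (Relative Szemeredi Theorem). Let $k \ge 3$ and $\delta > 0$, and let $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ be $k$-pseudorandom. Suppose $f : \mathbb{Z}_N \to \mathbb{R}$ satisfies $0 \le f(x) \le \nu(x)$ for all $x \in \mathbb{Z}_N$ and $\mathbb{E}f \ge \delta$. Then for all sufficiently large $N$,

$$\Upsilon_k(f) \ge \frac{c_{k,\delta/3}}{2},$$

where $c_{k,\delta/3} > 0$ is the constant appearing in Theorem 2.2.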

Proof. Let $0 < \eta < 1$ be some parameter to be chosen later, and let $f = g + h$ be the decomposition given by Theorem 4.1. Hence we have $0 \le g \le 2$ and $h$ is $\eta$-uniform. We would like to apply Szemeredi's theorem to the function $g$. However, $g$ is bounded above by $2$ rather than $1$, and its density is only bounded below by $\delta - \eta$, a quantity depending on $\eta$. Since we shall later take $\eta$ small in terms of the constant supplied by Szemeredi's theorem, that constant must not depend on $\eta$. Hence we instead consider the function $(g + \eta)/(2 + \eta)$. We now have

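$$\mathbb{E}\Big(\frac{g + \eta}{2 + \eta}\Big) = \frac{\mathbb{E}(f) - \mathbb{E}(h) + \eta}{2 + \eta} \ge \frac{\delta}{2 + \eta} > \frac{\delta}{3},$$

since $|\mathbb{E}(h)| = \|h\|_{U^1} \le \|h\|_{U^{k-1}} \le \eta$. Furthermore, we have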

$$0 \le \frac{g + \eta}{2 + \eta} \le 1.$$

Hence for the function $(g + \eta)/(2 + \eta)$ the conditions of Szemeredi's theorem, Theorem 2.2, are met, and we may apply it to obtain the lower bound, for $N$ sufficiently large (depending only on $k$ and $\delta$),

$$\Upsilon_k(g + \eta) \ge \Upsilon_k\Big(\frac{g + \eta}{2 + \eta}\Big) \ge c_{k,\delta/3}$$

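for some constant $c_{k,\delta/3} > 0$ depending only on $k$ and $\delta$. The first inequality holds because $g + \eta \ge 0$, $2 + \eta \ge 1$ and $\Upsilon_k$ is homogeneous of degree $k$. Since $\eta < 2$, putting our upper bounds into the definition of $\Upsilon_k$ gives us

$$\Upsilon_k(f_0, \dots, f_{k-1}) \le 2^{k-1}\eta = O_k(\eta)$$

whenever $f_j \in \{\eta, g\}$ for $0 \le j \le k - 1$ and at least one $f_j$ is equal to $\eta$. Since $\Upsilon_k(g + \eta)$ is $\Upsilon_k(g)$ plus $2^k - 1$ such terms, we deduce that

$$\Upsilon_k(g) \ge c_{k,\delta/3} - O_k(\eta).$$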
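The $O_k(\eta)$ bound can be checked numerically. The sketch below uses arbitrary toy values $N = 13$, $k = 4$, $\eta = 0.01$ and a random $0 \le g \le 2$. It compares $\Upsilon_k(g + \eta) - \Upsilon_k(g)$, which is the sum of the $2^k - 1$ mixed terms, with the crude bound $(2^k - 1)2^{k-1}\eta$.

```python
import random
from itertools import product

def upsilon(fs, N):
    """E(f_0(x) f_1(x+r) ... f_{k-1}(x+(k-1)r) : x, r in Z_N) for lists f_j."""
    total = 0.0
    for x, r in product(range(N), repeat=2):
        term = 1.0
        for j, f in enumerate(fs):
            term *= f[(x + j * r) % N]
        total += term
    return total / N ** 2

random.seed(4)
N, k, eta = 13, 4, 0.01
g = [random.uniform(0, 2) for _ in range(N)]   # 0 <= g <= 2
g_eta = [v + eta for v in g]

# Each mixed term has at least one factor eta and the rest bounded by 2,
# so it is at most 2^(k-1) * eta (using eta <= 2); all terms are >= 0.
diff = upsilon([g_eta] * k, N) - upsilon([g] * k, N)
bound = (2 ** k - 1) * 2 ** (k - 1) * eta
print(0 <= diff <= bound)   # True
```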

On the other hand, $\|h\|_{U^{k-1}} \le \eta$, and $|g|, |h| \le f + g \le \nu + 2$. Hence by applying Corollary 3.1 (and Remark 3.1) we see that

$$\Upsilon_k(f) = \Upsilon_k(g) + O_k(\eta) + o_k(1).$$


In particular,

$$\Upsilon_k(f) \ge c_{k,\delta/3} - O_k(\eta) - o_k(1).$$

By taking $\eta$ small enough (depending on $k$ and $\delta$) we can ensure that the $O_k(\eta)$ term is less than $c_{k,\delta/3}/4$, and by taking $N$ sufficiently large, we can also ensure that the $o_k(1)$ term is less than $c_{k,\delta/3}/4$. Hence, for $N$ sufficiently large,

$$\Upsilon_k(f) \ge \frac{c_{k,\delta/3}}{2},$$

as required.

Weak Pseudorandomness

This section outlines a slightly different approach, the existence of which is hinted at by a remark in [10].

If we may assume that $\delta$ depends only on $k$, then we can prove a Relative Szemeredi theorem from conditions which are strictly weaker than the pseudorandomness conditions we have been using so far. This will be important in our application to the primes, where we shall only be able to prove these weaker conditions.

Let $\varepsilon_k$ be some sufficiently small constant depending only on $k$. We then define weak linear pseudorandomness and weak simple pseudorandomness by allowing an additional error of $O(\varepsilon_k)$ in the required asymptotics and upper bounds. This affects our argument as follows.

The Generalised von Neumann theorem is altered by an error of $O_k(\varepsilon_k)$, using an almost identical proof.

The Decomposition theorem now requires $\varepsilon_k$ to be sufficiently small depending on $\eta$. This is because the term $\|\nu - 1\|_{U^{k-1}}$ is no longer $o(1)$ but only $O(\varepsilon_k) + o(1)$, and it is multiplied by the constant $A$, which depends on $\eta$.

By using these modified theorems in the proof of our Relative Szemeredi theorem above, we arrive at a lower bound of the form

$$\Upsilon_k(f) \ge c_{k,\delta/3} - O_k(\eta) - O_k(\varepsilon_k) - o(1).$$

If $\delta$ depends only on $k$, then as long as we take $\varepsilon_k$ sufficiently small, we may still arrive at the required lower bound. This gives us the following alternative Relative Szemeredi theorem, which we shall use in our application to the primes.

Theorem 4.5 (Alternative Relative Szemeredi Theorem). Let $k \ge 3$ and let $\nu$ be a weakly $k$-pseudorandom function. Suppose $f : \mathbb{Z}_N \to \mathbb{R}$ satisfies $0 \le f(x) \le \nu(x)$ for all $x \in \mathbb{Z}_N$ and $\mathbb{E}f \ge \frac{1}{10k}$ (say). Then for all sufficiently large $N$,

$$\Upsilon_k(f) \ge \frac{c_k}{2},$$

where $c_k = c_{k,1/(30k)} > 0$ is the constant appearing in Theorem 2.2.

Chapter 5

Progressions in the Primes

In this chapter we apply the Relative Szemeredi theorem to the problem of locating arithmetic progressions within the prime numbers. The exposition here is original, and presents a combined and simplified version of ideas used in [10] and [7].

To apply Theorem 4.5 to the primes, we require two things:

1. A suitable function $f : \mathbb{Z}_N \to \mathbb{R}^+$ which counts only primes and with $\mathbb{E}(f) \ge \delta_k$ for some $\delta_k > 0$ depending only on $k$, and

2. A (weakly) k-pseudorandom function ν such that f ≤ ν.

5.1 Counting the Primes and the W-trick

The first obvious candidate for a function which counts only primes is the prime indicator function $1_P(n)$, which is defined to be $1$ if $n$ is prime and $0$ otherwise. This suffers from the same difficulty which prevented us from applying the regular Szemeredi theorem to the primes: the primes do not have positive density, since it follows from the prime number theorem that $\mathbb{E}(1_P) = O(1/\log N) = o(1)$.

There is, however, a standard method of avoiding this issue: instead of counting the primes with weight $1$, we count them with weight $\log p$, using the von Mangoldt function¹

$$\Lambda(n) := \begin{cases} \log n & \text{if } n \text{ is prime, and} \\ 0 & \text{otherwise.} \end{cases}$$

It is a simple consequence of the prime number theorem that $\mathbb{E}(\Lambda) = 1 + o(1)$, and hence it is certainly positive for sufficiently large $N$. Crucially, however, $\Lambda$ is not bounded above; this is the reason that we require the Relative Szemeredi theorem.
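The density statement $\mathbb{E}(\Lambda) = 1 + o(1)$ can be observed numerically. The sketch below uses the definition above (prime powers excluded) with a standard sieve of Eratosthenes. The averages are $\theta(N)/N$ in the usual notation and should approach $1$ slowly.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: is_prime[m] == 1 iff m is prime (for 0 <= m <= n)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return is_prime

def mean_von_mangoldt(N):
    """E(Lambda(n) : 1 <= n <= N), with Lambda supported on the primes only."""
    is_prime = primes_up_to(N)
    return sum(math.log(p) for p in range(2, N + 1) if is_prime[p]) / N

for N in (10**3, 10**4, 10**5):
    print(N, round(mean_von_mangoldt(N), 4))
```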

We may hope to use $\Lambda$ to count the primes, but there are two complications, the first easy to deal with and the second more troublesome.

¹ Normally the von Mangoldt function is also taken to count prime powers, and the contribution from these is shown to be negligible. For this application, however, including the prime powers would not make the argument any easier, so we have simply excluded them from the definition.


The first is the wraparound problem noted in Chapter 2: since we are working in $\mathbb{Z}_N$, we need to ensure that our arithmetic progressions are still progressions within $[1, N]$. For this we need $a + (k - 1)b \le N$, which can be forced by only counting arithmetic progressions in $\mathbb{Z}_N$ with $a, b \le N/k$. Again, for this it is sufficient that the progression be contained entirely within $[\varepsilon_k N, 2\varepsilon_k N]$ for some $\varepsilon_k < 1/k$. To ensure this, we shall modify $\Lambda$ to be zero on $n$ outside this interval.
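The following brute-force check illustrates the sufficient condition above, with the hypothetical values $N = 101$, $k = 3$ and $\varepsilon_k = 1/4 < 1/k$. Every progression in $\mathbb{Z}_N$ with nonzero common difference whose terms all lie in $[\varepsilon_k N, 2\varepsilon_k N]$ is also a genuine progression of integers. The last line shows the kind of wraparound that the restriction rules out.

```python
import math
from fractions import Fraction

def genuine_progressions_only(N, k, eps):
    """Check: every k-term progression a, a+b, ..., a+(k-1)b in Z_N (b != 0)
    whose residues all lie in [eps*N, 2*eps*N] is also a progression of integers."""
    lo, hi = math.ceil(eps * N), math.floor(2 * eps * N)
    for a in range(N):
        for b in range(1, N):
            terms = [(a + j * b) % N for j in range(k)]
            if all(lo <= t <= hi for t in terms):
                step = terms[1] - terms[0]   # candidate integer common difference
                if any(terms[j] != terms[0] + j * step for j in range(k)):
                    return False
    return True

print(genuine_progressions_only(101, 3, Fraction(1, 4)))   # True
print([(90 + j * 10) % 101 for j in range(3)])            # wraps around: [90, 100, 9]
```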

The second problem is that the primes do not behave as randomly as we need them to. Recall that we need to show that the primes are inside some pseudorandom set with positive density. Pseudorandom sets behave similarly to random sets; in particular, they should have uniform distribution across all congruence classes as $N$ tends to infinity. Since the primes have positive density within this set, they should also reflect this, and be roughly uniformly distributed across all congruence classes.

The primes are obviously not distributed this uniformly, however: for instance, there is only one prime congruent to $0$ modulo $2$. It appears, then, that our primes do not in fact sit inside a pseudorandom set with the required positive density. To avoid this difficulty, Green and Tao introduce the W-trick and restrict their attention to primes in certain restricted congruence classes.

The key observation is that how pseudorandom we need our set to be depends only on $k$, which is fixed. Hence the majorant need not be fully random, but only random enough depending on $k$. In particular, it does not need to be uniformly distributed across all congruence classes, but only across those with modulus small enough to be detected by the linear forms in the definition of $k$-pseudorandomness. Hence the primes which are inside this majorant should be roughly uniformly distributed in these small congruence classes.

If we only look at integers belonging to one of these small congruence classes, both in the majorant and in the primes, this avoids the difficulty. The primes do not need to be uniformly distributed across all small congruence classes, since we know that our pseudorandom majorant only includes one of them.

To be precise, let $w$ be some sufficiently large integer, depending only on $k$, and then define

$$W := \prod_{p \le w} p$$

to be the product of all primes up to $w$. We restrict ourselves to looking at the congruence class $n \equiv 1 \pmod{W}$.² Hence we look at $\Lambda$ not over the interval $\{1, \dots, N\}$, but rather on the set $\{W + 1, 2W + 1, \dots, NW + 1\}$, and set it to be zero everywhere else.
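A small illustration of the W-trick with the hypothetical choice $w = 7$, so that $W = 2 \cdot 3 \cdot 5 \cdot 7 = 210$ and $\phi(W) = 48$. Every number of the form $Wn + 1$ is coprime to all primes $p \le w$, so the small primes can no longer bias the distribution.

```python
import math

def primes_up_to(w):
    """All primes p <= w, by trial division (fine for small w)."""
    return [p for p in range(2, w + 1) if all(p % q for q in range(2, math.isqrt(p) + 1))]

def euler_phi_squarefree(ps):
    """phi(W) for W = product of the distinct primes ps: prod (p - 1)."""
    return math.prod(p - 1 for p in ps)

w = 7                                   # hypothetical choice of w
ps = primes_up_to(w)                    # [2, 3, 5, 7]
W = math.prod(ps)                       # 210
print(W, euler_phi_squarefree(ps))      # 210 48

# Every number Wn + 1 avoids all the small primes p <= w:
assert all(math.gcd(W * n + 1, W) == 1 for n in range(1, 1000))
```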

Restricting ourselves in this way reduces the primes we are counting by a factor polynomial in $W$, and hence our eventual lower bound for the count of progressions in the primes will be off by some polynomial factor in $W$. Since $W$ depends only on $k$, however, we may incorporate this factor into our constant $c_k$.

In technical terms, we use the W-trick in verifying pseudorandomness to be able to deduce that for $p > w$ the linear forms we are estimating over are independent over $\mathbb{Z}_p$ as well as over $\mathbb{Q}$. This enables us to bound certain local factors with a crucial $p^{-2}$ term, and we can obtain the estimates we require by taking $w$ sufficiently large. For more details, see Appendix B.

² We may in fact choose any residue $b$ coprime to $W$, and choosing it using the pigeonhole principle avoids the use of the prime number theorem. We follow Green and Tao in choosing $1$ for simplicity here.

Including both the wraparound factor and the W-trick in the definition of $\Lambda$ gives the required prime counting function.

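Definition 5.1 (Prime Counting Function).

$$\Lambda(n) := \begin{cases} \dfrac{\phi(W)}{2\lambda_k W}\log(Wn + 1) & \text{if } n \in [\varepsilon_k N, 2\varepsilon_k N] \text{ and } Wn + 1 \text{ is prime, and} \\ 0 & \text{otherwise,} \end{cases}$$

for some $\varepsilon_k$ and $\lambda_k$ sufficiently small depending only on $k$.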

The factor $\phi(W)/W$ here is required largely to provide a lower bound for the density which is independent of $W$. This is to avoid circularity, since how large we need to take $w$ partially depends on the density of $\Lambda$. The factor of $1/(2\lambda_k)$ is to ensure that $\Lambda$ is majorized by the pseudorandom function we shall construct in the next section.

To show that $\Lambda$ is suitable for use in the Relative Szemeredi theorem, it remains to show that its density is bounded below. We require the following classical theorem of analytic number theory.

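Theorem 5.1 (Prime Number Theorem for Arithmetic Progressions). For fixed $b$ and any $a$ coprime to $b$,

$$\sum_{\substack{p \le x \\ p \equiv a \pmod{b}}} \log p = \frac{x}{\phi(b)}\,\big(1 + o(1)\big).$$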
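A numerical illustration of Theorem 5.1 for the hypothetical choice $b = 10$ and $x = 10^6$. For each of the four classes $a$ coprime to $10$, the ratio of the sum to $x/\phi(b)$ should be close to $1$.

```python
import math

def theta_in_class(x, a, b):
    """Sum of log p over primes p <= x with p % b == a % b (sieve of Eratosthenes)."""
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(math.log(p) for p in range(a % b, x + 1, b) if is_prime[p])

x, b = 10**6, 10
phi_b = sum(1 for a in range(1, b + 1) if math.gcd(a, b) == 1)   # phi(10) = 4
for a in (1, 3, 7, 9):
    print(a, round(theta_in_class(x, a, b) / (x / phi_b), 4))    # each ratio close to 1
```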

We use this theorem to show that our prime counting function $\Lambda$ satisfies the density requirement:

Lemma 5.1. For sufficiently large $N$,

$$\mathbb{E}(\Lambda(x) : x \in \mathbb{Z}_N) \ge \frac{\varepsilon_k}{4\lambda_k}.$$

Remark 5.1. The constants $\varepsilon_k$ and $\lambda_k$ are those in the definition of the prime counting function, above, and the pseudorandom majorant, below. Exact values could be computed, but the important thing is that they depend only on $k$, and hence so does the lower bound for the density given in this lemma.


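Proof. We simply expand out the expectation notation and apply Theorem 5.1 as follows:

$$\begin{aligned}
\mathbb{E}(\Lambda(x) : x \in \mathbb{Z}_N) &= \frac{\phi(W)}{2\lambda_k W N} \sum_{\substack{\varepsilon_k N \le n \le 2\varepsilon_k N \\ Wn + 1 \text{ prime}}} \log(Wn + 1) \\
&= \frac{\phi(W)}{2\lambda_k W N} \sum_{\substack{\varepsilon_k WN \le p \le 2\varepsilon_k WN \\ p \equiv 1 \pmod{W}}} \log p + o(1) \\
&= \frac{\phi(W)}{2\lambda_k W N}\Big(\frac{2\varepsilon_k WN}{\phi(W)} - \frac{\varepsilon_k WN}{\phi(W)}\Big)\big(1 + o(1)\big) \\
&= \frac{\varepsilon_k}{2\lambda_k}\big(1 + o(1)\big).
\end{aligned}$$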

In particular, for sufficiently large $N$ we can assume that $1 + o(1) \ge \frac{1}{2}$, which gives us the result.
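The computation in the proof can be mirrored numerically. The sketch below evaluates $\mathbb{E}(\Lambda(x) : x \in \mathbb{Z}_N)$ for Definition 5.1 with toy parameters chosen purely for illustration: $W = 6$, $\phi(W) = 2$, $\varepsilon_k = 1/8$, $\lambda_k = 1/2$ and $N = 10^5$. These are not the values the argument requires. The result is compared with the main term $\varepsilon_k/(2\lambda_k)$.

```python
import math

def sieve(n):
    """is_prime[m] == 1 iff m is prime, for 0 <= m <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return is_prime

def mean_prime_counting_function(N, W, phi_W, eps, lam):
    """E(Lambda(x) : x in Z_N) for the modified Lambda of Definition 5.1."""
    lo, hi = math.ceil(eps * N), math.floor(2 * eps * N)
    is_prime = sieve(W * hi + 1)
    weight = phi_W / (2 * lam * W)
    total = sum(weight * math.log(W * n + 1)
                for n in range(lo, hi + 1) if is_prime[W * n + 1])
    return total / N

# Hypothetical toy parameters: W = 6 (w = 3), eps_k = 1/8, lambda_k = 1/2.
N, W, phi_W, eps, lam = 10**5, 6, 2, 1 / 8, 1 / 2
value = mean_prime_counting_function(N, W, phi_W, eps, lam)
print(value, eps / (2 * lam))   # the two numbers should nearly agree
```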

5.2 Pseudorandom Majorant

Now let us consider the construction of the pseudorandom majorant for $\Lambda$. Again, a natural candidate is just $\Lambda$ itself. The problem is that verifying pseudorandomness for $\Lambda$ is extremely difficult: the linear forms condition, for instance, is comparable in difficulty to the prime tuples Conjecture 1.2, and hence harder than both the Twin Primes Conjecture and the Goldbach Conjecture.

Instead, we use an idea from sieve theory. Traditional sieve theory methods have proven extremely successful in obtaining estimates and asymptotics for the almost-primes (numbers with few prime divisors), but these methods cannot be refined to include the primes themselves. Fortunately, however, all we require now is a majorant for the weighted primes, and the weighted almost-primes will do the job.

Recall that we counted the weighted primes using a restricted form of the $\Lambda$ function. Hence, we first consider the elementary identity

$$\Lambda(n) = \sum_{d \mid n} \mu(d) \log\Big(\frac{n}{d}\Big).$$
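A quick check of this identity for small $n$. Note that it holds for the standard von Mangoldt function, which gives weight $\log p$ to every prime power $p^j$. The helpers below are illustrative trial-division implementations.

```python
import math

def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0           # p^2 divides n
            result = -result
        p += 1
    return -result if m > 1 else result

def von_mangoldt(n):
    """Standard Lambda(n): log p if n = p^j with j >= 1, else 0 (prime powers included)."""
    for p in range(2, n + 1):
        if n % p == 0:             # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0                     # n = 1

def divisor_sum(n):
    """sum over d | n of mu(d) log(n / d)."""
    return sum(mobius(d) * math.log(n // d) for d in range(1, n + 1) if n % d == 0)

assert all(abs(von_mangoldt(n) - divisor_sum(n)) < 1e-9 for n in range(1, 301))
print("Lambda(n) = sum_{d | n} mu(d) log(n/d) for 1 <= n <= 300")
```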

We need to adjust this to also count numbers with few prime factors. Note that if $n$ has many prime factors, then most of its divisors will be small in relation to $n$. In particular, by truncating the sum in the identity above, only summing over divisors less than some parameter $R$, we obtain a function that is approximately $\Lambda$ for numbers with many prime factors. Hence this modified function only differs from $\Lambda$ on the almost-primes.

Motivated by this, we introduce the truncated divisor sum

$$\Lambda_R(n) := \sum_{\substack{d \mid n \\ d \le R}} \mu(d) \log\Big(\frac{R}{d}\Big) = \sum_{d \mid n} \mu(d) \log^+\Big(\frac{R}{d}\Big).$$


If $p$ is a prime with $p > R$, then the only term counted in the sum above is $d = 1$, and so $\Lambda_R(p) = \log R$: every such prime receives the same weight. Of course, $\Lambda_R$ will count more than just the primes but, as noted above, it will effectively only count almost-primes. Hence $\Lambda_R$ can be viewed as a weighted indicator function for the almost-primes. Just as we used $\Lambda$ modified by the W-trick to count the primes, we shall use $\Lambda_R$ modified by the W-trick as a pseudorandom majorant. One small obstacle to overcome is that we require our pseudorandom majorant to be non-negative, whereas $\Lambda_R$ can take negative values. We circumvent this by the simple trick of squaring the function.
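Two claims in this paragraph are easy to see on a small example: $\Lambda_R(p) = \log R$ for primes $p > R$, and $\Lambda_R$ can be negative. With the hypothetical cut-off $R = 10$, one finds $\Lambda_{10}(30) = -\log 2$. The helper names are illustrative.

```python
import math

def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result

def lambda_R(n, R):
    """Truncated divisor sum: sum over d | n with d <= R of mu(d) log(R / d)."""
    return sum(mobius(d) * math.log(R / d) for d in range(1, min(n, R) + 1) if n % d == 0)

R = 10
print(lambda_R(13, R), math.log(R))   # prime p > R: only d = 1 contributes, giving log R
print(lambda_R(30, R))                # negative: equals -log 2
print(min(lambda_R(n, R) for n in range(1, 200)) < 0)   # True
```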

By including the almost-primes, obtaining estimates for $\Lambda_R$ is significantly easier than for $\Lambda$, and ideas from sieve theory can be applied. Green and Tao were fortunate here, in that Goldston and Yıldırım had already considered the truncated divisor sum in their work on small gaps between primes [2], and had effectively proven the required linear forms estimate. It was this proof which was used in the original paper [10].

A significant simplification for this part of the proof was discovered later by Green and Tao, and is outlined in [7] and [20]. In the original proof, to be able to provide the required estimates for sums of $\Lambda_R^2$, the $\log^+(R/d)$ factor in the definition of $\Lambda_R$ was replaced by a contour integral using the identity

$$\log^+ x = \frac{1}{2\pi i}\int_\Gamma \frac{x^z}{z^2}\,dz \qquad (x > 0)$$

for a vertical line $\Gamma = \{\Re(z) = c\}$ with $c > 0$. This enabled them to replace the sum with a contour integral involving the Riemann zeta function $\zeta$, to which classical information about the zeta function could be applied. In particular, they required the existence of a certain region to the left of the line $\Re(z) = 1$ in which $\zeta$ is free of zeroes.
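The contour identity can be checked numerically. The sketch below takes the hypothetical line $\Re(z) = c = 1$, truncates the integral at height $T$ and applies Simpson's rule. It should reproduce $\log x$ for $x > 1$ and $0$ for $x < 1$ to within the truncation error.

```python
import cmath
import math

def log_plus_via_contour(x, c=1.0, T=500.0, steps=50_000):
    """Approximate (1/(2 pi i)) * integral over Re z = c of x^z / z^2 dz.

    On z = c + it we have dz = i dt, and since x > 0 the integrand at -t is the
    complex conjugate of the one at t, so the integral is (1/pi) Re of the
    integral over t in [0, T]. Composite Simpson's rule; truncating at T costs
    at most x^c / (pi T).
    """
    def g(t):
        z = complex(c, t)
        return cmath.exp(z * math.log(x)) / z ** 2
    h = T / steps                           # steps must be even for Simpson's rule
    s = g(0.0) + g(T)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(i * h)
    return (s * h / 3).real / math.pi

for x in (0.5, 2.0, 10.0):
    print(x, round(log_plus_via_contour(x), 3), round(max(math.log(x), 0.0), 3))
```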

The new idea was as follows. They first replace the $\log^+(R/d)$ factor with a smooth approximation,

$$\chi\Big(\frac{\log d}{\log R}\Big),$$

where $\chi$ is some smooth, bounded function with compact support. Note that $\log^+(R/d)$ corresponds to taking $\chi(x) = \log R\,(1 - x)$ for $x \in [0, 1]$ and $\chi(x) = 0$ otherwise. Then, instead of replacing this with a contour integral, they could replace it with its Fourier transform. The crucial fact here was that they could truncate the integral and consider it over a bounded interval at the cost of $o(1)$ errors since (as $\chi$ is smooth) the Fourier transform decays very rapidly.
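To see why smoothness matters, the sketch below compares Fourier transforms of two functions supported on $[-1, 1]$. The non-smooth hat function $\max(1 - |x|, 0)$ has transform $2(1 - \cos\xi)/\xi^2$, which decays only like $\xi^{-2}$. The standard smooth bump $e^{-1/(1 - x^2)}$ has a transform that decays faster than any power of $\xi$. Both functions and the frequency window are illustrative choices, not the $\chi$ used in the proof.

```python
import math

def bump(x):
    """A smooth function supported on [-1, 1] (all derivatives vanish at +-1)."""
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump_transform(xi, n=20_000):
    """Integral of bump(x) cos(xi x) over [-1, 1] (the bump is even), trapezoid rule."""
    h = 2.0 / n
    return h * sum(bump(-1.0 + i * h) * math.cos(xi * (-1.0 + i * h)) for i in range(n + 1))

def hat_transform(xi):
    """Exact transform of the non-smooth hat function max(1 - |x|, 0)."""
    return 2.0 * (1.0 - math.cos(xi)) / xi ** 2

window = range(180, 221)
hat_max = max(abs(hat_transform(xi)) for xi in window)
bump_max = max(abs(bump_transform(xi)) for xi in window)
print(f"hat: {hat_max:.1e}  bump: {bump_max:.1e}")
```

At these frequencies the hat transform is of order $10^{-4}$, while the bump transform is smaller by several orders of magnitude. This rapid decay is what allows the truncation described above.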

In the remainder of the argument, then, instead of having to estimate functions involving $\zeta(z)$ with $|z|$ unbounded, they had only to consider $z$ such that $z = 1 + o(1)$. In this case, the only fact about $\zeta$ required is the existence of a simple pole at $z = 1$. These ideas are explained in detail in Appendix B.

To be able to use this simpler argument, we define a modified form of the truncated divisor sum as follows.


Definition 5.2 (Modified Truncated Divisor Sum).

$$\Lambda_R(n) := \sum_{d \mid n} \mu(d)\,\chi\Big(\frac{\log d}{\log R}\Big),$$

where $\chi$ is some smooth, bounded function with compact support such that $\chi(0) = 1$, $\chi(x) = 0$ for $|x| \ge 1$, and $\int_0^1 |\chi'(x)|^2\,dx = 1$.

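These conditions on $\chi$ are required for technical reasons, but $\chi(\log d/\log R)$ should be viewed as a smooth approximation to $\log^+(R/d)/\log R$. This modified $\Lambda_R$ is therefore a smoothed version, normalised by $1/\log R$, of the truncated divisor sum defined earlier.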
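The normalisation can be checked with the non-smooth choice $\chi(x) = \max(1 - |x|, 0)$. This choice satisfies $\chi(0) = 1$, $\chi(x) = 0$ for $|x| \ge 1$ and $\int_0^1 |\chi'|^2 = 1$, and it turns Definition 5.2 into the earlier $\log^+$ sum divided by $\log R$. This is only an illustration of the scaling; the proof needs a smooth $\chi$. The cut-off $R = 50$ and the helper names are hypothetical.

```python
import math

def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result

def chi_linear(x):
    """chi(x) = 1 - |x| on [-1, 1], else 0: chi(0) = 1 and int_0^1 |chi'|^2 = 1 (not smooth)."""
    return max(1.0 - abs(x), 0.0)

def lambda_R_chi(n, R, chi):
    """Definition 5.2 with a given chi: sum over d | n of mu(d) chi(log d / log R)."""
    return sum(mobius(d) * chi(math.log(d) / math.log(R))
               for d in range(1, n + 1) if n % d == 0)

def lambda_R_log(n, R):
    """Earlier truncated sum: sum over d | n of mu(d) log^+(R / d)."""
    return sum(mobius(d) * max(math.log(R / d), 0.0)
               for d in range(1, n + 1) if n % d == 0)

R = 50
assert all(abs(lambda_R_chi(n, R, chi_linear) - lambda_R_log(n, R) / math.log(R)) < 1e-9
           for n in range(1, 500))
print("with chi(x) = 1 - |x|, Definition 5.2 equals the log^+ version divided by log R")
```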

We require one final adjustment before the definition of the pseudorandom majorant. In estimating the linear forms and correlation expectations required to demonstrate pseudorandomness, we get the hoped-for $1 + o(1)$ term, multiplied by a factor of $\frac{W}{\phi(W)\log R}$. To compensate for this, we scale our pseudorandom majorant to remove this factor. Hence we arrive at our final definition as follows.

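Definition 5.3 (Pseudorandom Majorant for the Primes). We define $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ by

$$\nu(n) := \begin{cases} \dfrac{\phi(W)\log R}{W}\,\Lambda_R(Wn + 1)^2 & \text{if } n \in [\varepsilon_k N, 2\varepsilon_k N], \text{ and} \\ 1 & \text{otherwise,} \end{cases}$$

for some sufficiently small $\varepsilon_k$ depending only on $k$, and where $R := N^{\lambda_k}$ for some sufficiently small $\lambda_k$ depending only on $k$.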

Remark 5.2. This differs from the $\nu$ used in [10] by using the smoothed-out $\Lambda_R$, for the reasons outlined above.

As outlined in the previous section, the W-trick is required so that $\nu$ only considers numbers congruent to $1$ modulo $W$. The two-part definition of $\nu$ is needed due to difficulties in passing between $[1, N]$ and $\mathbb{Z}_N$. Namely, we prove correlation estimates for $\Lambda_R$ over the interval $[1, N]$, and to be able to apply these to the function $\nu$ (which is instead defined over $\mathbb{Z}_N$), we must truncate to some small interval. The parameter $\lambda_k$ determines how small the cut-off parameter $R$ is, and hence how ‘almost’ the almost-primes we count are.

We first verify that $\nu$ is a majorant for our prime counting function $\Lambda$. The factor of $\frac{\phi(W)}{2\lambda_k W}$ in the definition of $\Lambda$ is required partially to make this proof work.

Lemma 5.2. For $N$ sufficiently large depending on $k$, $\nu(n) \ge \Lambda(n)$ for all $n \in \mathbb{Z}_N$.

Proof. Since we have squared $\Lambda_R$ in the definition of $\nu$, it is clear that $\nu(n) \ge 0$ for all $n \in \mathbb{Z}_N$. The claim follows immediately unless $Wn + 1$ is a prime and $n \in [\varepsilon_k N, 2\varepsilon_k N]$ (for otherwise $\Lambda(n) = 0$), so let us suppose we are in this case.

By taking $N$ sufficiently large we may also suppose that $Wn + 1 \ge W\varepsilon_k N + 1 > N^{\lambda_k} = R$, since $W$ depends only on $k$.


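Therefore $\Lambda_R(Wn+1) = 1$: the only divisors of the prime $Wn + 1$ are $1$ and $Wn + 1$, and $\chi(\log(Wn+1)/\log R) = 0$ since $Wn + 1 > R$. Hence

$$\nu(n) = \frac{\phi(W)}{W}\log R = \frac{\phi(W)}{\lambda_k W}\log N \ge \frac{\phi(W)}{\lambda_k W}\log\left(\frac{n}{2\varepsilon_k}\right).$$

For $n$ sufficiently large (which we may force by taking $N$ sufficiently large), $\frac{n}{2\varepsilon_k} \ge \sqrt{Wn+1}$, and so

$$\nu(n) \ge \frac{\phi(W)}{\lambda_k W}\log\left(\sqrt{Wn+1}\right) = \Lambda(n).$$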

All that remains now is to verify that $\nu$ is weakly $k$-pseudorandom, for which we must provide an asymptotic and an upper bound for averages of $\nu$ over linear forms. The proofs of these are long and technical, so have been postponed to Appendix B. The arguments there are a special case of those given in [7], simplified since we do not require the full generality proven there.

Theorem 5.2. ν is weakly k-pseudorandom.

Proof. See Appendix B.

Remark 5.3. We only obtain the weak pseudorandomness condition discussed at the end of Chapter 4, which is sufficient for these purposes since Lemma 5.1 gives a lower bound for the density depending only on $k$. To obtain the strong linear pseudorandomness condition we must take $w$ as an increasing function of $N$ (as is done in [10]), which would prevent us from obtaining the uniform lower bound in Theorem 5.3.

We have constructed a (weakly) pseudorandom majorant for our prime counting function $\Lambda$, and can apply the Relative Szemeredi theorem to finally complete the proof of the Green-Tao theorem.

5.3 The Green-Tao Theorem

We now have all that we need to prove the main result, that the primes contain infinitely many arbitrarily long arithmetic progressions. In fact, we can deduce a stronger result, giving an explicit lower bound. The proof below follows the outline given in [10], making explicit how to obtain the lower bound mentioned there as a remark.

Theorem 5.3 (The Green-Tao Theorem). For any $k \ge 3$ there exists some constant $c'_k > 0$ depending only on $k$ such that, for $N$ sufficiently large,

$$P(k, N) \ge c'_k\frac{N^2}{\log^k N}.$$


Proof. Choose $M$ such that $N = \lceil 2\varepsilon_k W M + 1\rceil$. By Lemma 5.1, for sufficiently large $N$,

$$\mathbb{E}_{n \in \mathbb{Z}_M}\Lambda(n) \ge \frac{\varepsilon_k}{4\lambda_k}.$$

Furthermore, by Lemma 5.2 and Theorem 5.2, $0 \le \Lambda \le \nu$ where $\nu$ is weakly $k$-pseudorandom. Hence by Theorem 4.5, the alternative Relative Szemeredi Theorem, there exists some constant $c_k > 0$ such that, for sufficiently large $M$,

$$\Upsilon_k(\Lambda) \ge \frac{c_k}{2}.$$

If we define $1_P(n) = 1$ if $Wn + 1$ is prime and $n \in [\varepsilon_k M, 2\varepsilon_k M]$, and $1_P(n) = 0$ otherwise, then

$$\Lambda(n) = \frac{\phi(W)}{2\lambda_k W}\log(Wn+1)\,1_P(n) \le \frac{\phi(W)}{2\lambda_k W}\log N\,1_P(n).$$

Let $a, a+b, \dots, a+(k-1)b$ be an arithmetic progression of such $n$. Since $a + ib \in [\varepsilon_k M, 2\varepsilon_k M]$ for all $0 \le i \le k-1$, and since we may take $\varepsilon_k < 1/k$, this will be an arithmetic progression in $[1, M]$, not just in $\mathbb{Z}_M$. Furthermore, $Wa + 1, Wa + 1 + Wb, \dots, Wa + 1 + (k-1)Wb$ will be an arithmetic progression of primes in $[1, N]$, thanks to our initial choice of $M$.

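Furthermore, since the degenerate case $b = 0$ can contribute at most $\frac{1}{M}$ to the expectation, we see that, for $N$ sufficiently large,

$$P(k, N) \ge M^2\left(\Upsilon_k(1_P) - \frac{1}{M}\right) \ge M^2\left(\frac{2\lambda_k W}{\phi(W)\log N}\right)^k\frac{c_k}{4} \ge \left(\frac{N}{4\varepsilon_k W}\right)^2\frac{1}{\log^k N}\left(\frac{2\lambda_k W}{\phi(W)}\right)^k\frac{c_k}{4} \ge c'_k\frac{N^2}{\log^k N}$$

for some constant $c'_k > 0$ depending only on $k$, since $W$, $\varepsilon_k$ and $\lambda_k$ depend only on $k$.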

Corollary 5.1. For any $k$, the primes contain infinitely many $k$-term arithmetic progressions.

It is crucial in the above proof that $W$ does not depend on $N$. If we take $W$ as some increasing function of $N$, as Green and Tao do, then although certain parts of the argument become simpler (we do not need to appeal to weak pseudorandomness), we cannot get a lower bound of the form we have stated. That approach is, however, still strong enough to deduce this corollary.

Chapter 6

Further Results

6.1 Extensions of the Green-Tao Theorem

Although Theorem 5.3 was stated for the primes, we may in fact use an almost identical argument for subsets with positive relative density within the primes (since if $A \subseteq B$ with positive density and $B \subseteq C$ with positive density then $A \subseteq C$ with positive density). In particular, since every prime congruent to 1 modulo 4 is the sum of two squares, and these primes make up half of all primes (by the prime number theorem for arithmetic progressions), Green and Tao obtain the following previously unknown result:

Theorem 6.1. There are arbitrarily long arithmetic progressions where every term is the sum of two squares.

A natural question to ask is whether the same methods can prove the stronger result that the primes contain arbitrarily long polynomial progressions. This was shown by Tao and Ziegler by similar methods in 2008, appealing this time to a Polynomial Szemeredi theorem proven by Bergelson and Leibman in 1996. They show the following:

Theorem 6.2 (Tao-Ziegler [22]). Given any $k$ polynomials with integer coefficients $P_1, \dots, P_k$ such that $P_1(0) = \dots = P_k(0) = 0$, the primes contain infinitely many progressions of the form

$$x + P_1(m), \dots, x + P_k(m) \quad\text{with } m > 0.$$

Generalising the other way, we can talk about prime elements in any ring, so it is natural to ask whether Green-Tao results can be obtained in these alternative settings. Tao has extended the method to deal with the Gaussian primes (the primes in the ring $\mathbb{Z}[i]$):

Theorem 6.3 (Tao [21]). The Gaussian primes contain infinitely many instances of any 'constellation', that is, sets of the form

$$a + v_0 b, \dots, a + v_{k-1}b \text{ all prime}, \quad\text{with } a \in \mathbb{Z}[i] \text{ and } b \in \mathbb{Z}\setminus\{0\},$$

for any fixed $k$ and distinct Gaussian integers $v_0, \dots, v_{k-1}$.

Hoang Le has also proven a version for the ring of polynomials over a finite field:


Theorem 6.4 (Hoang Le [16]). Let $\mathbb{F}_q$ be the finite field with $q$ elements. Then for any $k$ we may find polynomials $f, g \in \mathbb{F}_q[t]$ with $g \ne 0$ such that for any polynomial $P \in \mathbb{F}_q[t]$ with degree less than $k$, the polynomial

$$f + Pg$$

is irreducible.

For all these extensions, the previous remarks apply, so that they are also valid for sets of positive density within the primes.

6.2 Asymptotics for P (k,N)

Recent advances by Green, Tao and Ziegler appear to have established the conjectured asymptotic for all $k$. We will now briefly outline the status of these results.

In [7] they conditionally established the Hardy-Littlewood prime tuples conjecture, Conjecture 1.2, for linear forms which are not rational multiples of each other. This result was conditional on two partially resolved conjectures: the inverse Gowers-norm conjecture, GI($s$), and the Mobius-nilsequence conjecture, MN($s$). In particular, they prove the following.

Theorem 6.5 (Green-Tao, conditional). If the MN($k-2$) and GI($k-2$) conjectures hold, then

$$P(k, N) \sim C_k\frac{N^2}{\log^k N},$$

where the constant is defined by

$$C_k := \frac{1}{2(k-1)}\prod_{p < k}\frac{p^{k-2}}{(p-1)^{k-1}}\prod_{p \ge k}\frac{p^{k-2}(p-k+1)}{(p-1)^{k-1}}.$$

The GI($k-2$) conjecture roughly says that if a bounded function has large Gowers $U^{k-1}$ norm then it correlates with a structured type of function known as a $(k-2)$-step nilsequence. Hence if a function is not Gowers uniform then we can deduce a lot about its structure.

The MN($k-2$) conjecture roughly says that the $(k-2)$-step nilsequences obtained in this way do not correlate well with the Mobius function $\mu$.

Putting these together, the idea behind the proof (ignoring significant technical difficulties) is similar to the one used in the Green-Tao theorem: we measure the primes using some variant of the von Mangoldt function, call it $\Lambda$. If the Gowers norm of $\Lambda - 1$ is large, then by the inverse Gowers-norm conjecture it correlates well with a nilsequence. Due to the close connection between the Mobius function and $\Lambda$, however, this leads to a contradiction with the Mobius-nilsequence conjecture. Hence we can decompose $\Lambda$ into a bounded part and a uniform part, show that the uniform part is negligible, and use some Szemeredi-type theorem to give an asymptotic for the bounded part.

Hence to give asymptotics for arithmetic progressions within the primes it suffices to prove these conjectures. GI(1) and MN(1) are classical and can be deduced from the circle method used by Hardy, Littlewood and Vinogradov. Green and Tao have proven GI(2) in [9] and MN(2) in [11], giving the following unconditional asymptotics for the cases $k = 3, 4$.

Theorem 6.6 (Green-Tao).

$$P(3, N) \sim C_3\frac{N^2}{\log^3 N} \quad\text{and}\quad P(4, N) \sim C_4\frac{N^2}{\log^4 N},$$

where the constants are defined by

$$C_3 := \frac{1}{2}\prod_{p \ge 3}\frac{p(p-2)}{(p-1)^2} \approx 0.3301 \quad\text{and}\quad C_4 := \frac{3}{4}\prod_{p \ge 5}\frac{p^2(p-3)}{(p-1)^3} \approx 0.4764.$$

The Mobius-nilsequence conjecture was established in all cases in [8], hence Theorem 6.5 depends only on the inverse Gowers-norm conjecture. Green, Tao and Ziegler have further established the GI(3) conjecture in [12], so we have the following.

Theorem 6.7 (Green-Tao-Ziegler).

$$P(5, N) \sim C_5\frac{N^2}{\log^5 N},$$

where the constant is

$$C_5 := \frac{27}{16}\prod_{p \ge 5}\frac{p^3(p-4)}{(p-1)^4} \approx 0.5189.$$
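The Euler products in these constants converge quickly, since each factor is $1 + O(1/p^2)$. As a rough numerical check, the sketch below truncates the product $\prod_{p \ge k} p^{k-2}(p-k+1)/(p-1)^{k-1}$ (the shape shared by the products in $C_3$, $C_4$ and $C_5$) at a hypothetical cut-off of $10^6$. Multiplying by the prefactors $3/4$ and $27/16$ should reproduce the quoted approximations $C_4 \approx 0.4764$ and $C_5 \approx 0.5189$, while for $k = 3$ the product is the twin prime constant, approximately $0.6602$.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


PRIMES = primes_up_to(10 ** 6)  # truncation point (an arbitrary choice)


def euler_product(k: int) -> float:
    """Truncation of prod over primes p >= k of p^(k-2) (p-k+1) / (p-1)^(k-1)."""
    result = 1.0
    for p in PRIMES:
        if p >= k:
            # exact integer arithmetic, then a single correctly rounded division
            result *= p ** (k - 2) * (p - k + 1) / (p - 1) ** (k - 1)
    return result


print("k=3 product:", euler_product(3))
print("C_4 approx:", 3 / 4 * euler_product(4))
print("C_5 approx:", 27 / 16 * euler_product(5))
```

The omitted tail beyond $10^6$ changes each product by roughly $\sum_{p > 10^6} O(1/p^2)$, which is far below the displayed precision.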

Recently, Green, Tao and Ziegler have extended the strategy employed in [12] to all cases in [13], finally removing the conditional status of Theorem 6.5 and giving an asymptotic for prime progressions of any length.

6.3 Explicit Bounds

We finally mention the related question of where the first $k$-term prime progression can be found. Let $[1, N_k]$ be the smallest interval containing a $k$-term prime progression, say $\{a, a+b, \dots, a+(k-1)b\}$.

Let us first consider lower bounds for $N_k$. It is easily verified that $a \ge k$ and that every prime less than $k$ must divide $b$, which gives the lower bound

$$N_k \ge k + (k-1)\prod_{p < k} p.$$

It follows from the prime number theorem that $\prod_{p < n} p = e^{(1+o(1))n}$, which gives the asymptotic lower bound

$$N_k \ge e^{(1 + o_{k\to\infty}(1))k}.$$
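For small $k$ the first prime progressions can be found by brute force, which gives a sanity check on the lower bound above. The sketch below (plain trial division, with illustrative function names) searches for the smallest $N_k$ and compares it with $k + (k-1)\prod_{p<k} p$; it is only practical for very small $k$.

```python
from math import isqrt, prod


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % q for q in range(2, isqrt(n) + 1))


def smallest_prime_ap_end(k: int) -> int:
    """Least N such that [1, N] contains a k-term AP of primes with step b >= 1."""
    last = 2
    while True:
        if is_prime(last):
            # try every progression a, a+b, ..., a+(k-1)b = last with a >= 2
            for b in range(1, (last - 2) // (k - 1) + 1):
                a = last - (k - 1) * b
                if all(is_prime(a + i * b) for i in range(k - 1)):
                    return last
        last += 1


def lower_bound(k: int) -> int:
    """k + (k - 1) * (product of the primes below k)."""
    return k + (k - 1) * prod(p for p in range(2, k) if is_prime(p))


for k in range(3, 7):
    print(k, smallest_prime_ap_end(k), lower_bound(k))
```

For $k = 3, \dots, 6$ the search returns $7, 23, 29, 157$, coming from the progressions $3, 5, 7$; $5, 11, 17, 23$; $5, 11, 17, 23, 29$; and $7, 37, 67, 97, 127, 157$.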


Now we turn to the harder problem of an upper bound for $N_k$. It is possible to compute one by keeping careful track of the constants and rates of decay in the $o(1)$ errors in the proof of the Green-Tao theorem above. Since the proof invokes Szemeredi's theorem, the bounds obtained depend heavily on the best available bounds for Szemeredi's theorem. The best obtained so far is from Gowers' proof using Fourier analysis in [4]. For a set of density $\delta$ this proof gives the following upper bound for the first occurrence of a $k$-term arithmetic progression:

$$2^{2^{\delta^{-2^{2^{k+9}}}}}.$$

To calculate the best possible bound for the Green-Tao theorem from the given proof seems a Herculean task. Green and Tao have, however, in a brief note [6] estimated the constants and errors in their original proof, obtaining the following (non-optimal) upper bound, where $c$ is some absolute constant:

$$N_k < c\,2^{2^{2^{2^{2^{2^{2^{100k}}}}}}}.$$

Clearly, the gap between the lower and upper bounds is quite large. It has been conjectured that for all $k$ there exist $k$-term prime progressions with step equal to $\prod_{p \le k} p$ (the smallest possible value when $a > k$), and so the asymptotic lower bound would give the correct order of $N_k$. This has been verified up to $k = 21$ by computer.

Appendix A

Proof of the Decomposition Theorem

In this appendix we give a proof of Theorem 4.3 and show how Theorem 4.1 follows from it. The ideas here are those in [3], although we present the argument in a different way and take advantage of the narrowness of our goal to make some simplifications.

First, we make the following definition.

Definition A.1 (Basic anti-uniform functions). Let $\nu : \mathbb{Z}_N \to \mathbb{R}$ be simply pseudorandom. A function $g : \mathbb{Z}_N \to \mathbb{R}$ is a basic anti-uniform function if $g = \mathcal{D}f$ for some $0 \le f \le \nu$.

The following two lemmas are the only places where we use the fact that $\nu$ is simply pseudorandom. They give us bounds on anti-uniformity which we shall use to construct the anti-uniform function $\psi$ in Theorem 4.3.

Lemma A.1 (Basic anti-uniform functions are bounded). If $0 \le f \le \nu$ for some simply pseudorandom $\nu$ then $\|\mathcal{D}f\|_{L^\infty} = O_k(1)$.

Sketch proof. We in fact show that $\|\mathcal{D}f\|_\infty \le 2^{2^{k-1}} + o(1)$, by writing out the definition of $\mathcal{D}f$ and applying the simple pseudorandomness condition.

Lemma A.2 (Basic anti-uniform products are anti-uniform). Let $0 \le f_1, \dots, f_m \le \nu$ for some simply pseudorandom measure $\nu$. Then

$$\|\mathcal{D}f_1 \cdots \mathcal{D}f_m\|^*_{U^{k-1}} = O_m(1).$$

Remark A.1. This is the part of the proof which requires the bulk of the simple pseudorandomness condition. If a way to avoid this lemma were found, then the Decomposition theorem could be proven with only the hypothesis that $\|\nu - 1\|_{U^d}$ is sufficiently small for some large $d$.

Sketch proof. Recalling the definition of the dual norm, we need to show that for all $f \in \mathbb{R}^N$ such that $\|f\|_{U^{k-1}} \le 1$, we have the bound

$$\Big\langle f, \prod_{j=1}^m \mathcal{D}f_j\Big\rangle = O_m(1).$$

After applications of the Cauchy-Schwarz and Holder inequalities, together with the fact that $f_j \le \nu$ for each $j$, it is sufficient to show that

$$\mathbb{E}\left(\mathbb{E}\Big(\prod_{\omega \in C^{k-1}} \nu(y + \omega\cdot h) : y \in \mathbb{Z}_N\Big)^m : h \in \mathbb{Z}_N^{k-1}\right) = O_m(1).$$

Using the simply pseudorandom condition and the triangle inequality, we may bound this expression by $\mathbb{E}(\tau^m) = O_m(1)$ for some weight function $\tau$, and we are done.

The idea now is to construct $\psi$, our anti-uniform approximation to $\phi^+$, as follows. We show that $\phi$ is a small linear combination of basic anti-uniform functions, and then use this fact to construct a polynomial in $\phi$ which is anti-uniform and approximates $\phi^+$. For the first task, we define the following norm.

Definition A.2. We define the basic norm by

$$\|\phi\|_B := \inf\left\{\sum_{i=1}^n |\lambda_i| : \phi = \sum_{i=1}^n \lambda_i\mathcal{D}f_i,\ 0 \le f_1, \dots, f_n \le \nu\right\},$$

and if $\|\phi\|_B \le \eta$ then we say that $\phi$ is $\eta$-basic.

Remark A.2. It is a simple exercise to verify that the expression above does in fact give a norm on $\mathbb{R}^N$. In this definition, and for the rest of this proof, $\nu$ will be a fixed simply pseudorandom function.

The $\|\cdot\|_B$ norm measures how well we can approximate $\phi$ by a linear combination of basic anti-uniform functions, which are well-behaved by the above lemmas. It is easy to deduce the following analogues of Lemma A.1 and Lemma A.2.

Lemma A.3 (Basic functions are bounded). If $\phi$ is $\eta$-basic then $\|\phi\|_\infty = O(\eta)$.

Lemma A.4 (Basic powers are anti-uniform). If $\phi$ is $\eta$-basic then, for any integer $m$, $\|\phi^m\|^*_{U^{k-1}} = O_m(\eta^m)$.

The following lemma constructs an anti-uniform approximation to $\phi^+$, provided that $\phi$ is sufficiently basic.

Lemma A.5 (Approximation with an anti-uniform polynomial). For every $\eta > 0$ there exists a polynomial $P$, depending only on $\eta$, such that for every $\eta$-basic function $\phi$,

1. $\|P\phi - \phi^+\|_\infty \le \frac{1}{8}$, and
2. $\|P\phi\|^*_{U^{k-1}} \le A$

for some $A$ dependent only on $\eta$.

Proof. By Lemma A.3 we know that for every $\eta$-basic $\phi$ we have $\|\phi\|_\infty \le C\eta$ for some constant $C$ independent of $\phi$ and $\eta$. Now, by the Weierstrass approximation theorem, choose some polynomial $P$ such that $|P(x) - x^+| \le \frac{1}{8}$ on $[-C\eta, C\eta]$, and hence $\|P\phi - \phi^+\|_\infty \le \frac{1}{8}$. Note that $P$ depends only on $\eta$, and not on $\phi$.

Furthermore, by Lemma A.4, for any integer $m$ and any $\eta$-basic $\phi$, we have $\|\phi^m\|^*_{U^{k-1}} = O_m(\eta^m)$. If we denote the polynomial $P$ by $a_n x^n + \dots + a_0$ then by the triangle inequality

$$\|P\phi\|^*_{U^{k-1}} \le |a_n|\,\|\phi^n\|^*_{U^{k-1}} + \dots + |a_0| = O(|a_n|\eta^n + \dots + |a_0|),$$

and we denote this last quantity by $A$, noting that it is dependent only on $\eta$.
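The only input needed for $P$ in the proof above is a uniform polynomial approximation to $x \mapsto x^+$ on a bounded interval. One explicit way to produce such a polynomial (a swapped-in construction; the proof only appeals to existence) is the Bernstein polynomial. The sketch below builds it for the hypothetical values $C\eta = 2$ and degree $n = 200$, and checks the $\frac{1}{8}$ tolerance on a grid.

```python
import math


def bernstein_positive_part(C: float, n: int):
    """Degree-n Bernstein polynomial of x -> max(x, 0) on [-C, C]."""
    # samples of f at the nodes x_k = -C + 2Ck/n
    samples = [max(-C + 2 * C * k / n, 0.0) for k in range(n + 1)]
    binoms = [math.comb(n, k) for k in range(n + 1)]

    def P(x: float) -> float:
        y = (x + C) / (2 * C)  # rescale [-C, C] onto [0, 1]
        return sum(b * s * y ** k * (1 - y) ** (n - k)
                   for k, (b, s) in enumerate(zip(binoms, samples)))

    return P


C, n = 2.0, 200  # hypothetical bound C*eta and degree
P = bernstein_positive_part(C, n)
grid = [-C + 2 * C * i / 400 for i in range(401)]
max_error = max(abs(P(x) - max(x, 0.0)) for x in grid)
print(max_error)  # below 1/8 for these parameters
```

For Bernstein polynomials the largest error sits at the kink $x = 0$ and decays roughly like $C/\sqrt{n}$, so the degree needed grows with $C\eta$; the monomial coefficients $a_i$, which enter the constant $A$, can be very large, which is harmless here since $\eta$ is fixed.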

Finally, we show that condition (3) in the proof of Theorem 4.1 below implies that $\phi$ is sufficiently basic for us to be able to construct such an anti-uniform approximation.

Lemma A.6 (Lack of correlation with uniform functions implies basic). If $\langle h, \phi\rangle \le 1$ for every $\eta$-uniform $0 \le h \le \nu$ then $\phi$ is $\eta^{-2^{k-1}}$-basic.

Sketch proof. Use the fact that $\|\mathcal{D}h\|_B \le 1$ and Lemma 3.1 to deduce that if $\|h\|^*_B \le \eta^{2^{k-1}}$ then $h$ is $\eta$-uniform. The result follows using the fact that the dual of a dual norm is the original norm.

Putting these together, we get the following precise form of Theorem 4.3.

Theorem A.1. If $\langle h, \phi\rangle \le 1$ for every $\eta$-uniform function $h$ then there exists a polynomial $P(x)$ such that $\|P\phi - \phi^+\|_\infty \le \frac{1}{8}$ and $\|P\phi\|^*_{U^{k-1}} \le A$ for some $A$ dependent only on $\eta$.

Proof. Combine Lemmas A.6 and A.5.

We can now complete our original strategy for proving Theorem 4.1, as shown below.

Proof of Theorem 4.1. Suppose no such decomposition exists. Then by the Hahn-Banach theorem there exists a $\phi$ such that

1. $\langle f, \phi\rangle > 1$,
2. $\langle g, \phi\rangle \le 1$ for every $g$ such that $0 \le g \le 2$, and
3. $\langle h, \phi\rangle \le 1$ for every $\eta$-uniform $h$.

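Condition (2), applied with $g = 2\cdot 1_{\{\phi > 0\}}$, implies that $\langle 1, \phi^+\rangle \le \frac{1}{2}$, and by Theorem A.1, condition (3) implies there exists a polynomial $P$ such that $\|P\phi - \phi^+\|_\infty \le \frac{1}{8}$ and $\|P\phi\|^*_{U^{k-1}} \le A$ for some $A$ dependent only on $\eta$. Using this, we may obtain the bound

$$\begin{aligned} \langle \nu, \phi^+\rangle &= \langle 1, \phi^+\rangle + \langle 1, P\phi - \phi^+\rangle + \langle \nu - 1, P\phi\rangle + \langle \nu, \phi^+ - P\phi\rangle \\ &\le \frac{1}{2} + \frac{1}{8} + A\|\nu - 1\|_{U^{k-1}} + \frac{1}{8}(1 + o(1)) \\ &= \frac{3}{4} + o(1), \end{aligned}$$

since, by Lemma 4.1, $\|\nu - 1\|_{U^{k-1}} = o(1)$. We have also used the fact that $\eta$, and hence $A$, is fixed for the duration of this proof. Since $0 \le f \le \nu$ and $\phi^+ \ge 0$, we can deduce that $\langle f, \phi^+\rangle \le \langle \nu, \phi^+\rangle$; moreover $\langle f, \phi\rangle \le \langle f, \phi^+\rangle$ as $f \ge 0$. Appealing to condition (1) above, we have the following inequalities:

$$1 < \langle f, \phi\rangle \le \langle f, \phi^+\rangle \le \langle \nu, \phi^+\rangle \le \frac{3}{4} + o(1).$$

Hence we have a contradiction for $N$ sufficiently large, and so the required decomposition must exist.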

Appendix B

Estimates for $\Lambda_R$

In this appendix we outline the number-theoretic arguments needed to show that the function $\nu$ constructed in Chapter 5 is pseudorandom. The proofs given here are a synthesis of those in Section 10 of [10] and Appendix D of [7].

Theorem B.1 (Linear Forms Estimate). Suppose we have $m$ linear forms in $t$ variables $x = (x_1, \dots, x_t)$, each of the form $\psi_i(x) = \sum_{j=1}^t L_{ij}x_j + b_i$ with integer coefficients bounded by $|L_{ij}| \le (w/2)^{1/4}$, and with the $t$-tuples $(L_{ij})_{j=1}^t$ never identically zero and none a rational multiple of another. Let $B \subset \mathbb{N}^t$ be a product of $t$ intervals, each of length at least $R^{10m}$. Define the modified linear forms $\theta_i(x) = W\psi_i(x) + 1$. Then

$$\mathbb{E}\big(\Lambda_R(\theta_1(x))^2 \cdots \Lambda_R(\theta_m(x))^2 : x \in B\big) = e^{O_m(1/w)}(1 + o(1))\left(\frac{W}{\phi(W)\log R}\right)^m.$$

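Theorem B.2 (Correlation Estimate). Suppose we have $m$ linear forms of the form $\psi_i(x) = x + h_i$ for distinct $|h_i| \le N^2$, and let $\theta_i$ and $B$ be as above. Then

$$\mathbb{E}\big(\Lambda_R(\theta_1(x))^2 \cdots \Lambda_R(\theta_m(x))^2 : x \in B\big) = e^{O_m(1/w)}(1 + o(1))\left(\frac{W}{\phi(W)\log R}\right)^m\prod_{p \mid \Delta}\big(1 + O(p^{-1/2})\big),$$

where $\Delta$ denotes the integer

$$\Delta = \prod_{1 \le i < j \le m}|h_i - h_j|.$$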

Remark B.1. Note that the linear forms in the second theorem are not covered by the first, since all the coefficient tuples $(L_{ij})$ are equal to $(1)$, so that every form is a rational multiple of every other. We will combine the proofs of both theorems below, since they are identical except for the evaluation of the Euler product $\prod_p F_p$, which is treated in separate sections.

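Sketch proof. For both theorems we must estimate the same expectation, so let us denote this by $E_{\psi,R}$. Expanding out the definitions we get

$$E_{\psi,R} = \sum_{a, b \in \mathbb{N}^m}\left(\prod_{i=1}^m \mu(a_i)\mu(b_i)\,\chi\!\left(\frac{\log a_i}{\log R}\right)\chi\!\left(\frac{\log b_i}{\log R}\right)\right)\mathbb{E}\left(\prod_{i=1}^m 1_{a_i, b_i \mid \theta_i(x)} : x \in B\right),$$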


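where we write $a = (a_1, \dots, a_m)$ and $b = (b_1, \dots, b_m)$, and $1_{a_i, b_i \mid \theta_i(x)}$ is 1 if both $a_i$ and $b_i$ divide $\theta_i(x)$ and 0 otherwise. Since $\chi(x) = 0$ for $x \ge 1$, only terms with all $a_i, b_i < R$ contribute, so we may sum over all of $\mathbb{N}^m$ without imposing the restriction $a, b \le R$ explicitly. Furthermore, if we let $D$ be the least common multiple of $a_1, \dots, a_m, b_1, \dots, b_m$ (so that $D < R^{2m}$, much smaller than the side lengths of $B$), then we may replace $B$ in the expectation above with $\mathbb{Z}_D^t$, with only (assuming $\lambda_k$ in the definition of $R$ is sufficiently small) an $o(1)$ error, which can be included in the right-hand side.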

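We shall denote the expectation factor by $\omega_{a,b}$, and expand it as an Euler product using the Chinese Remainder Theorem:$^1$

$$\omega_{a,b} := \mathbb{E}\left(\prod_{i=1}^m 1_{a_i, b_i \mid \theta_i(x)} : x \in \mathbb{Z}_D^t\right) = \prod_p \mathbb{E}\left(\prod_{\substack{j \text{ such that} \\ p \mid a_j \text{ or } p \mid b_j}} 1_{p \mid \theta_j(x)} : x \in \mathbb{Z}_p^t\right) = \prod_p \omega_{a,b}(p).$$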

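Note that this is the only factor influenced by the choice of linear forms. Hence we can write

$$E_{\psi,R} = \sum_{a, b \in \mathbb{N}^m}\left(\prod_{i=1}^m \mu(a_i)\mu(b_i)\,\chi\!\left(\frac{\log a_i}{\log R}\right)\chi\!\left(\frac{\log b_i}{\log R}\right)\right)\omega_{a,b}.$$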

Let $\psi$ be the inverse Fourier transform of $e^x\chi(x)$, so that

$$\chi(x) = \int_{\mathbb{R}}\psi(t)\,e^{-x(1+it)}\,dt.$$

Since $e^x\chi(x)$ is smooth with compact support, $\psi$ is also smooth and decays rapidly: for any $A > 0$, $|\psi(t)| = O_A((1 + |t|)^{-A})$.$^2$ By restricting the range of integration to $I := [-\log^{1/2} R, \log^{1/2} R]$, we see that (for any real $c \ge 1$ and $A > 0$)

$$\chi\!\left(\frac{\log c}{\log R}\right) = \int_I c^{-\frac{1+it}{\log R}}\psi(t)\,dt + O_A\big(c^{-1/\log R}\log^{-A} R\big).$$

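To simplify notation, let $t' := \frac{1+it}{\log R}$. Since $\chi\left(\frac{\log c}{\log R}\right) = O(c^{-1/\log R})$, the above gives us

$$\prod_{j=1}^m \chi\!\left(\frac{\log a_j}{\log R}\right)\chi\!\left(\frac{\log b_j}{\log R}\right) = \int_I \cdots \int_I \prod_{j=1}^m \frac{\psi(x_j)\psi(y_j)}{a_j^{x'_j} b_j^{y'_j}}\,dx_j\,dy_j + O_A\Big(\log^{-A} R\prod_{j=1}^m (a_j b_j)^{-1/\log R}\Big).$$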

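It can be shown$^3$ that the error term contributes $O_A(\log^{O(1)-A} R)$ to $E_{\psi,R}$, which is $o(1)$ for $A$ large enough. Hence we have

$$E_{\psi,R} = \int_I \cdots \int_I \sum_{a, b \in \mathbb{N}^m}\prod_p \omega_{a,b}(p)\prod_{j=1}^m \frac{\mu(a_j)\mu(b_j)}{a_j^{x'_j} b_j^{y'_j}}\prod_{j=1}^m \psi(x_j)\psi(y_j)\,dx_j\,dy_j + o(1).$$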

$^1$ In the product we should properly have $p \mid D$, but this restriction can be dropped since the multiplicand is 1 otherwise.
$^2$ See Appendix C for details.
$^3$ See [7], p. 69.


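By unique factorisation and the presence of $\mu$, we may factor the first term as an Euler product:

$$\sum_{a, b \in \mathbb{N}^m}\prod_p \omega_{a,b}(p)\prod_{j=1}^m \frac{\mu(a_j)\mu(b_j)}{a_j^{x'_j} b_j^{y'_j}} = \prod_p \sum_{a, b \in \{1, p\}^m}\omega_{a,b}(p)\prod_{j=1}^m \frac{\mu(a_j)\mu(b_j)}{a_j^{x'_j} b_j^{y'_j}} =: \prod_p E_p,$$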

and hence it remains to evaluate

$$E_{\psi,R} = \int_I \cdots \int_I \prod_p E_p\prod_{j=1}^m \psi(x_j)\psi(y_j)\,dx_j\,dy_j + o(1).$$

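Applying Lemma B.2, we get, for some modified Euler factors $F_p$ defined below,

$$\begin{aligned} E_{\psi,R} &= (1 + o(1))\log^{-m} R\int_I \cdots \int_I \prod_p F_p\prod_{j=1}^m\left(\frac{(1+ix_j)(1+iy_j)}{2 + i(x_j + y_j)}\psi(x_j)\psi(y_j)\right)dx_j\,dy_j \\ &= (1 + o(1))\prod_p F_p\log^{-m} R\left(\int_I\int_I \frac{(1+ix)(1+iy)}{2 + i(x+y)}\psi(x)\psi(y)\,dx\,dy\right)^m \\ &= (1 + o(1))\prod_p F_p\log^{-m} R\left(\int_{\mathbb{R}}\int_{\mathbb{R}} \frac{(1+ix)(1+iy)}{2 + i(x+y)}\psi(x)\psi(y)\,dx\,dy + o(1)\right)^m \\ &= (1 + o(1))\prod_p F_p\log^{-m} R, \end{aligned}$$

where the second equality holds provided that we obtain an estimate for $\prod_p F_p$ independent of $x'_j, y'_j$.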

We could replace the limits of integration by $\mathbb{R}$ at the cost of $o(1)$ factors thanks to the rapid decay of $\psi$, and the final equality is a consequence of Lemma B.1. Only now do the proofs of our two theorems diverge, in the estimation of $\prod_p F_p$, which is carried out in the sections below. Simply plug in Corollaries B.1 and B.2 respectively to obtain the two theorems.

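Lemma B.1 (Sieve Factor Calculation).

$$\int_{\mathbb{R}}\int_{\mathbb{R}} \frac{(1+ix)(1+iy)}{2 + i(x+y)}\psi(x)\psi(y)\,dx\,dy = \int_0^\infty |\chi'(t)|^2\,dt = 1.$$

Proof sketch. For the first equality, evaluate the integral using the observation that

$$\frac{1}{2 + i(x+y)} = \int_0^\infty e^{-(1+ix)t}e^{-(1+iy)t}\,dt$$

to separate the variables, and recall that $\psi(x)$ is the inverse Fourier transform of $e^x\chi(x)$, so that $\int_{\mathbb{R}}(1+ix)\psi(x)e^{-(1+ix)t}\,dx = -\chi'(t)$. The second equality follows from our original choice of $\chi$ (recall that $\chi$ vanishes outside $[-1, 1]$).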


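It remains to give estimates for the Euler factors. Note that by explicitly considering all the possible $a, b \in \{1, p\}^m$ we can rewrite the Euler factors $E_p$ in the more convenient form

$$E_p = \sum_{a, b \in \{1, p\}^m}\omega_{a,b}(p)\prod_{j=1}^m \frac{\mu(a_j)\mu(b_j)}{a_j^{x'_j} b_j^{y'_j}} = \sum_{I, J \subseteq [m]}\frac{(-1)^{|I|+|J|}\,\omega_{I \cup J}(p)}{p^{\sum_{j \in I} x'_j + \sum_{j \in J} y'_j}},$$

where

$$\omega_X(p) := \mathbb{E}\Big(\prod_{j \in X} 1_{p \mid \theta_j(x)} : x \in \mathbb{Z}_p^t\Big).$$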

It will also be convenient to define the altered Euler factors

$$E'_p := \prod_{j=1}^m \frac{(p^{1+x'_j} - 1)(p^{1+y'_j} - 1)}{p\,(p^{1+x'_j+y'_j} - 1)},$$

and then, as promised, we define $F_p := \frac{E_p}{E'_p}$. The altered factors $E'_p$ are much easier to evaluate than $E_p$ directly, and the following lemma allows us to pass between them in the proof above.

Lemma B.2 (Euler Product Evaluation).

$$\prod_p E_p = \prod_p F_p\cdot\frac{1 + o(1)}{\log^m R}\prod_{j=1}^m \frac{(1+ix_j)(1+iy_j)}{2 + i(x_j + y_j)}.$$

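Proof sketch. This is a simple consequence of the definition of $E'_p$ and Lemma B.3 below. Indeed, rewriting

$$E'_p = \prod_{j=1}^m \frac{(1 - p^{-1-x'_j})(1 - p^{-1-y'_j})}{1 - p^{-1-x'_j-y'_j}}$$

and applying Lemma B.3 with $s = 1 + x'_j$, $1 + y'_j$ and $1 + x'_j + y'_j$ gives $\prod_p E'_p = (1 + o(1))\prod_{j=1}^m \frac{x'_j y'_j}{x'_j + y'_j}$, where $\frac{x'_j y'_j}{x'_j + y'_j} = \frac{(1+ix_j)(1+iy_j)}{(2 + i(x_j + y_j))\log R}$.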

Note that since, for instance, $x_j \in [-\log^{1/2} R, \log^{1/2} R]$ and $x'_j := \frac{1+ix_j}{\log R}$, we have that $1 + x'_j = 1 + o(1)$, and so the lemma is applicable.

Lemma B.3 (Zeta Function Estimate). When $\Re(s) > 1$ and $s = 1 + o(1)$,

$$\prod_p\left(1 - \frac{1}{p^s}\right) = (1 + o(1))(s - 1).$$

B.1 Euler Product for independent linear forms

All that remains is to provide a suitable estimate for $\prod_p F_p$. The strategy in this section and the next is the same, and runs as follows. We first obtain bounds on the local factors $\omega_X(p)$ for $p \ge w$ (it is here that the $W$-trick is applied). Secondly, we use these bounds to estimate $E_p$ in terms of $E'_p$ for each prime $p \ge w$. Finally, we use this to estimate $\prod_p E_p$ in terms of $\prod_p E'_p$, and since $F_p = E_p/E'_p$, this gives us an estimate for $\prod_p F_p$.

Lemma B.4 (Local Factor Estimate). For $p \ge w$,

$$\omega_\emptyset(p) = 1, \qquad \omega_X(p) = \frac{1}{p} \text{ whenever } |X| = 1, \qquad \omega_X(p) \le \frac{1}{p^2} \text{ whenever } |X| \ge 2.$$

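Proof. When $X = \emptyset$ we are simply taking the expectation of the empty product, which is 1. When $|X| = 1$ then, for some $j$,

$$\omega_X(p) = \mathbb{E}\big(1_{p \mid \theta_j(x)} : x \in \mathbb{Z}_p^t\big) = \frac{\#\{x \in \mathbb{Z}_p^t : \theta_j(x) \equiv 0 \pmod p\}}{p^t} = \frac{1}{p},$$

since $\theta_j : \mathbb{Z}_p^t \to \mathbb{Z}_p$ is a non-constant affine map (as $p \nmid W$, and at least one coefficient of $\psi_j$ is nonzero and smaller than $p$ in absolute value, hence not divisible by $p$), and so takes each value exactly $p^{t-1}$ times. For the final claim, note that it suffices to prove it for the case $|X| = 2$, so let us suppose $X = \{j, k\}$. Write the linear forms as

$$\psi_j(x) = \sum_{i=1}^t \frac{a_i}{b_i}x_i + l_j \quad\text{and}\quad \psi_k(x) = \sum_{i=1}^t \frac{c_i}{d_i}x_i + l_k.$$

Suppose that the pure linear forms $W(\psi_j - l_j)$ and $W(\psi_k - l_k)$ are multiples of each other mod $p$, so that for some $\lambda$ and every $1 \le i \le t$ we have

$$W\frac{a_i}{b_i} \equiv \lambda W\frac{c_i}{d_i} \pmod p.$$

Since $p \nmid W$, we may rearrange this to give

$$\frac{a_1 d_1}{b_1 c_1} \equiv \frac{a_2 d_2}{b_2 c_2} \equiv \cdots \equiv \frac{a_t d_t}{b_t c_t} \pmod p,$$

and hence, for any $1 \le i \le t$,

$$a_1 d_1 b_i c_i \equiv b_1 c_1 a_i d_i \pmod p.$$

In other words, $p$ divides $a_1 d_1 b_i c_i - b_1 c_1 a_i d_i$, while

$$|a_1 d_1 b_i c_i - b_1 c_1 a_i d_i| \le |a_1 d_1 b_i c_i| + |b_1 c_1 a_i d_i| < w \le p,$$

where for the inequalities we have used the bounds $|a_i|, |b_i|, |c_i|, |d_i| < (w/2)^{1/4}$. Hence we have equality not only in $\mathbb{Z}_p$ but also in $\mathbb{Z}$, and so

$$\frac{a_i}{b_i} = \left(\frac{a_1 d_1}{b_1 c_1}\right)\frac{c_i}{d_i}.$$

This contradicts our hypothesis that the pure linear forms are not rational multiples of one another. Hence the pure linear forms are not multiples of each other over $\mathbb{Z}_p$ either.

Let $Z$ be the set of solutions to $\theta_j(x) \equiv \theta_k(x) \equiv 0 \pmod p$. Since $W(\psi_j - l_j)$ and $W(\psi_k - l_k)$ are not multiples of each other modulo $p$, $Z$ is the intersection of two non-parallel affine hyperplanes in $\mathbb{Z}_p^t$, and hence has cardinality at most $p^{t-2}$. By definition, $\omega_X(p) = \frac{|Z|}{p^t}$, and we are done.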
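The local factors $\omega_X(p)$ are finite averages, so for tiny examples they can be computed exhaustively. The sketch below does this for the hypothetical forms $\psi_1(x) = x_1 + x_2$ and $\psi_2(x) = x_1 - x_2 + 1$ with $w = 7$ (so $W = 30$, and the coefficients respect the bound $(w/2)^{1/4}$). It should reproduce the values $1$, $1/p$ and $1/p^2$ of Lemma B.4 for $p \ge w$, and give $\omega_X(p) = 0$ for nonempty $X$ when $p < w$.

```python
from fractions import Fraction
from itertools import product


def omega(forms, X, p, W):
    """omega_X(p): proportion of x in Z_p^t with p | W*psi_j(x) + 1 for all j in X.

    `forms` is a list of (coefficients, constant) pairs describing the integer
    linear forms psi_j(x) = sum_i L_ji * x_i + b_j.
    """
    t = len(forms[0][0])
    hits = 0
    for x in product(range(p), repeat=t):
        if all((W * (sum(c * xi for c, xi in zip(coeffs, x)) + const) + 1) % p == 0
               for coeffs, const in (forms[j] for j in X)):
            hits += 1
    return Fraction(hits, p ** t)


W = 30  # W = 2 * 3 * 5, i.e. w = 7
forms = [((1, 1), 0), ((1, -1), 1)]  # psi_1 = x1 + x2, psi_2 = x1 - x2 + 1
for p in (5, 7, 11):
    print(p, omega(forms, [], p, W), omega(forms, [0], p, W), omega(forms, [0, 1], p, W))

# Simple forms x + h_i (t = 1), as in Section B.2, with W = 6 (w = 5).
simple = [((1,), h) for h in (0, 5, 12)]
print(omega(simple, [0, 1], 5, 6), omega(simple, [0, 1], 11, 6))
```

The same function covers the simple forms $x + h_i$ of Section B.2: with $h = (0, 5, 12)$ the pair $\{0, 1\}$ gives $1/5$ at $p = 5$ (since $0 \equiv 5 \pmod 5$) but $0$ at $p = 11$.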


Lemma B.5 (Euler Factor Estimate). For $p \ge w$,

$$E_p = \left(1 + O_m\left(\frac{1}{p^2}\right)\right)E'_p.$$

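Proof. We divide up the sum in the definition of $E_p$ into the cases $I = J = \emptyset$, $|I \cup J| = 1$ and $|I \cup J| \ge 2$, and apply Lemma B.4 to get

$$\begin{aligned} E_p &:= \sum_{I, J \subseteq [m]}\frac{(-1)^{|I|+|J|}\,\omega_{I \cup J}(p)}{p^{\sum_{j \in I} x'_j + \sum_{j \in J} y'_j}} \\ &= \omega_\emptyset(p) - \sum_{j=1}^m\left(\frac{1}{p^{1+x'_j}} + \frac{1}{p^{1+y'_j}} - \frac{1}{p^{1+x'_j+y'_j}}\right) + \sum_{\substack{I, J \subseteq [m] \\ |I \cup J| \ge 2}}\frac{O_m(1/p^2)}{p^{\sum_{j \in I} x'_j + \sum_{j \in J} y'_j}} \\ &= 1 - \sum_{j=1}^m \frac{p^{x'_j} + p^{y'_j} - 1}{p^{1+x'_j+y'_j}} + O_m(1/p^2), \end{aligned}$$

where in the last step we used $|p^{x'_j}|, |p^{y'_j}| \ge 1$, as $\Re x'_j = \Re y'_j = 1/\log R > 0$. Hence we need to show that

$$\frac{E_p}{E'_p} = \frac{1 - \sum_{j=1}^m \frac{p^{x'_j} + p^{y'_j} - 1}{p^{1+x'_j+y'_j}}}{\prod_{j=1}^m \frac{(p^{1+x'_j} - 1)(p^{1+y'_j} - 1)}{p\,(p^{1+x'_j+y'_j} - 1)}} + O_m\left(\frac{1}{p^2}\right) = 1 + O_m\left(\frac{1}{p^2}\right),$$

which follows from Taylor expansion.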
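The $O_m(1/p^2)$ agreement between $E_p$ and $E'_p$ can be checked numerically. The sketch below evaluates $E_p$ straight from its defining sum over $I, J \subseteq [m]$, under the hypothetical assumption $m = 2$ with $\omega_X(p) = p^{-|X|}$ for $|X| \le 2$ (the values the argument of Lemma B.4 gives for two non-parallel forms in two variables), and compares it with $E'_p$ at hypothetical points $x'_j, y'_j$ with positive real part.

```python
from itertools import product


def E_p(p, xs, ys, omega_of_size):
    """Sum over I, J subsets of [m] of (-1)^(|I|+|J|) omega_{I u J}(p) p^-(sum_I x' + sum_J y')."""
    m = len(xs)
    total = 0j
    for I in product((0, 1), repeat=m):
        for J in product((0, 1), repeat=m):
            union_size = sum(1 for i, j in zip(I, J) if i or j)
            exponent = (sum(x for x, i in zip(xs, I) if i)
                        + sum(y for y, j in zip(ys, J) if j))
            sign = (-1) ** (sum(I) + sum(J))
            total += sign * omega_of_size(union_size) / p ** exponent
    return total


def E_prime(p, xs, ys):
    """The altered Euler factor E'_p."""
    result = 1 + 0j
    for x, y in zip(xs, ys):
        result *= (p ** (1 + x) - 1) * (p ** (1 + y) - 1) / (p * (p ** (1 + x + y) - 1))
    return result


log_R = 20.0  # hypothetical value of log R
xs = [(1 + 3j) / log_R, (1 - 2j) / log_R]  # x'_j = (1 + i x_j) / log R
ys = [(1 + 1j) / log_R, (1 + 0.5j) / log_R]

for p in (101, 1009, 10007):
    ratio = E_p(p, xs, ys, lambda s: p ** -s) / E_prime(p, xs, ys)
    print(p, abs(ratio - 1) * p ** 2)  # stays bounded as p grows
```

If instead $\omega_X(p)$ were of size $1/p$ for $|X| \ge 2$ (as can happen for $p \mid \Delta$ in Section B.2), the discrepancy would typically be of order $1/p$, which is why Lemma B.9 only claims a weaker bound for those primes.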

Lemma B.6.

$$\prod_{p \ge w}\left(1 + O_m\left(\frac{1}{p^2}\right)\right) = e^{O_m(1/w)}.$$

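Proof. We use the inequality $|1 + a| \le e^{|a|}$ to see that

$$\prod_{p \ge w}\left|1 + O_m\left(\frac{1}{p^2}\right)\right| \le e^{O_m\left(\sum_{p \ge w}\frac{1}{p^2}\right)} \le e^{O_m\left(\sum_{n \ge w}\frac{1}{n^2}\right)}.$$

Since $x^{-2}$ is a decreasing function, we may also bound the sum above by an integral, to get

$$\sum_{n \ge w}\frac{1}{n^2} \le \frac{1}{w} + \int_w^\infty \frac{1}{x^2}\,dx = \frac{2}{w}.$$

Combining these two inequalities gives us the result.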
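A quick numerical illustration of Lemma B.6 and its proof, with a hypothetical implied constant $K = 5$ standing in for the $O_m(1/p^2)$ terms: the partial products $\prod_{w \le p < 10^5}(1 + K/p^2)$ stay below $e^{2K/w}$, and the (truncated) tail sum $\sum_{n \ge w} n^{-2}$ stays below $2/w$.

```python
import math


def primes_below(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p < n."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n - 1) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]


PRIMES = primes_below(10 ** 5)


def partial_product(w: int, K: float) -> float:
    """Product over primes w <= p < 10^5 of (1 + K / p^2)."""
    result = 1.0
    for p in PRIMES:
        if p >= w:
            result *= 1 + K / p ** 2
    return result


def tail_sum(w: int, terms: int = 10 ** 6) -> float:
    """Truncation of the sum over n >= w of 1 / n^2."""
    return sum(1.0 / n ** 2 for n in range(w, w + terms))


K = 5.0  # hypothetical implied constant
for w in (10, 100, 1000):
    print(w, partial_product(w, K), math.exp(2 * K / w), tail_sum(w), 2 / w)
```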

Lemma B.7 (Euler Product Simplification).

$$\prod_p E_p = e^{O_m(1/w)}\left(\left(\frac{W}{\phi(W)}\right)^m + o(1)\right)\prod_p E'_p.$$

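Proof. We divide the product into two parts, $p < w$ and $p \ge w$, and evaluate each separately. Applying first Lemma B.5 and then Lemma B.6, we get

$$\prod_{p \ge w} E_p = \prod_{p \ge w}\left(1 + O_m\left(\frac{1}{p^2}\right)\right)\prod_{p \ge w} E'_p = e^{O_m(1/w)}\prod_{p \ge w} E'_p.$$

For $p < w$ we have $p \mid W$, so $\theta_j(x) = W\psi_j(x) + 1 \equiv 1 \pmod p$ and hence $\omega_X(p) = 0$ for every nonempty $X$; thus $E_p = \omega_\emptyset(p) = 1$, and in particular $\prod_{p < w} E_p = 1$. It therefore remains to show that

$$\left(\frac{W}{\phi(W)}\right)^m + o(1) = \prod_{p < w} E'^{-1}_p.$$

Noticing that (since $W = \prod_{p < w} p$ and $\phi$ is multiplicative)

$$\frac{W}{\phi(W)} = \prod_{p < w}\frac{p}{\phi(p)} = \prod_{p < w}\frac{p}{p-1},$$

it suffices in turn to show that for all $p < w$

$$\left(\frac{p}{p-1}\right)^m + o(1) = E'^{-1}_p.$$

After some algebraic manipulation, we see that

$$E'^{-1}_p = p^m\prod_{j=1}^m \frac{p^{1+x'_j+y'_j} - 1}{(p^{1+x'_j} - 1)(p^{1+y'_j} - 1)}.$$

Furthermore, since $x'_j := \frac{1+ix_j}{\log R}$ where $x_j \in I$, we have $p^{x'_j} = 1 + o(1)$, and similarly for the other two cases. Hence we get

$$E'^{-1}_p = p^m\prod_{j=1}^m \frac{(1 + o(1))p - 1}{((1 + o(1))p - 1)^2} = p^m(p - 1 + o(1))^{-m} = \left(\frac{p}{p-1}\right)^m + o(1),$$

which concludes the proof.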

Corollary B.1 (Euler Product Estimate).

$$\prod_p F_p = e^{O_m(1/w)}(1 + o(1))\left(\frac{W}{\phi(W)}\right)^m.$$


B.2 Euler product for simple linear forms

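Lemma B.8 (Local Factor Estimate). For $p \ge w$,

$$\begin{aligned} \omega_X(p) &\le \frac{1}{p} && \text{whenever } |X| \ge 2 \text{ and } p \mid \Delta, \\ \omega_X(p) &= 0 && \text{whenever } |X| \ge 2 \text{ and } p \nmid \Delta, \\ \omega_X(p) &= \frac{1}{p} && \text{whenever } |X| = 1, \text{ and} \\ \omega_\emptyset(p) &= 1. \end{aligned}$$

Proof sketch. The final two claims proceed exactly as in the previous section. For the first two claims, note that if $|X| \ge 2$ then $\omega_X(p)$ is equal to $1/p$ if all the residue classes $h_i \pmod p$, $i \in X$, are equal, and 0 otherwise; the former can only happen if $p \mid \Delta$.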

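Lemma B.9 (Euler Factor Estimate). For $p \ge w$,

$$E_p = \left(1 + O\left(\frac{1}{p^2}\right)\right)E'_p \text{ whenever } p \nmid \Delta, \quad\text{and}\quad E_p = \left(1 + O\left(\frac{1}{\sqrt{p}}\right)\right)E'_p \text{ whenever } p \mid \Delta.$$

Proof. For the first claim, we apply Lemma B.8 and argue as in the proof of Lemma B.5. For the second, suppose $p \mid \Delta$. In the extreme case where all the $h_i$ are congruent modulo $p$, Lemma B.8 gives $\omega_{I \cup J}(p) = 1/p$ whenever $I \cup J \ne \emptyset$, and the definition of $E_p$ gives

$$\begin{aligned} E_p &= 1 + \frac{1}{p}\sum_{\substack{I, J \subseteq [m] \\ I \cup J \ne \emptyset}}\frac{(-1)^{|I|+|J|}}{p^{\sum_{j \in I} x'_j + \sum_{j \in J} y'_j}} \\ &= 1 - \frac{1}{p} + \frac{1}{p}\sum_{I \subseteq [m]}\frac{(-1)^{|I|}}{p^{\sum_{j \in I} x'_j}}\sum_{J \subseteq [m]}\frac{(-1)^{|J|}}{p^{\sum_{j \in J} y'_j}} \\ &= 1 - \frac{1}{p} + \frac{1}{p}\prod_{j=1}^m\left(1 - \frac{1}{p^{x'_j}}\right)\left(1 - \frac{1}{p^{y'_j}}\right) = 1 + O\left(\frac{1}{\sqrt{p}}\right). \end{aligned}$$

In general, each of the terms with $I \cup J \ne \emptyset$ has absolute value at most $1/p$ (as $|p^{-x'_j}|, |p^{-y'_j}| \le 1$), which gives the same bound. Similarly, one can show that $E'_p = 1 + O\left(\frac{1}{\sqrt{p}}\right)$, and we are done.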

Lemma B.10 (Euler Product Simplification).

$$\prod_p E_p = e^{O_m(1/w)}\left(\left(\frac{W}{\phi(W)}\right)^m + o(1)\right)\prod_{p \mid \Delta}\big(1 + O(p^{-1/2})\big)\prod_p E'_p.$$


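Proof. As in the proof of Lemma B.7, we know that

$$\prod_{p < w} E_p = \left(\left(\frac{W}{\phi(W)}\right)^m + o(1)\right)\prod_{p < w} E'_p$$

and also that

$$\prod_{\substack{p \ge w \\ p \nmid \Delta}} E_p = e^{O_m(1/w)}\prod_{\substack{p \ge w \\ p \nmid \Delta}} E'_p.$$

An application of the above Euler factor estimate gives

$$\prod_{\substack{p \ge w \\ p \mid \Delta}} E_p = \prod_{p \mid \Delta}\big(1 + O(p^{-1/2})\big)\prod_{\substack{p \ge w \\ p \mid \Delta}} E'_p,$$

and combining these three products gives us the required result.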

Corollary B.2 (Euler Product Estimate).

$$\prod_p F_p = e^{O_m(1/w)}(1 + o(1))\left(\frac{W}{\phi(W)}\right)^m\prod_{p \mid \Delta}\big(1 + O(p^{-1/2})\big).$$

B.3 Pseudorandomness of ν

This section proves the linear and simple pseudorandomness conditions. The arguments in this section are exactly those in [10], Section 9, and are included here for completeness.

Theorem B.3 (Weak Linear Pseudorandomness Condition). If we have $m \le k\cdot 2^{k-1}$ homogeneous linear forms $\psi_i$ in $t \le 3k - 4$ variables with rational coefficients bounded by $k$ in both numerator and denominator, and none equal to zero or a rational multiple of another, then

$$\mathbb{E}\big(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) : x \in \mathbb{Z}_N^t\big) = 1 + \varepsilon_m + o(1),$$

where $|\varepsilon_m| \le \varepsilon_k$ for some $\varepsilon_k$ depending only on $k$.

Proof. First we clear denominators and assume that all the linear forms have integer coefficients, at the cost of increasing the bound on the coefficients to $(k+1)!$. Taking $w$ sufficiently large, we can assume that $(k+1)! < (w/2)^{1/4}$, and so we can apply Theorem B.1 to these linear forms.

We must first chop up the range of summation into boxes before we can apply Theorem B.1, to deal with the two-part definition of $\nu$. Let $Q = \sqrt{N}$, and divide $\mathbb{Z}_N^t$ into $Q^t$ roughly equal-sized boxes $B_{u_1, \dots, u_t} = B_u$, where

$$B_u = \{x \in \mathbb{Z}_N^t : x_j \in [\lfloor u_j Q\rfloor, \lfloor (u_j + 1)Q\rfloor)\}.$$


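Call $u \in \mathbb{Z}_Q^t$ nice if every linear form maps the box $B_u$ either entirely inside or entirely outside the interval $[\varepsilon_k N, 2\varepsilon_k N]$. Note that by the definition of $Q$, the upper bound on $m$ and the smallness of $\lambda_k$, we have $N/Q > R^{10m}$, so for nice $u$ we may apply Theorem B.1 to obtain

$$\mathbb{E}\big(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) : x \in B_{u_1, \dots, u_t}\big) = e^{O_m(1/w)}(1 + o(1)),$$

since we can replace each factor by either 1 or $\frac{\phi(W)\log R}{W}\Lambda_R^2(\theta_i(x))$.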

So the nice boxes have already been dealt with, and give us the answer we're looking for. Next we show that most boxes are nice; more precisely, that the proportion of non-nice boxes is at most $O(1/Q)$.

Suppose $u$ is not nice; then there exist some linear form $\psi$ and $x, y \in B_u$ such that $\psi(x) \in [\varepsilon_k N, 2\varepsilon_k N]$ but $\psi(y) \notin [\varepsilon_k N, 2\varepsilon_k N]$. Suppose $\psi(x) = \sum_{j=1}^t L_j x_j + b$. Then

$$\psi(x), \psi(y) = \sum_{j=1}^t L_j\lfloor Q u_j\rfloor + b + O(Q).$$

Hence

$$\varepsilon_k N \text{ or } 2\varepsilon_k N = \sum_{j=1}^t L_j\lfloor Q u_j\rfloor + b + O(Q),$$

and so

$$\sum_{j=1}^t L_j u_j = c\,\varepsilon_k Q + \frac{b}{Q} + O(1) \pmod Q \quad\text{for some } c \in \{1, 2\}.$$

But since $(L_j)$ is non-zero, the number of $t$-tuples $u$ which satisfy this is at most $O(Q^{t-1})$, and hence the proportion of non-nice boxes with respect to $\psi$ is $O(1/Q)$. Since the number of linear forms is also bounded, the total proportion of non-nice boxes is also $O(1/Q)$.

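When $u$ is not nice, we can bound $\nu$ by the trivial bound $1 + \frac{\phi(W)\log R}{W}\Lambda_R^2(\theta_i(x))$. Multiplying out and applying Theorem B.1 again, we get

$$\mathbb{E}\big(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) : x \in B_u\big) = e^{O_m(1/w)}(O(1) + o(1)).$$

Putting it all together,

$$\begin{aligned} \text{LHS} &= \mathbb{E}\Big(\mathbb{E}\big(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) : x \in B_u\big) : u \in \mathbb{Z}_Q^t\Big) + o(1) \\ &= e^{O_m(1/w)}\big(1 + O(1/Q) + o(1)\big) = 1 + \varepsilon_m + o(1), \end{aligned}$$

where $\varepsilon_m$ can be taken sufficiently small by taking $w$ sufficiently large.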

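Theorem B.4 (Weak Simple Pseudorandomness Condition). Whenever we have $m \le 2^{k-1}$ simple linear forms $\psi_i$ in $t \le k$ variables,

$$\mathbb{E}\big(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) : x \in \mathbb{Z}_N^t\big) = 1 + \varepsilon_m + o(1).$$

Furthermore, there exists a weight function $\tau = \tau_m : \mathbb{Z}_N \to \mathbb{R}^+$ such that $\mathbb{E}(\tau^q) = O_{m,q}(1)$ for all $1 \le q < \infty$ and for all $h_1, \dots, h_m \in \mathbb{Z}_N$ we have the upper bound

$$\mathbb{E}_{x \in \mathbb{Z}_N}\big(\nu(x + h_1) \cdots \nu(x + h_m)\big) \le (1 + \varepsilon_m)\sum_{1 \le i < j \le m}\tau(h_i - h_j).$$

In both cases, $|\varepsilon_m| \le \varepsilon_k$ for some $\varepsilon_k$ sufficiently small depending on $k$.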

Proof. The first part is proven exactly as for the previous theorem. For the second, weconstruct our weight function τ in the next lemma, with the additional requirement that

τ(0) := exp(Cm logN/ log logN).

Note that this preserves the bounds $\mathbb{E}(\tau^q) = O_{m,q}(1)$ for all $q$, since the weight at $0$ contributes at most $N^{-1}\exp(Cmq\log N/\log\log N) = o_{m,q}(1)$.

First suppose at least two of the $h_i$ are equal. We may bound the left-hand side crudely by $\|\nu\|_\infty^m$. Standard estimates give us

$$\|\nu\|_\infty \ll \exp(C\log N/\log\log N),$$

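and so

$$\|\nu\|_\infty^m \ll \tau(0) \le \sum_{1 \le i < j \le m}\tau(h_i - h_j),$$

where the last inequality holds because $h_i - h_j = 0$ for some $i < j$ and $\tau$ is non-negative. This is the required bound. Now suppose that all the $h_i$ are distinct. By Theorem B.2,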

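$$\begin{aligned}
\mathbb{E}\big(\nu(x+h_1)\cdots\nu(x+h_m) : x \in \mathbb{Z}_N\big) &= e^{O_m(1/w)}(1 + o_m(1))\prod_{p \mid \Delta}\big(1 + O(p^{-1/2})\big) \\
&\le (1+\varepsilon_m)(1+o_m(1))\sum_{1 \le i < j \le m}\tau(h_i - h_j),
\end{aligned}$$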

and by choosing $w$ sufficiently large we may ensure that $\varepsilon_m$ is as small as required. Furthermore, by adjusting the function $\tau$ by a constant factor depending only on $m$ (and hence on $k$), we can absorb the $o_m(1)$ error into the sum, which gives the required result.

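Lemma B.11 (Construction of the weight function). For any $m \ge 1$ there is a weight function $\tau_m : \mathbb{Z} \to \mathbb{R}^+$ such that for all distinct $h_1, \dots, h_m$ we have

$$\prod_{p \mid \Delta}\Big(1 + O_m\Big(\frac{1}{\sqrt p}\Big)\Big) \le \sum_{1 \le i < j \le m}\tau(h_i - h_j),$$

where

$$\Delta := \prod_{1 \le i < j \le m}|h_i - h_j|.$$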

Furthermore, for any $0 < q < \infty$,

$$\mathbb{E}\big(\tau^q(n) : 0 < |n| \le N\big) = O_{m,q}(1).$$


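Proof. We take $\tau_m(n) := O_m(1)\prod_{p \mid n}\big(1 + \frac{1}{\sqrt p}\big)^{O_m(1)}$ for all $n \ne 0$. Every prime dividing $\Delta$ divides some $h_i - h_j$, and $1 + O_m(1/\sqrt p) \le (1 + 1/\sqrt p)^{O_m(1)}$; combining this with the arithmetic mean-geometric mean inequality, we obtain

$$\begin{aligned}
\prod_{p \mid \Delta}\Big(1 + O_m\Big(\frac{1}{\sqrt p}\Big)\Big) &\le \prod_{1 \le i < j \le m}\ \prod_{p \mid h_i - h_j}\Big(1 + \frac{1}{\sqrt p}\Big)^{O_m(1)} \\
&\le O_m(1)\sum_{1 \le i < j \le m}\ \prod_{p \mid h_i - h_j}\Big(1 + \frac{1}{\sqrt p}\Big)^{O_m(1)} \\
&= \sum_{1 \le i < j \le m}\tau(h_i - h_j).
\end{aligned}$$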
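For readers who want the two inequalities in this chain spelled out, here is one way to write them; it is a sketch using Bernoulli's inequality and the standard AM-GM inequality, not a quotation of the author's argument. The names $C$, $K$, $M$ and $a_{ij}$ are introduced only for this illustration.

```latex
% (i) If the O_m(1/\sqrt p) term equals C/\sqrt p with C > 0 (C <= 0 is trivial),
%     then with K := \lceil C \rceil = O_m(1), Bernoulli's inequality gives
1 + \frac{C}{\sqrt p} \le 1 + \frac{K}{\sqrt p} \le \Big(1 + \frac{1}{\sqrt p}\Big)^{K}.

% (ii) AM-GM for the M := \binom{m}{2} non-negative numbers a_{ij}^M (m >= 2):
\prod_{1\le i<j\le m} a_{ij}
  = \Big(\prod_{1\le i<j\le m} a_{ij}^{M}\Big)^{1/M}
  \le \frac{1}{M}\sum_{1\le i<j\le m} a_{ij}^{M},
\qquad
a_{ij} := \prod_{p \mid h_i - h_j}\Big(1 + \frac{1}{\sqrt p}\Big)^{O_m(1)}.
```

Since $M$ depends only on $m$, each $a_{ij}^M$ is again of the form $\prod_{p \mid h_i - h_j}(1 + 1/\sqrt p)^{O_m(1)}$, which is why the exponent in $\tau_m$ can be taken to be $O_m(1)$.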

Hence it remains to show that

$$\mathbb{E}\Big(\prod_{p \mid n}\Big(1 + \frac{1}{\sqrt p}\Big)^{O_m(q)} : 0 < |n| \le N\Big) = O_{m,q}(1)$$

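for all $0 < q < \infty$. Since $\big(1 + \frac{1}{\sqrt p}\big)^{O_m(q)}$ is bounded by $1 + \frac{1}{p^{1/4}}$ for all but $O_{m,q}(1)$ many primes $p$ (and each of the remaining primes contributes a factor of at most $2^{O_m(q)}$), we have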

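$$\mathbb{E}\Big(\prod_{p \mid n}\Big(1 + \frac{1}{\sqrt p}\Big)^{O_m(q)} : 0 < |n| \le N\Big) \le O_{m,q}(1)\,\mathbb{E}\Big(\prod_{p \mid n}\Big(1 + \frac{1}{p^{1/4}}\Big) : 0 < n \le N\Big).$$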

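We now use the fact that $\prod_{p \mid n}\big(1 + \frac{1}{p^{1/4}}\big) \le \sum_{d \mid n}\frac{1}{d^{1/4}}$ (expanding the product gives the same sum restricted to squarefree divisors $d$) to get that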

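$$\begin{aligned}
\mathbb{E}\Big(\prod_{p \mid n}\Big(1 + \frac{1}{\sqrt p}\Big)^{O_m(q)} : 0 < |n| \le N\Big) &\le O_{m,q}(1)\,\frac{1}{2N}\sum_{1 \le |n| \le N}\ \sum_{d \mid n}\frac{1}{d^{1/4}} \\
&\le O_{m,q}(1)\,\frac{1}{2N}\sum_{d=1}^{N}\frac{2N}{d^{5/4}} \\
&= O_{m,q}(1),
\end{aligned}$$

since each $d$ divides at most $2N/d$ integers $n$ with $1 \le |n| \le N$, and $\sum_d d^{-5/4}$ converges. This completes the proof.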
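The two elementary facts used at the end of the proof can be checked numerically for small $N$. The sketch below (pure Python; the helper names `prime_factors` and `divisor_sums` are hypothetical) verifies the pointwise bound $\prod_{p \mid n}(1 + p^{-1/4}) \le \sum_{d \mid n} d^{-1/4}$ for $n \le N$, and compares the average of the right-hand side with $\sum_{d \le N} d^{-5/4}$. It restricts to $n \ge 1$, which is harmless since the average over $0 < |n| \le N$ is symmetric. It is an illustration, not a proof.

```python
import math

def prime_factors(n):
    """Distinct prime factors of n >= 1, by trial division."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps

def divisor_sums(N, s):
    """List with entry n equal to sum_{d | n} d^{-s}, for 1 <= n <= N (sieve over d)."""
    sig = [0.0] * (N + 1)
    for d in range(1, N + 1):
        w = d ** (-s)
        for n in range(d, N + 1, d):
            sig[n] += w
    return sig

N = 5000
sig = divisor_sums(N, 0.25)
for n in range(1, N + 1):
    prod = math.prod(1 + p ** -0.25 for p in prime_factors(n))
    # Expanding the product gives the sum over squarefree divisors only.
    assert prod <= sig[n] + 1e-9

avg = sum(sig[1:]) / N
# Swapping the sums: avg = (1/N) sum_d d^{-1/4} floor(N/d) <= sum_d d^{-5/4}.
bound = sum(d ** -1.25 for d in range(1, N + 1))
print(avg, bound)
```

The bound on `avg` does not grow with `N`, since $\sum_d d^{-5/4}$ converges; this mirrors the final step of the proof.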

Appendix C

Fourier transform

In this appendix we prove a standard fact about the rapid decay of the Fourier transform, which is required in the proofs of Theorems B.1 and B.2. The proof here is taken from [25].

Lemma C.1. If $f$ is a bounded function with compact support, then $\hat f$ is also bounded. In fact, we have

$$\|\hat f\|_\infty \le \|f\|_1.$$

Proof. For any $t \in \mathbb{R}$,

$$|\hat f(t)| = \Big|\int f(x)e^{-ixt}\,dx\Big| \le \int |f(x)|\,dx =: \|f\|_1 < \infty.$$

Theorem C.1. Suppose $f$ is $C^N$ with compact support and $f^{(n)} \in L^1$ for all $0 \le n \le N$. Then

$$\widehat{f^{(n)}}(t) = (it)^n\hat f(t)$$

when $0 \le n \le N$, and furthermore

$$|\hat f(t)| = O\big((1+|t|)^{-N}\big).$$

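Proof. We use induction on $n$. For $n = 1$, integration by parts (the boundary terms vanish because $f$ has compact support) gives

$$\widehat{f'}(t) = \int f'(x)e^{-ixt}\,dx = it\int e^{-ixt}f(x)\,dx = it\hat f(t).$$

The inductive step follows by applying the same argument to $f^{(n-1)}$.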

It follows from Lemma C.1 that $\widehat{f^{(n)}}$ is bounded, and hence the first part of the theorem implies that $t^n\hat f$ is bounded if $n \le N$, say $|t^n\hat f| \le D$. Note that we can take this bound to be uniform over all $n$, since there are only finitely many $n$ to be considered. From the binomial theorem it follows that for some constant $C$ (for instance $C = 2^{-N}$, since $\binom{N}{n} \le 2^N$)

$$C(1+|t|)^N \le \sum_{n=0}^N |t|^n.$$


And hence

$$|\hat f|\sum_{n=0}^N |t|^n \le (N+1)D,$$

so

$$|\hat f| \le \frac{(N+1)D}{\sum_{n=0}^N |t|^n} \le \frac{(N+1)D}{C(1+|t|)^N} = O\big((1+|t|)^{-N}\big).$$

Corollary C.1. If $f$ is smooth with compact support, then $\hat f(t) = O_A\big((1+|t|)^{-A}\big)$ for any $A > 0$.
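A small numerical sketch of Lemma C.1 and the first part of Theorem C.1, using the normalisation $\hat f(t) = \int f(x)e^{-ixt}\,dx$ from the proof of Lemma C.1. The test function $f(x) = e^{-1/(1-x^2)}$ on $(-1,1)$ and the step count `M` are our own choices. The trapezoid rule is used because it is extremely accurate for smooth, compactly supported integrands. The printed column $(1+t)^4|\hat f(t)|$ is only there to watch the decay, not as evidence for Corollary C.1.

```python
import cmath
import math

def bump(x):
    """Smooth compactly supported test function exp(-1/(1-x^2)) on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def bump_prime(x):
    """Derivative of bump: -2x/(1-x^2)^2 * exp(-1/(1-x^2)) on (-1, 1)."""
    if abs(x) >= 1:
        return 0.0
    u = 1.0 - x * x
    return -2.0 * x / (u * u) * math.exp(-1.0 / u)

def fourier(g, t, M=20000):
    """Trapezoid-rule approximation of int_{-1}^{1} g(x) e^{-ixt} dx."""
    h = 2.0 / M
    total = 0j
    # g vanishes at +-1, so the endpoint terms of the trapezoid rule drop out.
    for k in range(1, M):
        x = -1.0 + k * h
        total += g(x) * cmath.exp(-1j * x * t)
    return h * total

l1_norm = fourier(bump, 0.0).real   # equals ||f||_1 because f >= 0
for t in (0.0, 1.0, 5.0, 20.0, 50.0):
    ft = fourier(bump, t)
    assert abs(ft) <= l1_norm + 1e-12                      # Lemma C.1
    assert abs(fourier(bump_prime, t) - 1j * t * ft) < 1e-8  # hat(f')(t) = i t hat f(t)
    print(t, abs(ft), (1 + t) ** 4 * abs(ft))
```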

Appendix D

The GI and MN Conjectures

In this appendix we give a formal statement of the conjectures mentioned in Chapter Six. These statements are taken from [7], which contains an in-depth discussion of these conjectures and the results surrounding them.

Definition D.1 (Nilpotent). Let $G$ be a connected, simply connected Lie group with lower central series $G_0 \supseteq G_1 \supseteq G_2 \supseteq \cdots$ (that is, $G_0 = G_1 = G$ and $G_{i+1} = [G, G_i]$ for $i \ge 1$). We say that $G$ is $s$-step nilpotent if $G_{s+1} = \{1\}$.
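As a finite illustration of the indexing in Definition D.1, consider the Heisenberg group of upper unitriangular $3 \times 3$ matrices over $\mathbb{Z}/p\mathbb{Z}$. This group is not a Lie group, so it is only a discrete analogue of the real Heisenberg group, the usual example of a 2-step nilpotent Lie group. The sketch computes $G_1 = G$, $G_2 = [G, G_1]$, $\dots$ by brute force. The encoding $(a, b, c)$ and all helper names are our own.

```python
from itertools import product

def heis_mul(x, y, p):
    """Product mod p, where (a, b, c) encodes [[1, a, c], [0, 1, b], [0, 0, 1]]."""
    a, b, c = x
    a2, b2, c2 = y
    return ((a + a2) % p, (b + b2) % p, (c + c2 + a * b2) % p)

def heis_inv(x, p):
    a, b, c = x
    return (-a % p, -b % p, (a * b - c) % p)

def commutator(g, h, p):
    """[g, h] = g^{-1} h^{-1} g h."""
    return heis_mul(heis_mul(heis_inv(g, p), heis_inv(h, p), p), heis_mul(g, h, p), p)

def generated_subgroup(gens, p):
    """Closure of gens under multiplication (enough in a finite group)."""
    group, frontier = {(0, 0, 0)}, [(0, 0, 0)]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = heis_mul(g, s, p)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def lower_central_series_sizes(p):
    """Sizes of G_1 = G, G_2 = [G, G_1], G_3 = [G, G_2], ... until trivial."""
    G = set(product(range(p), repeat=3))
    current, sizes = G, [len(G)]
    while len(current) > 1:
        nxt = generated_subgroup({commutator(g, h, p) for g in G for h in current}, p)
        if len(nxt) == len(current):  # series has stabilised: not nilpotent
            break
        current = nxt
        sizes.append(len(current))
    return sizes

print(lower_central_series_sizes(3))
```

Here $G_2$ is the centre $\{(0,0,c)\}$ of order $p$ and $G_3$ is trivial, so with the indexing of Definition D.1 this group is 2-step nilpotent.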

Definition D.2 (Nilmanifold). Let $G$ be an $s$-step nilpotent group, and $\Gamma \subseteq G$ a discrete, cocompact subgroup. Then the quotient $G/\Gamma$ is an $s$-step nilmanifold.

Definition D.3 (Nilsequence). An $s$-step nilsequence is a sequence of the form $(F(g^n x))_{n \in \mathbb{N}}$, where $G/\Gamma$ is an $s$-step nilmanifold, $g \in G$, $x \in G/\Gamma$, and $F : G/\Gamma \to \mathbb{R}$ is a continuous function.

Conjecture D.1 (Inverse Gowers norm conjecture for $s$). Suppose that $0 < \delta \le 1$. Then there exists a finite collection $\mathcal{M}_{s,\delta}$ of $s$-step nilmanifolds with the following property.

Given any $N$ and 1-bounded function $f$ on $[N]$ such that

$$\|f\|_{U^{s+1}[N]} \ge \delta,$$

there is a nilmanifold in $\mathcal{M}_{s,\delta}$ and a 1-bounded $s$-step nilsequence $(F(g^n x))$ on it with a bounded Lipschitz constant (i.e. a bound dependent only on $s$ and $\delta$, not on $N$) such that

$$\big|\mathbb{E}_{n \in [N]} f(n)F(g^n x)\big| \gg_{s,\delta} 1.$$

Let us see what this gives us in the case $s = 1$, i.e. for the Gowers $U^2$ norm. A group is 1-step nilpotent if and only if it is abelian. In this case, one can in fact take $G = \mathbb{R}$ and $\Gamma = \mathbb{Z}$, and $\mathcal{M}_{1,\delta}$ is just the singleton set $\{\mathbb{R}/\mathbb{Z}\}$, independent of $\delta$. This case of the conjecture is easy to prove: if $f$ has a large $U^2$ norm, then it correlates with a linear character, i.e. a function of the form $e^{2\pi i n/N}$. This is a 1-step nilsequence on $\mathbb{R}/\mathbb{Z}$ (identified with the unit circle), taking $F$ to be the identity, $x = 1$ and $g = e^{2\pi i/N}$.
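One standard way to see the "easy" direction works in the model setting $\mathbb{Z}_N$ instead of $[N]$, a simplification we assume here. With $\hat f(\xi) = \mathbb{E}_x f(x)e(-x\xi/N)$, one has $\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4 \le \max_\xi|\hat f(\xi)|^2 \sum_\xi |\hat f(\xi)|^2 \le \max_\xi |\hat f(\xi)|^2$ for 1-bounded $f$, by Parseval. So some linear character correlates with $f$ to the extent $\|f\|_{U^2}^2$. The sketch below checks the identity and the inequality numerically; the test function and helper names are hypothetical.

```python
import cmath
import random

def e(x):
    return cmath.exp(2j * cmath.pi * x)

def u2_norm_fourth(f):
    """||f||_{U^2(Z_N)}^4 = E_{x,a,b} f(x) conj f(x+a) conj f(x+b) f(x+a+b)."""
    N = len(f)
    total = 0
    for x in range(N):
        for a in range(N):
            for b in range(N):
                total += (f[x] * f[(x + a) % N].conjugate()
                          * f[(x + b) % N].conjugate() * f[(x + a + b) % N])
    return (total / N**3).real

def fourier_coeffs(f):
    """hat f(xi) = E_x f(x) e(-x xi / N) for xi in Z_N."""
    N = len(f)
    return [sum(f[x] * e(-x * xi / N) for x in range(N)) / N for xi in range(N)]

N = 15
random.seed(1)
# A 1-bounded function: half a linear character plus half a random phase.
f = [0.5 * e(3 * n / N) + 0.5 * e(random.random()) for n in range(N)]
fh = fourier_coeffs(f)
lhs = u2_norm_fourth(f)
rhs = sum(abs(c) ** 4 for c in fh)
print(lhs, rhs)                              # agree up to rounding
print(max(abs(c) for c in fh), lhs ** 0.5)   # max |hat f| >= ||f||_{U^2}^2
```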


Conjecture D.2 (Möbius Nilsequence Conjecture). Let $G/\Gamma$ be an $s$-step nilmanifold and $(F(g^n x))$ a bounded $s$-step nilsequence. Then for any $A > 0$

$$\big|\mathbb{E}_{n \in [N]}\mu(n)F(g^n x)\big| \ll \log^{-A} N,$$

where the implied constant depends on $A$, $s$, the nilmanifold and the Lipschitz constant of the nilsequence (but not, importantly, on the nilsequence itself, nor on $g$ or $x$).

Bibliography

[1] P. Erdős and P. Turán, On some sequences of integers, J. London Math. Soc. 11 (1936), 261–264.

[2] D. Goldston and C. Y. Yıldırım, Small gaps between primes, I, preprint available at arXiv:0504336.

[3] W. T. Gowers, Decompositions, Approximate Structure, Transference, and the Hahn-Banach theorem, preprint available at arXiv:0811.3103.

[4] W. T. Gowers, A new proof of Szemerédi's theorem, GAFA 11 (2001), 465–588.

[5] Ben Green, Long arithmetic progressions of primes, Analytic Number Theory: A Tribute to Gauss and Dirichlet (W. Duke and Y. Tschinkel, eds.), Clay Mathematics Proceedings, 2007, pp. 149–168.

[6] Ben Green and Terence Tao, A bound for progressions of length k in the primes, available at http://www.math.ucla.edu/~tao/preprints/Expository/quantitative_AP.dvi.

[7] Ben Green and Terence Tao, Linear equations in primes, to appear in Annals of Math., preprint available at arXiv:0606088.

[8] Ben Green and Terence Tao, The Möbius function is strongly orthogonal to nilsequences, preprint available at arXiv:0807.1736.

[9] Ben Green and Terence Tao, An inverse theorem for the Gowers $U^3(G)$-norm, with applications, Proc. Edinburgh Math. Soc. 51 (2008), no. 1, 71–153.

[10] Ben Green and Terence Tao, The primes contain arbitrarily long arithmetic progressions, Annals of Math. 167 (2008), 481–547.

[11] Ben Green and Terence Tao, Quadratic uniformity of the Möbius function, Annales de l'Institut Fourier (Grenoble) 58 (2008), no. 6, 1863–1935.

[12] Ben Green, Terence Tao, and Tamar Ziegler, An inverse theorem for the Gowers $U^4$-norm, submitted to the Glasg. Math. J., preprint available at arXiv:0911.5681.

[13] Ben Green, Terence Tao, and Tamar Ziegler, An inverse theorem for the Gowers $U^{s+1}[N]$-norm, preprint available at arXiv:1009.3998.


[14] G. H. Hardy and J. E. Littlewood, Some problems of “partitio numerorum” III: On the expression of a number as a sum of primes, Acta Math. 44 (1923), 1–70.

[15] D. R. Heath-Brown, Three primes and an almost-prime in arithmetic progression, J. London Math. Soc. 23 (1981), 396–414.

[16] Thai Hoang Le, Green-Tao theorem in function fields, preprint available at arXiv:0908.2642.

[17] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan, New proofs of the Green-Tao-Ziegler dense model theorem: An exposition, preprint available at arXiv:0806.0381.

[18] K. F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 242–252.

[19] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. (1975), 299–345.

[20] Terence Tao, A remark on Goldston-Yıldırım correlation estimates, available at http://www.math.ucla.edu/~tao/preprints/Expository/gy-corr.dvi.

[21] Terence Tao, What is good mathematics?, Bull. Amer. Math. Soc. 44 (2007), 623–634.

[22] Terence Tao and Tamar Ziegler, The primes contain arbitrarily long polynomial progressions, Acta Math. 201 (2008), 213–305.

[23] J. G. van der Corput, Über Summen von Primzahlen und Primzahlquadraten, Math. Ann. 116 (1939), 1–50.

[24] P. Varnavides, On certain sets of positive density, J. London Math. Soc. 34 (1959), 358–360.

[25] Thomas Wolff, Lectures in harmonic analysis, available online at http://www.math.ubc.ca/~ilaba/wolff/.
