Structure and Randomness in Combinatorics



arXiv:0707.4269v2 [math.CO] 3 Aug 2007

    Structure and randomness in combinatorics

Terence Tao
Department of Mathematics, UCLA
405 Hilgard Ave, Los Angeles CA
[email protected]

    Abstract

Combinatorics, like computer science, often has to deal with large objects of unspecified (or unusable) structure. One powerful way to deal with such an arbitrary object is to decompose it into more usable components. In particular, it has proven profitable to decompose such objects into a structured component, a pseudo-random component, and a small component (i.e. an error term); in many cases it is the structured component which then dominates. We illustrate this philosophy in a number of model cases.

    1. Introduction

In many situations in combinatorics, one has to deal with an object of large complexity or entropy, such as a graph on N vertices, a function on N points, etc., with N large.

We are often interested in the worst-case behaviour of such objects; equivalently, we are interested in obtaining results which apply to all objects in a certain class, as opposed to results for almost all objects (in particular, random or average-case behaviour) or for very specially structured objects. The difficulty here is that the spectrum of behaviour of an arbitrary large object can be very broad. At one extreme, one has very structured objects, such as complete bipartite graphs, or functions with periodicity, linear or polynomial phases, or other algebraic structure. At the other extreme are pseudorandom objects, which mimic the behaviour of random objects in certain key statistics (e.g. their correlations with other objects, or with themselves, may be close to those expected of random objects).

Fortunately, there is a fundamental phenomenon that one often has a dichotomy between structure and pseudorandomness, in that given a reasonable notion of structure (or pseudorandomness), there often exists a dual notion of pseudorandomness (or structure) such that an arbitrary object can be decomposed into a structured component and a pseudorandom component (possibly with a small error). Here are two simple examples of such decompositions:

(i) An orthogonal decomposition $f = f_{\mathrm{str}} + f_{\mathrm{psd}}$ of a vector f in a Hilbert space into its orthogonal projection $f_{\mathrm{str}}$ onto a subspace V (which represents the structured objects), plus its orthogonal projection $f_{\mathrm{psd}}$ onto the orthogonal complement $V^\perp$ of V (which represents the pseudorandom objects).

(ii) A thresholding $f = f_{\mathrm{str}} + f_{\mathrm{psd}}$ of a vector f, where f is expressed in terms of some basis $v_1, \ldots, v_n$ (e.g. a Fourier basis) as $f = \sum_{1 \leq i \leq n} c_i v_i$, the structured component $f_{\mathrm{str}} := \sum_{i : |c_i| \geq \lambda} c_i v_i$ contains the contribution of the large coefficients, and the pseudorandom component $f_{\mathrm{psd}} := \sum_{i : |c_i| < \lambda} c_i v_i$ contains the contribution of the small coefficients; here $\lambda > 0$ is a thresholding parameter which one is at liberty to choose.
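As a quick illustration of decomposition (ii), here is a minimal sketch in Python (the helper name `threshold_split` is ours, not from the text): it splits a coefficient vector, expressed in a fixed orthonormal basis, into its large-coefficient and small-coefficient parts.

```python
def threshold_split(coeffs, lam):
    """Split coefficients c_i into a structured part (|c_i| >= lam)
    and a pseudorandom part (|c_i| < lam); the two parts sum to coeffs."""
    f_str = [c if abs(c) >= lam else 0.0 for c in coeffs]
    f_psd = [c if abs(c) < lam else 0.0 for c in coeffs]
    return f_str, f_psd

coeffs = [2.5, 0.1, -1.7, 0.05, 0.3]
f_str, f_psd = threshold_split(coeffs, lam=1.0)
# f_str keeps the two large coefficients, f_psd the three small ones,
# and coordinate-wise f_str + f_psd reconstructs the original vector.
```

Note that, unlike decomposition (i), the split here depends on the chosen basis and on the threshold $\lambda$, not just on a subspace.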

Indeed, many of the decompositions we discuss here can be viewed as variants or perturbations of these two simple decompositions. More advanced examples of decompositions include the Szemerédi regularity lemma for graphs (and hypergraphs), as well as various structure theorems relating to the Gowers uniformity norms, used for instance in [16], [18]. Some decompositions from classical analysis, most notably the spectral decomposition of a self-adjoint operator into orthogonal subspaces associated with the pure point, singular continuous, and absolutely continuous spectrum, also have a similar spirit to the structure-randomness dichotomy.

The advantage of utilising such a decomposition is that one can use different techniques to handle the structured component and the pseudorandom component (as well as the error component, if it is present). Broadly speaking, the structured component is often handled by algebraic or geometric tools, or by reduction to a lower complexity problem than the original problem, whilst the contribution of the pseudorandom and error components is shown to be negligible by using inequalities from analysis (which can range from the humble Cauchy-Schwarz inequality to other, much more advanced, inequalities). A particularly notable use of this type of decomposition occurs in the many different proofs of Szemerédi's theorem [24]; see e.g. [30] for further discussion.

In order to make the above general strategy more concrete, one of course needs to specify more precisely what "structure" and "pseudorandomness" mean. There is no single such definition of these concepts, of course; it depends on the application. In some cases, it is obvious what the definition of one of these concepts is, but then one has to do a non-trivial amount of work to describe the dual concept in some useful manner. We remark that computational notions of structure and randomness do seem to fall into this framework, but thus far all the applications of this dichotomy have focused on much simpler notions of structure and pseudorandomness, such as those associated to Reed-Muller codes.

In these notes we give some illustrative examples of this structure-randomness dichotomy. While these examples are somewhat abstract and general in nature, they should by no means be viewed as the definitive expressions of this dichotomy; in many applications one needs to modify the basic arguments given here in a number of ways. On the other hand, the core ideas in these arguments (such as a reliance on energy-increment or energy-decrement methods) appear to be fairly universal. The emphasis here will be on illustrating the nuts-and-bolts of structure theorems; we leave the discussion of the more advanced structure theorems and their applications to other papers.

One major topic we will not be discussing here (though it is lurking underneath the surface) is the role of ergodic theory in all of these decompositions; we refer the reader to [30] for further discussion. Similarly, the recent ergodic-theoretic approaches to hypergraph regularity, removal, and property testing in [31], [3] will not be discussed here, in order to prevent the exposition from becoming too unfocused. The lecture notes here also have some intersection with the author's earlier article [27].

    2. Structure and randomness in a Hilbert space

Let us first begin with a simple case, in which the object one is studying lies in some real finite-dimensional Hilbert space H, and the concept of structure is captured by some known set S of "basic structured objects". This setting is already strong enough to establish the Szemerédi regularity lemma, as well as variants such as Green's arithmetic regularity lemma. One should think of the dimension of H as being extremely large; in particular, we do not want any of our quantitative estimates to depend on this dimension.

More precisely, let us designate a finite collection $S \subset H$ of "basic structured vectors" of bounded length; we assume for concreteness that $\|v\|_H \leq 1$ for all $v \in S$. We would like to view elements of H which can be efficiently represented as linear combinations of vectors in S as "structured", and vectors which have low correlation (or more precisely, small inner product) with all vectors in S as "pseudorandom". More precisely, given $f \in H$, we say that f is $(M, K)$-structured for some $M, K > 0$ if one has a decomposition
$$f = \sum_{1 \leq i \leq M} c_i v_i$$
with $v_i \in S$ and $c_i \in [-K, K]$ for all $1 \leq i \leq M$. We also say that f is $\varepsilon$-pseudorandom for some $\varepsilon > 0$ if we have $|\langle f, v \rangle_H| \leq \varepsilon$ for all $v \in S$. It is helpful to keep some model examples in mind:

Example 2.1 (Fourier structure). Let $F_2^n$ be a Hamming cube; we identify the finite field $F_2$ with $\{0, 1\}$ in the usual manner. We let H be the $2^n$-dimensional space of functions $f : F_2^n \to \mathbf{R}$, endowed with the inner product
$$\langle f, g \rangle_H := \frac{1}{2^n} \sum_{x \in F_2^n} f(x) g(x),$$
and let S be the space of characters,
$$S := \{ e_\xi : \xi \in F_2^n \},$$
where for each $\xi \in F_2^n$, $e_\xi$ is the function $e_\xi(x) := (-1)^{x \cdot \xi}$. Informally, a structured function f is then one which can be expressed in terms of a small number (e.g. $O(1)$) of characters, whereas a pseudorandom function f would be one whose Fourier coefficients
$$\hat f(\xi) := \langle f, e_\xi \rangle_H \quad (1)$$
are all small.
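To make Example 2.1 concrete, the following sketch (pure Python; the helper names `chi` and `fourier_coeff` are ours) evaluates the characters $e_\xi$ and the Fourier coefficients (1) on a small cube, and checks the orthonormality of the characters:

```python
from itertools import product

def chi(xi, x):
    """Character e_xi(x) = (-1)^{x . xi} on F_2^n."""
    dot = sum(a * b for a, b in zip(xi, x)) % 2
    return -1 if dot else 1

def fourier_coeff(f, xi, n):
    """hat f(xi) = <f, e_xi>_H = 2^{-n} sum_x f(x) e_xi(x)."""
    return sum(f(x) * chi(xi, x) for x in product((0, 1), repeat=n)) / 2 ** n

n = 3
xi0 = (1, 0, 1)
f = lambda x: chi(xi0, x)   # a single character: maximally structured
# A character has one Fourier coefficient equal to 1 and the rest equal to 0,
# reflecting the orthonormality of the e_xi.
```

In this language, an ε-pseudorandom function is exactly one for which `fourier_coeff(f, xi, n)` is at most ε in absolute value for every frequency ξ.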

Example 2.2 (Reed-Muller structure). Let H be as in the previous example, and let $1 \leq k \leq n$. We now let $S = S_k(F_2^n)$ be the space of Reed-Muller codes $(-1)^{P(x)}$, where $P : F_2^n \to F_2$ is any polynomial of n variables with coefficients in $F_2$ and degree at most k. For $k = 1$, this gives the same notions of structure and pseudorandomness as the previous example, but as we increase k, we enlarge the class of structured functions and shrink the class of pseudorandom functions. For instance, the function $(x_1, \ldots, x_n) \mapsto (-1)^{\sum_{1 \leq i < j \leq n} x_i x_j}$ is structured when $k = 2$, but pseudorandom when $k = 1$ (and n is large).


tensor products $(v, w) \mapsto 1_A(v) 1_B(w)$, where A, B are subsets of V. Observe that $1_E$ will be quite structured if G is a complete bipartite graph, or the union of a bounded number of such graphs. At the other extreme, if G is an $\varepsilon$-regular graph of some edge density $0 < \sigma < 1$ for some $0 < \varepsilon < 1$, in the sense that the number of edges between A and B differs from $\sigma |A||B|$ by at most $\varepsilon |A||B|$ whenever $A, B \subset V$ with $|A|, |B| \geq \varepsilon n$, then $1_E - \sigma$ will be $O(\varepsilon)$-pseudorandom.

We are interested in obtaining quantitative answers to the following general problem: given an arbitrary bounded element f of the Hilbert space H (let us say $\|f\|_H \leq 1$ for concreteness), can we obtain a decomposition

$$f = f_{\mathrm{str}} + f_{\mathrm{psd}} + f_{\mathrm{err}} \quad (2)$$

where $f_{\mathrm{str}}$ is a structured vector, $f_{\mathrm{psd}}$ is a pseudorandom vector, and $f_{\mathrm{err}}$ is some small error?

One obvious qualitative decomposition arises from using the vector space span(S) spanned by the basic structured vectors S. If we let $f_{\mathrm{str}}$ be the orthogonal projection of f onto this vector space, and set $f_{\mathrm{psd}} := f - f_{\mathrm{str}}$ and $f_{\mathrm{err}} := 0$, then we have perfect control on the pseudorandom and error components: $f_{\mathrm{psd}}$ is 0-pseudorandom and $f_{\mathrm{err}}$ has norm 0. On the other hand, the only control on $f_{\mathrm{str}}$ we have is the qualitative bound that it is $(M, K)$-structured for some finite $M, K < \infty$. In the three examples given above, the vectors in S in fact span all of H, and this decomposition is in fact trivial!

We would thus like to perform a tradeoff, increasing our control of the structured component at the expense of worsening our control on the pseudorandom and error components. We can see how to achieve this by recalling how the orthogonal projection of f to span(S) is actually constructed; it is the vector v in span(S) which minimises the energy $\|f - v\|_H^2$ of the residual $f - v$. The key point is that if $v \in \mathrm{span}(S)$ is such that $f - v$ has a non-zero inner product with a vector $w \in S$, then it is possible to move v in the direction w to decrease the energy $\|f - v\|_H^2$. We can make this latter point more quantitative:

Lemma 2.4 (Lack of pseudorandomness implies energy decrement). Let H, S be as above. Let $f \in H$ be a vector with $\|f\|_H^2 \leq 1$, such that f is not $\varepsilon$-pseudorandom for some $0 < \varepsilon \leq 1$. Then there exists $v \in S$ and $c \in [-1/\varepsilon, 1/\varepsilon]$ such that $|\langle f, v \rangle_H| \geq \varepsilon$ and
$$\|f - cv\|_H^2 \leq \|f\|_H^2 - \varepsilon^2.$$

Proof. By hypothesis, we can find $v \in S$ such that $|\langle f, v \rangle_H| \geq \varepsilon$; thus by Cauchy-Schwarz and the hypothesis on S, $1 \geq \|v\|_H \geq |\langle f, v \rangle_H| \geq \varepsilon$. We then set $c := \langle f, v \rangle_H / \|v\|_H^2$ (i.e. cv is the orthogonal projection of f to the span of v). The claim then follows from Pythagoras' theorem.

If we iterate this by a straightforward greedy algorithm argument we now obtain

Corollary 2.5 (Non-orthogonal weak structure theorem). Let H, S be as above. Let $f \in H$ be such that $\|f\|_H \leq 1$, and let $0 < \varepsilon \leq 1$. Then there exists a decomposition (2) such that $f_{\mathrm{str}}$ is $(1/\varepsilon^2, 1/\varepsilon)$-structured, $f_{\mathrm{psd}}$ is $\varepsilon$-pseudorandom, and $f_{\mathrm{err}}$ is zero.

Proof. We perform the following algorithm.

Step 0. Initialise $f_{\mathrm{str}} := 0$, $f_{\mathrm{err}} := 0$, and $f_{\mathrm{psd}} := f$. Observe that $\|f_{\mathrm{psd}}\|_H^2 \leq 1$.

Step 1. If $f_{\mathrm{psd}}$ is $\varepsilon$-pseudorandom then STOP. Otherwise, by Lemma 2.4, we can find $v \in S$ and $c \in [-1/\varepsilon, 1/\varepsilon]$ such that $\|f_{\mathrm{psd}} - cv\|_H^2 \leq \|f_{\mathrm{psd}}\|_H^2 - \varepsilon^2$.

Step 2. Replace $f_{\mathrm{psd}}$ by $f_{\mathrm{psd}} - cv$ and replace $f_{\mathrm{str}}$ by $f_{\mathrm{str}} + cv$. Now return to Step 1.

It is clear that the energy $\|f_{\mathrm{psd}}\|_H^2$ decreases by at least $\varepsilon^2$ with each iteration of this algorithm, and thus the algorithm terminates after at most $1/\varepsilon^2$ such iterations. The claim then follows.
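The greedy energy-decrement iteration of this proof can be sketched numerically. The snippet below (pure Python with list-based vectors; the function names are ours, not from the paper) repeatedly subtracts the projection onto a correlated basic vector until the residual is ε-pseudorandom:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def greedy_decompose(f, S, eps):
    """Corollary 2.5 sketch: f = f_str + f_psd with f_psd eps-pseudorandom.
    Assumes ||v|| <= 1 for all v in S and ||f|| <= 1, as in the text."""
    f_psd = list(f)
    f_str = [0.0] * len(f)
    while True:
        # find a basic vector with inner product at least eps, if any
        v = next((v for v in S if abs(dot(f_psd, v)) >= eps), None)
        if v is None:
            return f_str, f_psd            # f_psd is now eps-pseudorandom
        c = dot(f_psd, v) / dot(v, v)      # projection coefficient, |c| <= 1/eps
        f_psd = [a - c * b for a, b in zip(f_psd, v)]
        f_str = [a + c * b for a, b in zip(f_str, v)]

# Hilbert space R^3; S consists of two orthonormal "structured" directions.
S = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
f = [0.6, 0.3, 0.5]
f_str, f_psd = greedy_decompose(f, S, eps=0.1)
# The residual f_psd has inner product below 0.1 with every vector of S.
```

Each pass through the loop lowers the energy of `f_psd` by at least eps², which is exactly the termination argument used in the proof.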

Corollary 2.5 is not very useful in applications, because the control on the structure of $f_{\mathrm{str}}$ is relatively poor compared to the pseudorandomness of $f_{\mathrm{psd}}$ (or vice versa). One can do substantially better here, by allowing the error term $f_{\mathrm{err}}$ to be non-zero. More precisely, we have

Theorem 2.6 (Strong structure theorem). Let H, S be as above, let $\varepsilon > 0$, and let $F : \mathbf{Z}^+ \to \mathbf{R}^+$ be an arbitrary function. Let $f \in H$ be such that $\|f\|_H \leq 1$. Then we can find an integer $M = O_{F,\varepsilon}(1)$ and a decomposition (2) where $f_{\mathrm{str}}$ is $(M, M)$-structured, $f_{\mathrm{psd}}$ is $1/F(M)$-pseudorandom, and $f_{\mathrm{err}}$ has norm at most $\varepsilon$.

Here and in the sequel, we use subscripts in the $O()$ asymptotic notation to denote that the implied constant depends on the subscripts. For instance, $O_{F,\varepsilon}(1)$ denotes a quantity bounded by $C_{F,\varepsilon}$, for some quantity $C_{F,\varepsilon}$ depending only on F and $\varepsilon$. Note that the pseudorandomness of $f_{\mathrm{psd}}$ can be of arbitrarily high quality compared to the complexity of $f_{\mathrm{str}}$, since we can choose F to be whatever we please; the cost of doing so, of course, is that the upper bound on M becomes worse when F is more rapidly growing.

To prove Theorem 2.6, we first need a variant of Corollary 2.5 which gives some orthogonality between $f_{\mathrm{str}}$ and $f_{\mathrm{psd}}$, at the cost of worsening the complexity bound on $f_{\mathrm{str}}$.

Lemma 2.7 (Orthogonal weak structure theorem). Let H, S be as above. Let $f \in H$ be such that $\|f\|_H \leq 1$, and let $0 < \varepsilon \leq 1$. Then there exists a decomposition (2) such that $f_{\mathrm{str}}$ is $(1/\varepsilon^2, O_\varepsilon(1))$-structured, $f_{\mathrm{psd}}$ is $\varepsilon$-pseudorandom, $f_{\mathrm{err}}$ is zero, and $\langle f_{\mathrm{str}}, f_{\mathrm{psd}} \rangle_H = 0$.


Proof. We perform a slightly different iteration to that in Corollary 2.5, where we insert an additional orthogonalisation step within the iteration to a subspace V:

Step 0. Initialise $V := \{0\}$ and $f_{\mathrm{err}} := 0$.

Step 1. Set $f_{\mathrm{str}}$ to be the orthogonal projection of f to V, and $f_{\mathrm{psd}} := f - f_{\mathrm{str}}$.

Step 2. If $f_{\mathrm{psd}}$ is $\varepsilon$-pseudorandom then STOP. Otherwise, by Lemma 2.4, we can find $v \in S$ and $c \in [-1/\varepsilon, 1/\varepsilon]$ such that $|\langle f_{\mathrm{psd}}, v \rangle_H| \geq \varepsilon$ and $\|f_{\mathrm{psd}} - cv\|_H^2 \leq \|f_{\mathrm{psd}}\|_H^2 - \varepsilon^2$.

Step 3. Replace V by $\mathrm{span}(V \cup \{v\})$, and return to Step 1.

Note that at each stage, $\|f_{\mathrm{psd}}\|_H$ is the minimum distance from f to V. Because of this, we see that $\|f_{\mathrm{psd}}\|_H^2$ decreases by at least $\varepsilon^2$ with each iteration, and so this algorithm terminates in at most $1/\varepsilon^2$ steps.

Suppose the algorithm terminates in M steps for some $M \leq 1/\varepsilon^2$. Then we have constructed a nested flag
$$\{0\} = V_0 \subset V_1 \subset \ldots \subset V_M$$
of subspaces, where each $V_i$ is formed from $V_{i-1}$ by adjoining a vector $v_i$ in S. Furthermore, by construction we have $|\langle f_i, v_i \rangle| \geq \varepsilon$ for some vector $f_i$ of norm at most 1 which is orthogonal to $V_{i-1}$. Because of this, we see that $v_i$ makes an angle of $\Omega_\varepsilon(1)$ with $V_{i-1}$. As a consequence of this and the Gram-Schmidt orthogonalisation process, we see that $v_1, \ldots, v_i$ is a well-conditioned basis of $V_i$, in the sense that any vector $w \in V_i$ can be expressed as a linear combination of $v_1, \ldots, v_i$ with coefficients of size $O_{\varepsilon,i}(\|w\|_H)$. In particular, since $f_{\mathrm{str}}$ has norm at most 1 (by Pythagoras' theorem) and lies in $V_M$, we see that $f_{\mathrm{str}}$ is a linear combination of $v_1, \ldots, v_M$ with coefficients of size $O_{M,\varepsilon}(1) = O_\varepsilon(1)$, and the claim follows.
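The orthogonalisation step of this proof can also be sketched concretely. In the snippet below (our own helper names, pure Python), the structured part is the orthogonal projection onto the span of the basic vectors collected so far, maintained via Gram-Schmidt; this yields the extra property $\langle f_{\mathrm{str}}, f_{\mathrm{psd}} \rangle = 0$ that Corollary 2.5 lacks:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(f, basis):
    """Orthogonal projection of f onto span(basis); basis is orthonormal."""
    out = [0.0] * len(f)
    for b in basis:
        c = dot(f, b)
        out = [o + c * x for o, x in zip(out, b)]
    return out

def gram_schmidt_add(basis, v, tol=1e-12):
    """Adjoin v to an orthonormal basis, orthonormalising it first."""
    w = list(v)
    for b in basis:
        c = dot(w, b)
        w = [a - c * x for a, x in zip(w, b)]
    norm = dot(w, w) ** 0.5
    if norm > tol:
        basis.append([a / norm for a in w])
    return basis

def orthogonal_decompose(f, S, eps):
    """Lemma 2.7 sketch: f_str is the projection of f onto the span of the
    basic vectors adjoined so far; stop when the residual is eps-pseudorandom."""
    basis = []
    while True:
        f_str = project(f, basis)
        f_psd = [a - b for a, b in zip(f, f_str)]
        v = next((v for v in S if abs(dot(f_psd, v)) >= eps), None)
        if v is None:
            return f_str, f_psd
        basis = gram_schmidt_add(basis, v)

S = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]   # basic vectors, not mutually orthogonal
f = [0.5, 0.4, 0.3]
f_str, f_psd = orthogonal_decompose(f, S, eps=0.05)
```

Unlike the greedy rank-one updates of Corollary 2.5, each pass here re-projects f onto the enlarged subspace, which is what makes the residual orthogonal to the structured part.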

We can now iterate the above lemma and use a pigeonholing argument to obtain the strong structure theorem.

Proof of Theorem 2.6. We first observe that it suffices to prove a weakened version of Theorem 2.6 in which $f_{\mathrm{str}}$ is $(O_{M,\varepsilon}(1), O_{M,\varepsilon}(1))$-structured rather than $(M, M)$-structured. This is because one can then recover the original version of Theorem 2.6 by making F more rapidly growing, and redefining M; we leave the details to the reader. Also, by increasing F if necessary we may assume that F is integer-valued and $F(M) > M$ for all M.

We now recursively define $M_0 := 1$ and $M_i := F(M_{i-1})$ for all $i \geq 1$. We then recursively define $f_0, f_1, \ldots$ by setting $f_0 := f$, and then for each $i \geq 1$ using Lemma 2.7 to decompose $f_{i-1} = f_{\mathrm{str},i} + f_i$, where $f_{\mathrm{str},i}$ is $(O_{M_i}(1), O_{M_i}(1))$-structured, and $f_i$ is $1/M_i$-pseudorandom and orthogonal to $f_{\mathrm{str},i}$. From Pythagoras' theorem we see that the quantity $\|f_i\|_H^2$ is decreasing, and varies between 0 and 1. By the pigeonhole principle, we can thus find $1 \leq i \leq 1/\varepsilon^2 + 1$ such that $\|f_{i-1}\|_H^2 - \|f_i\|_H^2 \leq \varepsilon^2$; by Pythagoras' theorem, this implies that $\|f_{\mathrm{str},i}\|_H \leq \varepsilon$. If we then set $f_{\mathrm{str}} := f_{\mathrm{str},1} + \ldots + f_{\mathrm{str},i-1}$, $f_{\mathrm{psd}} := f_i$, $f_{\mathrm{err}} := f_{\mathrm{str},i}$, and $M := M_{i-1}$, we obtain the claim.

Remark 2.8. By tweaking the above argument a little bit, one can also ensure that the quantities $f_{\mathrm{str}}$, $f_{\mathrm{psd}}$, $f_{\mathrm{err}}$ in Theorem 2.6 are orthogonal to each other. We leave the details to the reader.

Remark 2.9. The bound $O_{F,\varepsilon}(1)$ on M in Theorem 2.6 is quite poor in practice; roughly speaking, it is obtained by iterating F about $O(1/\varepsilon^2)$ times. Thus for instance if F is of exponential growth (which is typical in applications), M can be of tower-exponential size in $1/\varepsilon$. These excessively large values of M unfortunately seem to be necessary in many cases; see e.g. [8] for a discussion in the case of the Szemerédi regularity lemma, which can be deduced as a consequence of Theorem 2.6.
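To see the tower-type growth of Remark 2.9 concretely, iterating an exponential F just a handful of times (as in the proof of Theorem 2.6) already produces enormous bounds. A tiny sketch (the helper name is ours):

```python
def iterate_F(F, steps, M0=1):
    """Iterate M_{i} = F(M_{i-1}), as in the proof of Theorem 2.6."""
    M = M0
    for _ in range(steps):
        M = F(M)
    return M

# With exponential F, the bound on M after k iterations is a tower of height ~k:
tower = iterate_F(lambda M: 2 ** M, 4)   # 1 -> 2 -> 4 -> 16 -> 65536
```

One more iteration would give $2^{65536}$, which is why the regularity-lemma-type bounds obtained this way are of tower type in $1/\varepsilon$.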

To illustrate how the strong structure theorem works in practice, we use it to deduce the arithmetic regularity lemma of Green [13] (applied in the model case of the Hamming cube $F_2^n$). Let A be a subset of $F_2^n$, and let $1_A : F_2^n \to \{0, 1\}$ be the indicator function. If V is an affine subspace (over $F_2$) of $F_2^n$, we say that A is $\varepsilon$-regular in V for some $0 < \varepsilon < 1$ if we have
$$|\mathbf{E}_{x \in V} (1_A(x) - \delta_V) e_\xi(x)| \leq \varepsilon$$
for all characters $e_\xi$, where $\mathbf{E}_{x \in V} f(x) := \frac{1}{|V|} \sum_{x \in V} f(x)$ denotes the average value of f on V, and $\delta_V := \mathbf{E}_{x \in V} 1_A(x) = |A \cap V| / |V|$ denotes the density of A in V. The following result is analogous to the celebrated Szemerédi regularity lemma:

Lemma 2.10 (Arithmetic regularity lemma). [13] Let $A \subset F_2^n$ and $0 < \varepsilon \leq 1$. Then there exists a subspace V of codimension $d = O_\varepsilon(1)$ such that A is $\varepsilon$-regular on all but $\varepsilon 2^d$ of the translates of V.

Proof. It will suffice to establish the claim with the weaker conclusion that A is $O(\varepsilon^{1/4})$-regular on all but $O(\varepsilon 2^d)$ of the translates of V, since one can then simply shrink $\varepsilon$ to obtain the original version of Lemma 2.10.

We apply Theorem 2.6 to the setting in Example 2.1, with $f := 1_A$, and F to be chosen later. This gives us an integer $M = O_{F,\varepsilon}(1)$ and a decomposition
$$1_A = f_{\mathrm{str}} + f_{\mathrm{psd}} + f_{\mathrm{err}} \quad (3)$$
where $f_{\mathrm{str}}$ is $(M, M)$-structured, $f_{\mathrm{psd}}$ is $1/F(M)$-pseudorandom, and $\|f_{\mathrm{err}}\|_H \leq \varepsilon$. The function $f_{\mathrm{str}}$ is a combination of at most M characters, and thus there exists


a subspace $V \subset F_2^n$ of codimension $d \leq M$ such that $f_{\mathrm{str}}$ is constant on all translates of V. We have
$$\mathbf{E}_{x \in F_2^n} |f_{\mathrm{err}}(x)|^2 \leq \varepsilon^2.$$
Dividing $F_2^n$ into $2^d$ translates $y + V$ of V, we thus conclude that we must have
$$\mathbf{E}_{x \in y+V} |f_{\mathrm{err}}(x)|^2 \leq \varepsilon^{1/2} \quad (4)$$
on all but at most $\varepsilon 2^d$ of the translates $y + V$.

Let $y + V$ be such that (4) holds, and let $\delta_{y+V}$ be the average of $1_A$ on $y + V$. The function $f_{\mathrm{str}}$ equals a constant value on $y + V$, call it $c_{y+V}$. Averaging (3) on $y + V$ we obtain
$$\delta_{y+V} = c_{y+V} + \mathbf{E}_{x \in y+V} f_{\mathrm{psd}}(x) + \mathbf{E}_{x \in y+V} f_{\mathrm{err}}(x).$$
Since $f_{\mathrm{psd}}$ is $1/F(M)$-pseudorandom, some simple Fourier analysis (expressing $1_{y+V}$ as an average of characters) shows that
$$|\mathbf{E}_{x \in y+V} f_{\mathrm{psd}}(x)| \leq \frac{2^n}{|V| F(M)} \leq \frac{2^M}{F(M)},$$
while from (4) and Cauchy-Schwarz we have
$$|\mathbf{E}_{x \in y+V} f_{\mathrm{err}}(x)| \leq \varepsilon^{1/4},$$
and thus
$$\delta_{y+V} = c_{y+V} + O\left(\frac{2^M}{F(M)}\right) + O(\varepsilon^{1/4}).$$
By (3) we therefore have
$$1_A(x) - \delta_{y+V} = f_{\mathrm{psd}}(x) + f_{\mathrm{err}}(x) + O\left(\frac{2^M}{F(M)}\right) + O(\varepsilon^{1/4}).$$
Now let $e_\xi$ be an arbitrary character. By arguing as before we have
$$|\mathbf{E}_{x \in y+V} f_{\mathrm{psd}}(x) e_\xi(x)| \leq \frac{2^M}{F(M)}$$
and
$$|\mathbf{E}_{x \in y+V} f_{\mathrm{err}}(x) e_\xi(x)| \leq \varepsilon^{1/4},$$
and thus
$$\mathbf{E}_{x \in y+V} (1_A(x) - \delta_{y+V}) e_\xi(x) = O\left(\frac{2^M}{F(M)}\right) + O(\varepsilon^{1/4}).$$
If we now set $F(M) := \varepsilon^{-1/4} 2^M$ we obtain the claim.

For some applications of this lemma, see [13]. A decomposition in a similar spirit can also be found in [5], [15]. The weak structure theorem for Reed-Muller codes was also employed in [18], [14] (under the name of a Koopman-von Neumann type theorem).

Now we obtain the Szemerédi regularity lemma itself.

Recall that if $G = (V, E)$ is a graph and A, B are non-empty disjoint subsets of V, we say that the pair $(A, B)$ is $\varepsilon$-regular if for any $A' \subset A$ and $B' \subset B$ with $|A'| \geq \varepsilon |A|$ and $|B'| \geq \varepsilon |B|$, the number of edges between $A'$ and $B'$ differs from $\delta_{A,B} |A'||B'|$ by at most $\varepsilon |A'||B'|$, where $\delta_{A,B} = |E \cap (A \times B)| / |A||B|$ is the edge density between A and B.

Lemma 2.11 (Szemerédi regularity lemma). [24] Let $0 < \varepsilon < 1$ and $m_0 \geq 1$. Then if $G = (V, E)$ is a graph with $|V| = n$ sufficiently large depending on $\varepsilon$ and $m_0$, then there exists a partition $V = V_0 \cup V_1 \cup \ldots \cup V_m$ with $m_0 \leq m \leq O_{\varepsilon,m_0}(1)$ such that $|V_0| \leq \varepsilon n$, $|V_1| = \ldots = |V_m|$, and such that all but at most $\varepsilon m^2$ of the pairs $(V_i, V_j)$ for $1 \leq i < j \leq m$ are $\varepsilon$-regular.

Proof. It will suffice to establish the weaker claim that $|V_0| = O(\varepsilon n)$, and all but at most $O(\varepsilon m^2)$ of the pairs $(V_i, V_j)$ are $O(\varepsilon^{1/12})$-regular. We can also assume without loss of generality that $\varepsilon$ is small.

We apply Theorem 2.6 to the setting in Example 2.3 with $f := 1_E$ and F to be chosen later. This gives us an integer $M = O_{F,\varepsilon}(1)$ and a decomposition
$$1_E = f_{\mathrm{str}} + f_{\mathrm{psd}} + f_{\mathrm{err}} \quad (5)$$
where $f_{\mathrm{str}}$ is $(M, M)$-structured, $f_{\mathrm{psd}}$ is $1/F(M)$-pseudorandom, and $\|f_{\mathrm{err}}\|_H \leq \varepsilon$. The function $f_{\mathrm{str}}$ is a combination of at most M tensor products of indicator functions $1_{A_i \times B_i}$. The sets $A_i$ and $B_i$ partition V into at most $2^{2M}$ sets, which we shall refer to as atoms. If $|V|$ is sufficiently large depending on M, $m_0$ and $\varepsilon$, we can then partition $V = V_0 \cup \ldots \cup V_m$ with $m_0 \leq m \leq (m_0 + 2^{2M})/\varepsilon$, $|V_0| = O(\varepsilon n)$, $|V_1| = \ldots = |V_m|$, and such that each $V_i$ for $1 \leq i \leq m$ is entirely contained within an atom. In particular $f_{\mathrm{str}}$ is constant on $V_i \times V_j$ for all $1 \leq i < j \leq m$. Since $\varepsilon$ is small, we also have $|V_i| = \Theta(n/m)$ for $1 \leq i \leq m$.

We have
$$\mathbf{E}_{(v,w) \in V \times V} |f_{\mathrm{err}}(v, w)|^2 \leq \varepsilon^2.$$


Example 3.3 (Colourings). Let X be a finite set, which we give the uniform distribution as in Example 3.1. Suppose we colour this set using some finite palette Y by introducing a map $c : X \to Y$. If we endow Y with the discrete $\sigma$-algebra $\mathcal{Y} = 2^Y$, then $(Y, \mathcal{Y}, c)$ is a factor of $(X, \mathcal{X}, \mu)$. The $\sigma$-algebra $\mathcal{B}_Y$ is then generated by the colour classes $c^{-1}(y)$ of the colouring c. The conditional expectation $\mathbf{E}(f|Y)$ of a function $f : X \to \mathbf{R}$ is then given by the formula $\mathbf{E}(f|Y)(x) := \mathbf{E}_{x' \in c^{-1}(c(x))} f(x')$ for all $x \in X$, where $c^{-1}(c(x))$ is the colour class that x lies in.
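Concretely, in the finite uniform setting of Example 3.3, the conditional expectation simply replaces f by its average on each colour class. A minimal sketch (the helper name `cond_expectation` is ours):

```python
from collections import defaultdict

def cond_expectation(f, colour, X):
    """E(f|Y)(x): average of f over the colour class of x, with X uniform."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x in X:
        sums[colour(x)] += f(x)
        counts[colour(x)] += 1
    return {x: sums[colour(x)] / counts[colour(x)] for x in X}

X = range(6)
parity = lambda x: x % 2      # two colour classes: evens and odds
f = lambda x: float(x)
Ef = cond_expectation(f, parity, X)
# The evens {0,2,4} average to 2, the odds {1,3,5} average to 3.
```

Note that the output is constant on each colour class and has the same mean as f, which are the two defining properties of the conditional expectation here.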

In the previous section, the concept of structure was represented by a set S of vectors. In this section, we shall instead represent structure by a collection S of factors. We say that a factor Y has complexity at most M if it is the join $Y = Y_1 \vee \ldots \vee Y_m$ of m factors from S for some $0 \leq m \leq M$. We also say that a function $f \in L^2(X)$ is $\varepsilon$-pseudorandom if we have $\|\mathbf{E}(f|Y)\|_{L^2(X)} \leq \varepsilon$ for all $Y \in S$. We have an analogue of Lemma 2.4:

Lemma 3.4 (Lack of pseudorandomness implies energy increment). Let $(X, \mathcal{X}, \mu)$ and S be as above. Let $f \in L^2(X)$ be such that $f - \mathbf{E}(f|Y)$ is not $\varepsilon$-pseudorandom for some $0 < \varepsilon \leq 1$ and some factor Y. Then there exists $Y' \in S$ such that
$$\|\mathbf{E}(f|Y \vee Y')\|_{L^2(X)}^2 \geq \|\mathbf{E}(f|Y)\|_{L^2(X)}^2 + \varepsilon^2.$$

Proof. By hypothesis we have
$$\|\mathbf{E}(f - \mathbf{E}(f|Y) \mid Y')\|_{L^2(X)} \geq \varepsilon$$
for some $Y' \in S$. By Pythagoras' theorem, this implies that
$$\|\mathbf{E}(f - \mathbf{E}(f|Y) \mid Y \vee Y')\|_{L^2(X)} \geq \varepsilon.$$
By Pythagoras' theorem again, the square of the left-hand side is $\|\mathbf{E}(f|Y \vee Y')\|_{L^2(X)}^2 - \|\mathbf{E}(f|Y)\|_{L^2(X)}^2$, and the claim follows.

We then obtain an analogue of Lemma 2.7:

Lemma 3.5 (Weak structure theorem). Let $(X, \mathcal{X}, \mu)$ and S be as above. Let $f \in L^2(X)$ be such that $\|f\|_{L^2(X)} \leq 1$, let Y be a factor, and let $0 < \varepsilon \leq 1$. Then there exists a decomposition $f = f_{\mathrm{str}} + f_{\mathrm{psd}}$, where $f_{\mathrm{str}} = \mathbf{E}(f|Y \vee Y')$ for some factor $Y'$ of complexity at most $1/\varepsilon^2$, and $f_{\mathrm{psd}}$ is $\varepsilon$-pseudorandom.

Proof. We construct factors $Y_1, Y_2, \ldots, Y_m \in S$ by the following algorithm:

Step 0: Initialise $m = 0$.

Step 1: Write $Y' := Y_1 \vee \ldots \vee Y_m$, $f_{\mathrm{str}} := \mathbf{E}(f|Y \vee Y')$, and $f_{\mathrm{psd}} := f - f_{\mathrm{str}}$.

Step 2: If $f_{\mathrm{psd}}$ is $\varepsilon$-pseudorandom then STOP. Otherwise, by Lemma 3.4 we can find $Y_{m+1} \in S$ such that $\|\mathbf{E}(f|Y \vee Y' \vee Y_{m+1})\|_{L^2(X)}^2 \geq \|\mathbf{E}(f|Y \vee Y')\|_{L^2(X)}^2 + \varepsilon^2$.

Step 3: Increment m to $m + 1$ and return to Step 1.

Since the energy $\|f_{\mathrm{str}}\|_{L^2(X)}^2$ ranges between 0 and 1 (by the hypothesis $\|f\|_{L^2(X)} \leq 1$) and increases by at least $\varepsilon^2$ at each stage, we see that this algorithm terminates in at most $1/\varepsilon^2$ steps. The claim follows.
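In the finite uniform setting, a factor is just a partition of X, and the join of two factors is the common refinement of the two partitions. The sketch below (our own labelling functions, pure Python) implements the energy-increment iteration of this proof, refining the partition until the residual has small conditional expectation with respect to every basic factor:

```python
def cond_exp(f, label, X):
    """Average f on each cell of the partition induced by `label`."""
    cells = {}
    for x in X:
        cells.setdefault(label(x), []).append(x)
    avg = {k: sum(f[x] for x in xs) / len(xs) for k, xs in cells.items()}
    return {x: avg[label(x)] for x in X}

def l2_norm(g, X):
    return (sum(g[x] ** 2 for x in X) / len(X)) ** 0.5

def energy_increment(f, factors, X, eps):
    """Lemma 3.5 sketch: join basic factors until f - E(f|Y) is
    eps-pseudorandom with respect to every basic factor."""
    joined = []                                        # basic factors joined so far
    while True:
        label = lambda x: tuple(g(x) for g in joined)  # common refinement
        f_str = cond_exp(f, label, X)
        f_psd = {x: f[x] - f_str[x] for x in X}
        bad = next((g for g in factors
                    if l2_norm(cond_exp(f_psd, g, X), X) > eps), None)
        if bad is None:
            return f_str, f_psd
        joined.append(bad)                             # energy increment step

X = list(range(8))
f = {x: 1.0 if x % 2 == 0 else 0.0 for x in X}   # indicator of the evens
factors = [lambda x: x % 2, lambda x: x // 4]    # two candidate basic factors
f_str, f_psd = energy_increment(f, factors, X, eps=0.1)
```

Here f is exactly measurable with respect to the parity factor, so a single refinement step captures it completely and the residual vanishes.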

    Iterating this we obtain an analogue of Theorem 2.6:

Theorem 3.6 (Strong structure theorem). Let $(X, \mathcal{X}, \mu)$ and S be as above. Let $f \in L^2(X)$ be such that $\|f\|_{L^2(X)} \leq 1$, let $\varepsilon > 0$, and let $F : \mathbf{Z}^+ \to \mathbf{R}^+$ be an arbitrary function. Then we can find an integer $M = O_{F,\varepsilon}(1)$ and a decomposition (2) where $f_{\mathrm{str}} = \mathbf{E}(f|Y)$ for some factor Y of complexity at most M, $f_{\mathrm{psd}}$ is $1/F(M)$-pseudorandom, and $f_{\mathrm{err}}$ has norm at most $\varepsilon$.

Proof. Without loss of generality we may assume $F(M) \geq 2M$ for all M. Also, it will suffice to allow Y to have complexity $O(M)$ rather than M.

We recursively define $M_0 := 1$ and $M_i := F(M_{i-1})^2$ for all $i \geq 1$. We then recursively define factors $Y_0, Y_1, Y_2, \ldots$ by setting $Y_0$ to be the trivial factor, and then for each $i \geq 1$ using Lemma 3.5 to find a factor $Y'_i$ of complexity at most $M_i$ such that $f - \mathbf{E}(f|Y_{i-1} \vee Y'_i)$ is $1/F(M_{i-1})$-pseudorandom, and then setting $Y_i := Y_{i-1} \vee Y'_i$. By Pythagoras' theorem and the hypothesis $\|f\|_{L^2(X)} \leq 1$, the energy $\|\mathbf{E}(f|Y_i)\|_{L^2(X)}^2$ is increasing in i, and is bounded between 0 and 1. By the pigeonhole principle, we can thus find $1 \leq i \leq 1/\varepsilon^2 + 1$ such that $\|\mathbf{E}(f|Y_i)\|_{L^2(X)}^2 - \|\mathbf{E}(f|Y_{i-1})\|_{L^2(X)}^2 \leq \varepsilon^2$; by Pythagoras' theorem, this implies that $\|\mathbf{E}(f|Y_i) - \mathbf{E}(f|Y_{i-1})\|_{L^2(X)} \leq \varepsilon$. If we then set $f_{\mathrm{str}} := \mathbf{E}(f|Y_{i-1})$, $f_{\mathrm{psd}} := f - \mathbf{E}(f|Y_i)$, $f_{\mathrm{err}} := \mathbf{E}(f|Y_i) - \mathbf{E}(f|Y_{i-1})$, and $M := M_{i-1}$, we obtain the claim.

This theorem can be used to give alternate proofs of Lemma 2.10 and Lemma 2.11; we leave this as an exercise to the reader (but see [25] for a proof of Lemma 2.11 essentially relying on Theorem 3.6).

As mentioned earlier, the key advantage of these types of structure theorems is that the structured component $f_{\mathrm{str}}$ is now obtained as a conditional expectation of the original function f rather than merely an orthogonal projection, and so one has good $L^1$ and $L^\infty$ control on $f_{\mathrm{str}}$ rather than just $L^2$ control. In particular, these structure theorems are good for controlling sparsely supported functions f (such as the normalised indicator function of a sparse set), by obtaining a densely supported function $f_{\mathrm{str}}$ which models the behaviour of f in some key respects. Let us give a simplified "sparse structure theorem" which is too restrictive for real applications, but which serves to illustrate the main concept.

Theorem 3.7 (Sparse structure theorem, toy version). Let $0 < \varepsilon < 1$, let $F : \mathbf{Z}^+ \to \mathbf{R}^+$ be a function, and let N be an integer parameter. Let $(X, \mathcal{X}, \mu)$ and S be as above, and depending on N. Let $\nu \in L^1(X)$ be a non-negative function (also depending on N) with the property that for every $M \geq 0$, we have the pseudorandomness property
$$\|\mathbf{E}(\nu|Y)\|_{L^\infty(X)} \leq 1 + o_M(1) \quad (7)$$
for all factors Y of complexity at most M, where $o_M(1)$ is a quantity which goes to zero as N goes to infinity for any fixed M. Let $f : X \to \mathbf{R}$ (which also depends on N) obey the pointwise estimate $0 \leq f(x) \leq \nu(x)$ for all $x \in X$. Then, if N is sufficiently large depending on F and $\varepsilon$, we can find an integer $M = O_{F,\varepsilon}(1)$ and a decomposition (2) where $f_{\mathrm{str}} = \mathbf{E}(f|Y)$ for some factor Y of complexity at most M, $f_{\mathrm{psd}}$ is $1/F(M)$-pseudorandom, and $f_{\mathrm{err}}$ has norm at most $\varepsilon$. Furthermore, we have
$$0 \leq f_{\mathrm{str}}(x) \leq 1 + o_{F,\varepsilon}(1) \quad (8)$$
and
$$\int_X f_{\mathrm{str}} \, d\mu = \int_X f \, d\mu. \quad (9)$$

An example to keep in mind is where $X = \{1, \ldots, N\}$ with the uniform probability measure $\mu$, S consists of the $\sigma$-algebras generated by a single discrete interval $\{n \in \mathbf{Z} : a \leq n \leq b\}$ for $1 \leq a \leq b \leq N$, and $\nu$ is the function $\nu(x) = \log N \cdot 1_A(x)$, where A is a randomly chosen subset of $\{1, \ldots, N\}$ with $\mathbf{P}(x \in A) = \frac{1}{\log N}$ for all $1 \leq x \leq N$; one can then verify (7) with high probability using tools such as Chernoff's inequality. Observe that $\nu$ is bounded in $L^1(X)$ uniformly in N, but is unbounded in $L^2(X)$. Very roughly speaking, the above theorem states that any dense subset B of A can be effectively modelled in some sense by a dense subset of $\{1, \ldots, N\}$, normalised by a factor of $\frac{1}{\log N}$; this can be seen by applying the above theorem to the function $f := \log N \cdot 1_B$.

Proof. We run the proof of Lemma 3.5 and Theorem 3.6 again. Observe that we no longer have the bound $\|f\|_{L^2(X)} \leq 1$. However, from (7) and the pointwise bound $0 \leq f \leq \nu$ we know that
$$\|\mathbf{E}(f|Y)\|_{L^2(X)} \leq \|\mathbf{E}(\nu|Y)\|_{L^2(X)} \leq \|\mathbf{E}(\nu|Y)\|_{L^\infty(X)} \leq 1 + o_M(1)$$
for all Y of complexity at most M. In particular, for N large enough depending on M we have
$$\|\mathbf{E}(f|Y)\|_{L^2(X)}^2 \leq 2 \quad (10)$$
(say). This allows us to obtain an analogue of Lemma 3.5 as before (with slightly worse constants), assuming that N is sufficiently large depending on $\varepsilon$, by repeating the proof more or less verbatim. One can then repeat the proof of Theorem 3.6, again using (10), to obtain the desired decomposition. The claim (8) follows immediately from (7), and (9) follows since $\int_X \mathbf{E}(f|Y) \, d\mu = \int_X f \, d\mu$ for any factor Y.

Remark 3.8. In applications, one does not quite have the property (7); instead, one can bound $\mathbf{E}(\nu|Y)$ by $1 + o_M(1)$ outside of a small exceptional set, which has measure $o(1)$ with respect to both $\mu$ and $\nu$. In such cases it is still possible to obtain a structure theorem similar to Theorem 3.7; see [16, Theorem 8.1], [26, Theorem 3.9], or [34, Theorem 4.7]. These structure theorems have played an indispensable role in establishing the existence of patterns (such as arithmetic progressions) inside sparse sets such as the prime numbers, by viewing them as dense subsets of sparse pseudorandom sets (such as the almost primes), and then appealing to a sparse structure theorem to model the original set by a much denser set, to which one can apply deep theorems (such as Szemerédi's theorem [24]) to detect the desired pattern.

The reader may observe one slight difference between the concept of pseudorandomness discussed here and the concept in the previous section. Here, a function $f_{\mathrm{psd}}$ is considered pseudorandom if its conditional expectations $\mathbf{E}(f_{\mathrm{psd}}|Y)$ are small for various structured Y. In the previous section, a function $f_{\mathrm{psd}}$ was considered pseudorandom if its correlations $\langle f_{\mathrm{psd}}, g \rangle_H$ were small for various structured g. However, it is possible to relate the two notions of pseudorandomness by the simple device of using a structured function g to generate a structured factor $Y_g$. In measure theory, this is usually done by taking the level sets $g^{-1}([a, b])$ of g and seeing what $\sigma$-algebra they generate. In many quantitative applications, though, it is too expensive to take all of these level sets, and so instead one only takes a finite number of these level sets to create the relevant factor. The following lemma illustrates this construction:

Lemma 3.9 (Correlation with a function implies non-trivial projection). Let $(X, \mathcal{X}, \mu)$ be a probability space. Let $f \in L^1(X)$ and $g \in L^2(X)$ be such that $\|f\|_{L^1(X)} \le 1$ and $\|g\|_{L^2(X)} \le 1$. Let $\varepsilon > 0$ and $0 \le \alpha < 1$, and let $Y$ be the factor $Y = (\mathbf{R}, \mathcal{Y}, g)$, where $\mathcal{Y}$ is the $\sigma$-algebra generated by the intervals $[(n+\alpha)\varepsilon, (n+1+\alpha)\varepsilon)$ for $n \in \mathbf{Z}$. Then we have

$$\|\mathbf{E}(f|Y)\|_{L^2(X)} \ge |\langle f, g \rangle_{L^2(X)}| - \varepsilon.$$

Proof. Observe that the atoms of $\mathcal{B}_Y$ are generated by the level sets $g^{-1}([(n+\alpha)\varepsilon, (n+1+\alpha)\varepsilon))$, and on these level sets $g$ fluctuates by at most $\varepsilon$. Thus

$$\|g - \mathbf{E}(g|Y)\|_{L^\infty(X)} \le \varepsilon.$$

Since $\|f\|_{L^1(X)} \le 1$, we conclude

$$|\langle f, g \rangle_{L^2(X)} - \langle f, \mathbf{E}(g|Y) \rangle_{L^2(X)}| \le \varepsilon.$$

On the other hand, by Cauchy-Schwarz and the hypothesis $\|g\|_{L^2(X)} \le 1$ we have

$$|\langle f, \mathbf{E}(g|Y) \rangle_{L^2(X)}| = |\langle \mathbf{E}(f|Y), g \rangle_{L^2(X)}| \le \|\mathbf{E}(f|Y)\|_{L^2(X)}.$$

The claim follows.

This type of lemma is relied upon in the above-mentioned papers [16], [26], [34] to convert pseudorandomness in the conditional expectation sense to pseudorandomness in the correlation sense. In applications it is also convenient to randomise the shift parameter $\alpha$ in order to average away all boundary effects; see e.g. [32, Lemma 3.6].
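As a quick numerical sanity check of Lemma 3.9, the construction can be simulated on a finite probability space (the uniform measure on $N$ points). The sketch below is ours, not from the text; the variable names and the illustrative choice $g = f$ (made so that the correlation $\langle f, g \rangle$ is non-trivial) are assumptions.

```python
import numpy as np

# Uniform probability measure on N points; expectations are plain averages.
rng = np.random.default_rng(0)
N = 1000
f = rng.uniform(-1, 1, N)     # |f| <= 1, hence ||f||_{L^1} <= 1
g = f.copy()                  # a correlated choice; ||g||_{L^2} <= 1 since |g| <= 1
eps, alpha = 0.1, 0.37        # interval width and shift defining the factor Y

# Atom of Y containing x: the level set g^{-1}([(n+alpha)eps, (n+1+alpha)eps))
cell = np.floor(g / eps - alpha).astype(int)

# Conditional expectation E(f|Y): replace f by its average on each atom
Ef = np.empty(N)
for c in np.unique(cell):
    mask = cell == c
    Ef[mask] = f[mask].mean()

corr = abs(np.mean(f * g))        # |<f, g>_{L^2}|
norm = np.sqrt(np.mean(Ef ** 2))  # ||E(f|Y)||_{L^2}
assert norm >= corr - eps         # conclusion of Lemma 3.9
```

Randomising `alpha`, as suggested in the remark above, would average away the boundary effects of the fixed interval partition.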

4. Structure and randomness via uniformity norms

In the preceding sections, we specified the notion of structure (either via a set $S$ of vectors, or a collection $\mathcal{S}$ of factors), which then created a dual notion of pseudorandomness, for which one had a structure theorem. Such decompositions give excellent control on the structured component $f_{str}$ of the function, but the control on the pseudorandom part $f_{psd}$ can be rather weak. There is an opposing approach, in which one first specifies the notion of pseudorandomness one would like to have for $f_{psd}$, and then works as hard as one can to obtain a useful corresponding notion of structure. In this approach, the pseudorandom component $f_{psd}$ is easy to dispose of, but then all the difficulty gets shifted to obtaining adequate control on the structured component.

A particularly useful family of notions of pseudorandomness arises from the Gowers uniformity norms $\|f\|_{U^d(G)}$. These norms can be defined on any finite additive group $G$, and for complex-valued functions $f : G \to \mathbf{C}$, but for simplicity let us restrict attention to a Hamming cube $G = \mathbf{F}_2^n$ and to real-valued functions $f : \mathbf{F}_2^n \to \mathbf{R}$. (For more general groups and complex-valued functions, see [33]. For applications to graphs and hypergraphs, one can use the closely related Gowers box norms; see [11], [12], [20], [26], [30], [33].) In that case, the uniformity norm $\|f\|_{U^d(\mathbf{F}_2^n)}$ can be defined for $d \ge 1$ by the formula

$$\|f\|_{U^d(\mathbf{F}_2^n)}^{2^d} := \mathbf{E}_{L : \mathbf{F}_2^d \to \mathbf{F}_2^n} \prod_{a \in \mathbf{F}_2^d} f(L(a))$$

where $L$ ranges over all affine-linear maps from $\mathbf{F}_2^d$ to $\mathbf{F}_2^n$ (not necessarily injective). For instance, we have

$$\|f\|_{U^1(\mathbf{F}_2^n)} = |\mathbf{E}_{x,h \in \mathbf{F}_2^n} f(x) f(x+h)|^{1/2} = |\mathbf{E}_{x \in \mathbf{F}_2^n} f(x)|$$

$$\|f\|_{U^2(\mathbf{F}_2^n)} = |\mathbf{E}_{x,h,k \in \mathbf{F}_2^n} f(x) f(x+h) f(x+k) f(x+h+k)|^{1/4} = \big(\mathbf{E}_{h \in \mathbf{F}_2^n} |\mathbf{E}_{x \in \mathbf{F}_2^n} f(x) f(x+h)|^2\big)^{1/4}$$

$$\|f\|_{U^3(\mathbf{F}_2^n)} = |\mathbf{E}_{x,h_1,h_2,h_3 \in \mathbf{F}_2^n} f(x) f(x+h_1) f(x+h_2) f(x+h_3) f(x+h_1+h_2) f(x+h_1+h_3) f(x+h_2+h_3) f(x+h_1+h_2+h_3)|^{1/8}.$$

It is possible to show that the $U^d(\mathbf{F}_2^n)$ norms are indeed norms for $d \ge 2$, and a semi-norm for $d = 1$; see e.g. [33]. These norms are also monotone in $d$:

$$0 \le \|f\|_{U^1(\mathbf{F}_2^n)} \le \|f\|_{U^2(\mathbf{F}_2^n)} \le \|f\|_{U^3(\mathbf{F}_2^n)} \le \ldots \le \|f\|_{L^\infty(\mathbf{F}_2^n)}. \quad (11)$$

The $d = 2$ norm is related to the Fourier coefficients $\hat f(\xi)$ defined in (1) by the important (and easily verified) identity

$$\|f\|_{U^2(\mathbf{F}_2^n)} = \Big( \sum_{\xi \in \mathbf{F}_2^n} |\hat f(\xi)|^4 \Big)^{1/4}. \quad (12)$$

More generally, the uniformity norms $\|f\|_{U^d(\mathbf{F}_2^n)}$ for $d \ge 1$ are related to Reed-Muller codes of order $d-1$ (although this is partly conjectural for $d \ge 4$), but the relationship cannot be encapsulated in an identity as elegant as (12) once $d \ge 3$. We will return to this point shortly.
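To make these definitions concrete, here is a small numerical sketch (ours, not from the text). It labels the points of $\mathbf{F}_2^n$ by integers so that group addition is bitwise XOR, computes $\|f\|_{U^d}$ via the recursion $\|f\|_{U^d}^{2^d} = \mathbf{E}_h \|f f_h\|_{U^{d-1}}^{2^{d-1}}$ with base case $\|f\|_{U^1}^2 = (\mathbf{E} f)^2$ (equivalent to the affine-linear-maps formula above), and checks the monotonicity (11) and the Fourier identity (12) on a random sign function.

```python
import numpy as np

def shift(f, h):
    """f_h(x) := f(x + h); addition on F_2^n is XOR of the integer labels."""
    return np.array([f[x ^ h] for x in range(len(f))])

def gowers(f, d):
    """||f||_{U^d(F_2^n)} via ||f||_{U^d}^{2^d} = E_h ||f.f_h||_{U^{d-1}}^{2^{d-1}}."""
    def power(g, d):  # returns ||g||_{U^d}^{2^d}
        if d == 1:
            return float(np.mean(g)) ** 2
        return float(np.mean([power(g * shift(g, h), d - 1) for h in range(len(g))]))
    return power(np.asarray(f, dtype=float), d) ** (1.0 / 2 ** d)

rng = np.random.default_rng(0)
f = rng.choice([-1.0, 1.0], 16)            # a random sign function on F_2^4

u1, u2, u3 = (gowers(f, d) for d in (1, 2, 3))
assert u1 <= u2 + 1e-9 and u2 <= u3 + 1e-9 and u3 <= 1 + 1e-9   # monotonicity (11)

# Identity (12): ||f||_{U^2} = (sum_xi |fhat(xi)|^4)^{1/4}, where
# fhat(xi) = E_x f(x) (-1)^{x . xi}
fhat = np.array([np.mean([f[x] * (-1) ** bin(x & xi).count("1")
                          for x in range(16)]) for xi in range(16)])
assert abs(u2 - np.sum(fhat ** 4) ** 0.25) < 1e-9
```

The recursion keeps the cost manageable for small $n$ and $d$; the direct average over all affine-linear maps would be far more expensive.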

Let us informally call a function $f : \mathbf{F}_2^n \to \mathbf{R}$ pseudorandom of order $d-1$ if $\|f\|_{U^d(\mathbf{F}_2^n)}$ is small; thus for instance functions with small $U^2$ norm are linearly pseudorandom (or Fourier-pseudorandom), functions with small $U^3$ norm are quadratically pseudorandom, and so forth. It turns out that functions which are pseudorandom to a suitable order become negligible for the purpose of computing various multilinear correlations (and the higher the order of pseudorandomness, the more complex the multilinear correlations that become negligible). This can be demonstrated by repeated application of the Cauchy-Schwarz inequality. We give a simple instance of this:

Lemma 4.1 (Generalised von Neumann theorem). Let $T_1, T_2 : \mathbf{F}_2^n \to \mathbf{F}_2^n$ be invertible linear transformations such that $T_1 - T_2$ is also invertible. Then for any $f, g, h : \mathbf{F}_2^n \to [-1, 1]$ we have

$$|\mathbf{E}_{x,r \in \mathbf{F}_2^n} f(x) g(x + T_1 r) h(x + T_2 r)| \le \|f\|_{U^2(\mathbf{F}_2^n)}.$$

Proof. By changing variables $r := T_2 r$ if necessary we may assume that $T_2$ is the identity map $I$. We rewrite the left-hand side as

$$|\mathbf{E}_{x \in \mathbf{F}_2^n} h(x) \mathbf{E}_{r \in \mathbf{F}_2^n} f(x - r) g(x + (T_1 - I) r)|$$

and then use Cauchy-Schwarz to bound this from above by

$$\big(\mathbf{E}_{x \in \mathbf{F}_2^n} |\mathbf{E}_{r \in \mathbf{F}_2^n} f(x - r) g(x + (T_1 - I) r)|^2\big)^{1/2}$$

which one can rewrite as

$$|\mathbf{E}_{x,r,r' \in \mathbf{F}_2^n} f(x - r) f(x - r') g(x + (T_1 - I) r) g(x + (T_1 - I) r')|^{1/2};$$

applying the change of variables $(y, s, h) := (x + (T_1 - I) r, T_1 r, r - r')$, this can be rewritten as

$$|\mathbf{E}_{y,h \in \mathbf{F}_2^n} g(y) g(y + (T_1 - I) h) \mathbf{E}_{s \in \mathbf{F}_2^n} f(y + s) f(y + s + h)|^{1/2};$$

applying Cauchy-Schwarz again, one can bound this by

$$\big(\mathbf{E}_{y,h \in \mathbf{F}_2^n} |\mathbf{E}_{s \in \mathbf{F}_2^n} f(y + s) f(y + s + h)|^2\big)^{1/4}.$$

But this is equal to $\|f\|_{U^2(\mathbf{F}_2^n)}$, and the claim follows.

For a more systematic study of such generalised von Neumann theorems, including some weighted versions, see Appendices B and C of [19].
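The inequality of Lemma 4.1 is easy to test numerically. The sketch below is ours: it works over $\mathbf{F}_2^3$ with an assumed bit-matrix representation of linear maps, takes $T_2 = I$ and $T_1 = M$ for a matrix $M$ chosen so that both $M$ and $M + I$ are invertible over $\mathbf{F}_2$, and computes $\|f\|_{U^2}$ directly from its fourth-moment formula.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8                                  # F_2^3, points labelled 0..7 by their bits
f, g, h = (rng.uniform(-1, 1, N) for _ in range(3))

# T_1 = M, T_2 = I; both M and M + I are invertible over F_2
M = [[0, 0, 1], [1, 0, 1], [0, 1, 0]]

def apply_lin(M, x):
    """Apply a 0/1 matrix to the bit vector of x, arithmetic mod 2."""
    bits = [(x >> j) & 1 for j in range(len(M))]
    y = 0
    for i, row in enumerate(M):
        y |= (sum(m * b for m, b in zip(row, bits)) % 2) << i
    return y

# left-hand side: |E_{x,r} f(x) g(x + T_1 r) h(x + T_2 r)|
lhs = abs(np.mean([f[x] * g[x ^ apply_lin(M, r)] * h[x ^ r]
                   for x in range(N) for r in range(N)]))

# ||f||_{U^2} from the fourth-moment formula
u2 = np.mean([f[x] * f[x ^ a] * f[x ^ b] * f[x ^ a ^ b]
              for x in range(N) for a in range(N) for b in range(N)]) ** 0.25
assert lhs <= u2 + 1e-9                # generalised von Neumann inequality
```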

In view of these generalised von Neumann theorems, it is of interest to locate conditions which would force a Gowers uniformity norm $\|f\|_{U^d(\mathbf{F}_2^n)}$ to be small. We first give a "soft" characterisation of this smallness, which at first glance seems too trivial to be of any use, but is in fact powerful enough to establish Szemerédi's theorem (see [28]) as well as the Green-Tao theorem [16]. It relies on the obvious identity

$$\|f\|_{U^d(\mathbf{F}_2^n)}^{2^d} = \langle f, \mathcal{D}f \rangle_{L^2(\mathbf{F}_2^n)}$$

where the dual function $\mathcal{D}f$ of $f$ is defined as

$$\mathcal{D}f(x) := \mathbf{E}_{L : \mathbf{F}_2^d \to \mathbf{F}_2^n;\ L(0) = x} \prod_{a \in \mathbf{F}_2^d \setminus \{0\}} f(L(a)). \quad (13)$$

As a consequence, we have

Lemma 4.2 (Dual characterisation of pseudorandomness). Let $\mathcal{S}$ denote the set of all dual functions $\mathcal{D}F$ with $\|F\|_{L^\infty(\mathbf{F}_2^n)} \le 1$. Then if $f : \mathbf{F}_2^n \to [-1, 1]$ is such that $\|f\|_{U^d(\mathbf{F}_2^n)} \ge \eta$ for some $0 < \eta \le 1$, then we have

$$\langle f, g \rangle \ge \eta^{2^d}$$

for some $g \in \mathcal{S}$.

In the converse direction, one can use the Cauchy-Schwarz-Gowers inequality (see e.g. [10], [16], [19], [33]) to show that if $\langle f, g \rangle \ge \eta$ for some $g \in \mathcal{S}$, then $\|f\|_{U^d(\mathbf{F}_2^n)} \ge \eta$.
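For $d = 2$ the identity $\|f\|_{U^d}^{2^d} = \langle f, \mathcal{D}f \rangle$ underlying Lemma 4.2 can be checked directly: parametrising the affine maps $L$ with $L(0) = x$ by their increments, the dual function becomes $\mathcal{D}f(x) = \mathbf{E}_{h,k} f(x+h) f(x+k) f(x+h+k)$. The following sketch is ours (XOR labelling of $\mathbf{F}_2^3$, as in the earlier snippets) and verifies this numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8                                  # F_2^3 with XOR as group addition
f = rng.uniform(-1, 1, N)

# dual function for d = 2: Df(x) = E_{h,k} f(x+h) f(x+k) f(x+h+k)
Df = np.array([np.mean([f[x ^ h] * f[x ^ k] * f[x ^ h ^ k]
                        for h in range(N) for k in range(N)])
               for x in range(N)])

inner = np.mean(f * Df)                # <f, Df>_{L^2}
u2_4 = np.mean([f[x] * f[x ^ h] * f[x ^ k] * f[x ^ h ^ k]
                for x in range(N) for h in range(N) for k in range(N)])
assert abs(inner - u2_4) < 1e-12       # <f, Df> = ||f||_{U^2}^4
```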

The above lemma gives a "soft" way to detect pseudorandomness, but is somewhat unsatisfying due to the rather non-explicit description of the structured set $\mathcal{S}$. To investigate pseudorandomness further, observe that we have the recursive identity

$$\|f\|_{U^d(\mathbf{F}_2^n)}^{2^d} = \mathbf{E}_{h \in \mathbf{F}_2^n} \|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)}^{2^{d-1}} \quad (14)$$

(where $f_h(x) := f(x+h)$ denotes the shift of $f$ by $h$), which, incidentally, can be used to quickly deduce the monotonicity (11). From this identity and induction we quickly deduce the modulation symmetry

$$\|fg\|_{U^d(\mathbf{F}_2^n)} = \|f\|_{U^d(\mathbf{F}_2^n)} \quad (15)$$

whenever $g \in \mathcal{S}_{d-1}(\mathbf{F}_2^n)$ is a Reed-Muller code of order at most $d-1$. In particular, we see that $\|g\|_{U^d(\mathbf{F}_2^n)} = 1$ for such codes; thus a code of order $d-1$ or less is definitely not pseudorandom of order $d-1$. A bit more generally, by combining (15) with (11) we see that

$$|\langle f, g \rangle_{L^2(\mathbf{F}_2^n)}| = \|fg\|_{U^1(\mathbf{F}_2^n)} \le \|fg\|_{U^d(\mathbf{F}_2^n)} = \|f\|_{U^d(\mathbf{F}_2^n)}.$$

In particular, any function which has a large correlation with a Reed-Muller code $g \in \mathcal{S}_{d-1}(\mathbf{F}_2^n)$ is not pseudorandom of order $d-1$. It is conjectured that the converse is also true:

Conjecture 4.3 (Gowers inverse conjecture for $\mathbf{F}_2^n$). If $d \ge 1$ and $\varepsilon > 0$ then there exists $\delta > 0$ with the following property: given any $n \ge 1$ and any $f : \mathbf{F}_2^n \to [-1, 1]$ with $\|f\|_{U^d(\mathbf{F}_2^n)} \ge \varepsilon$, there exists a Reed-Muller code $g \in \mathcal{S}_{d-1}(\mathbf{F}_2^n)$ of order at most $d-1$ such that $|\langle f, g \rangle_{L^2(\mathbf{F}_2^n)}| \ge \delta$.

This conjecture, if true, would allow one to apply the machinery of the previous sections and then decompose a bounded function $f : \mathbf{F}_2^n \to [-1, 1]$ (or a function dominated by a suitably pseudorandom function $\nu$) into a function $f_{str}$ which is built out of a controlled number of Reed-Muller codes of order at most $d-1$, a function $f_{psd}$ which is pseudorandom of order $d-1$, and a small error. See for instance [14] for further discussion.

The Gowers inverse conjecture is trivial to verify for $d = 1$. For $d = 2$ the claim follows quickly from the identity (12) and the Plancherel identity

$$\|f\|_{L^2(\mathbf{F}_2^n)}^2 = \sum_{\xi \in \mathbf{F}_2^n} |\hat f(\xi)|^2.$$

The conjecture for $d = 3$ was first established by Samorodnitsky [23], using ideas from [9] (see also [17], [33] for related results). The conjecture for $d > 3$ remains open; a key difficulty here is that there are a huge number of Reed-Muller codes (about $2^{\binom{n}{d-1}}$ or so, compared to the dimension $2^n$ of $L^2(\mathbf{F}_2^n)$) and so we definitely do not have the type of orthogonality that one enjoys in the Fourier case $d = 2$. For related reasons, we do not expect any identity of the form (12) for $d > 3$ which would allow the very few Reed-Muller codes which correlate with $f$ to dominate the enormous number of Reed-Muller codes which do not in the right-hand side.

However, we can present some evidence for it here in the 99%-structured case when $\varepsilon$ is very close to 1. Let us first handle the case when $\varepsilon = 1$:


Proposition 4.4 (100%-structured inverse theorem). Suppose $d \ge 1$ and $f : \mathbf{F}_2^n \to [-1, 1]$ is such that $\|f\|_{U^d(\mathbf{F}_2^n)} = 1$. Then $f$ is a Reed-Muller code of order at most $d-1$.

Proof. We induct on $d$. The case $d = 1$ is obvious. Now suppose that $d \ge 2$ and that the claim has already been proven for $d-1$. If $\|f\|_{U^d(\mathbf{F}_2^n)} = 1$, then from (14) we have

$$\mathbf{E}_{h \in \mathbf{F}_2^n} \|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)}^{2^{d-1}} = 1.$$

On the other hand, from (11) we have $\|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)} \le 1$ for all $h$. This forces $\|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)} = 1$ for all $h$. By the induction hypothesis, $f f_h$ must therefore be a Reed-Muller code of order at most $d-2$ for all $h$. Thus for every $h$ there exists a polynomial $P_h : \mathbf{F}_2^n \to \mathbf{F}_2$ of degree at most $d-2$ such that

$$f(x + h) = f(x) (-1)^{P_h(x)}$$

for all $x, h \in \mathbf{F}_2^n$. From this one can quickly establish by induction that for every $0 \le m \le n$, the function $f$ is a Reed-Muller code of degree at most $d-1$ on $\mathbf{F}_2^m$ (viewed as a subspace of $\mathbf{F}_2^n$), and the claim follows.
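The forward direction of this picture can be probed numerically in a small case. The sketch below is ours (integer/XOR labelling of $\mathbf{F}_2^3$, as in the earlier snippets): it checks that the quadratic Reed-Muller code $g(x) = (-1)^{x_0 x_1}$ has $\|g\|_{U^3} = 1$, in accordance with Proposition 4.4 and the modulation symmetry (15), while its $U^2$ norm equals $2^{-1/2} < 1$, so the same function is somewhat pseudorandom one order down.

```python
import itertools
import numpy as np

def u2(f):
    N = len(f)
    s = np.mean([f[x] * f[x ^ a] * f[x ^ b] * f[x ^ a ^ b]
                 for x in range(N) for a in range(N) for b in range(N)])
    return abs(s) ** 0.25

def u3(f):
    """||f||_{U^3} from the explicit eight-term formula over F_2^n."""
    N = len(f)
    total = 0.0
    for x, h1, h2, h3 in itertools.product(range(N), repeat=4):
        total += (f[x] * f[x ^ h1] * f[x ^ h2] * f[x ^ h3]
                  * f[x ^ h1 ^ h2] * f[x ^ h1 ^ h3] * f[x ^ h2 ^ h3]
                  * f[x ^ h1 ^ h2 ^ h3])
    return abs(total / N ** 4) ** (1 / 8)

N = 8                                   # F_2^3; x carries bits (x0, x1, x2)
g = np.array([(-1.0) ** ((x & 1) * ((x >> 1) & 1)) for x in range(N)])

assert abs(u3(g) - 1.0) < 1e-9          # an order-2 code is not pseudorandom of order 2
assert abs(u2(g) - 0.5 ** 0.5) < 1e-9   # but its U^2 norm is strictly below 1
```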

Handling the case when $\varepsilon$ is very close to 1 is trickier (we can no longer afford an induction on dimension, as was done in the above proof). We first need a rigidity result.

Proposition 4.5 (Rigidity of Reed-Muller codes). For every $d \ge 1$ there exists $\varepsilon > 0$ with the following property: if $n \ge 1$ and $f \in \mathcal{S}_{d-1}(\mathbf{F}_2^n)$ is a Reed-Muller code of order at most $d-1$ such that $\mathbf{E}_{x \in \mathbf{F}_2^n} f(x) \ge 1 - \varepsilon$, then $f \equiv 1$.

Proof. We again induct on $d$. The case $d = 1$ is obvious, so suppose $d \ge 2$ and that the claim has already been proven for $d-1$. If $\mathbf{E}_{x \in \mathbf{F}_2^n} f(x) \ge 1 - \varepsilon$, then $\mathbf{E}_{x \in \mathbf{F}_2^n} |1 - f(x)| \le \varepsilon$. Using the crude bound $|1 - f f_h| = O(|1 - f| + |1 - f_h|)$ we conclude that $\mathbf{E}_{x \in \mathbf{F}_2^n} |1 - f f_h(x)| = O(\varepsilon)$, and thus

$$\mathbf{E}_{x \in \mathbf{F}_2^n} f f_h(x) \ge 1 - O(\varepsilon)$$

for every $h \in \mathbf{F}_2^n$. But $f f_h$ is a Reed-Muller code of order $d-2$, thus by the induction hypothesis we have $f f_h \equiv 1$ for all $h$ if $\varepsilon$ is small enough. This forces $f$ to be constant; but since $f$ takes values in $\{-1, +1\}$ and has average at least $1 - \varepsilon$, we have $f \equiv 1$ as desired for $\varepsilon$ small enough.

Proposition 4.6 (99%-structured inverse theorem). [2] For every $d \ge 1$ and $0 < \sigma < 1$ there exists $0 < \varepsilon < 1$ with the following property: if $n \ge 1$ and $f : \mathbf{F}_2^n \to [-1, 1]$ is such that $\|f\|_{U^d(\mathbf{F}_2^n)} \ge 1 - \varepsilon$, then there exists a Reed-Muller code $g \in \mathcal{S}_{d-1}(\mathbf{F}_2^n)$ such that $\langle f, g \rangle_{L^2(\mathbf{F}_2^n)} \ge 1 - \sigma$.

Proof. We again induct on $d$. The case $d = 1$ is obvious, so suppose $d \ge 2$ and that the claim has already been proven for $d-1$. Fix $\sigma$, let $\varepsilon$ be a small number (depending on $d$ and $\sigma$) to be chosen later, and suppose $f : \mathbf{F}_2^n \to [-1, 1]$ is such that $\|f\|_{U^d(\mathbf{F}_2^n)} \ge 1 - \varepsilon$. We will use $o(1)$ to denote any

quantity which goes to zero as $\varepsilon \to 0$; thus $\|f\|_{U^d(\mathbf{F}_2^n)} \ge 1 - o(1)$. We shall say that a statement is true for most $x \in \mathbf{F}_2^n$ if it is true for a proportion $1 - o(1)$ of values $x \in \mathbf{F}_2^n$.

Applying (14) we have

$$\mathbf{E}_{h \in \mathbf{F}_2^n} \|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)}^{2^{d-1}} \ge 1 - o(1),$$

while from (11) we have $\|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)} \le 1$. Thus we have $\|f f_h\|_{U^{d-1}(\mathbf{F}_2^n)} = 1 - o(1)$ for all $h$ in a subset $H$ of $\mathbf{F}_2^n$ of density $1 - o(1)$. Applying the inductive hypothesis, we conclude that for all $h \in H$ there exists a polynomial $P_h : \mathbf{F}_2^n \to \mathbf{F}_2$ of degree at most $d-2$ such that

$$\mathbf{E}_{x \in \mathbf{F}_2^n} f(x) f(x + h) (-1)^{P_h(x)} \ge 1 - o(1).$$

Since $f$ is bounded in magnitude by 1, this implies for each $h \in H$ that

$$f(x + h) = f(x) (-1)^{P_h(x)} + o(1) \quad (16)$$

for most $x$. For similar reasons it also implies that $|f(x)| = 1 + o(1)$ for most $x$.

Now suppose that $h_1, h_2, h_3, h_4 \in H$ form an additive quadruple in the sense that $h_1 + h_2 = h_3 + h_4$. Then from (16) we see that

$$f(x + h_1 + h_2) = f(x) (-1)^{P_{h_1}(x) + P_{h_2}(x + h_1)} + o(1) \quad (17)$$

for most $x$, and similarly

$$f(x + h_3 + h_4) = f(x) (-1)^{P_{h_3}(x) + P_{h_4}(x + h_3)} + o(1)$$

for most $x$. Since $|f(x)| = 1 + o(1)$ for most $x$, we conclude that

$$(-1)^{P_{h_1}(x) + P_{h_2}(x + h_1) - P_{h_3}(x) - P_{h_4}(x + h_3)} = 1 + o(1)$$

for most $x$. In particular, the average of the left-hand side in $x$ is $1 - o(1)$. Applying Proposition 4.5 (and assuming $\varepsilon$ small enough), we conclude that the left-hand side is identically 1, thus

$$P_{h_1}(x) + P_{h_2}(x + h_1) = P_{h_3}(x) + P_{h_4}(x + h_3) \quad (18)$$

for all additive quadruples $h_1 + h_2 = h_3 + h_4$ in $H$ and all $x$.

Now for any $k \in \mathbf{F}_2^n$, define the quantity $Q(k) \in \mathbf{F}_2$ by the formula

$$Q(k) := P_{h_1}(0) + P_{h_2}(h_1) \quad (19)$$

whenever $h_1, h_2 \in H$ are such that $h_1 + h_2 = k$. Note that the existence of such an $h_1, h_2$ is guaranteed since most $h$ lie in $H$, and (18) ensures that the right-hand side of (19) does not depend on the exact choice of $h_1, h_2$, and so $Q$ is well-defined.


Now let $x \in \mathbf{F}_2^n$ and $h \in H$. Then, since most elements of $\mathbf{F}_2^n$ lie in $H$, we can find $r_1, r_2, s_1, s_2 \in H$ such that $r_1 + r_2 = x$ and $s_1 + s_2 = x + h$. From (17) we see that

$$f(y + x) = f(y + r_1 + r_2) = f(y) (-1)^{P_{r_1}(y) + P_{r_2}(y + r_1)} + o(1)$$

and

$$f(y + x + h) = f(y + s_1 + s_2) = f(y) (-1)^{P_{s_1}(y) + P_{s_2}(y + s_1)} + o(1)$$

for most $y$. Also, from (16),

$$f(y + x + h) = f(y + x) (-1)^{P_h(y + x)} + o(1)$$

for most $y$. Combining these (and the fact that $|f(y)| = 1 + o(1)$ for most $y$) we see that

$$(-1)^{P_{s_1}(y) + P_{s_2}(y + s_1) - P_{r_1}(y) - P_{r_2}(y + r_1) - P_h(y + x)} = 1 + o(1)$$

for most $y$. Taking expectations and applying Proposition 4.5 as before, we conclude that

$$P_{s_1}(y) + P_{s_2}(y + s_1) - P_{r_1}(y) - P_{r_2}(y + r_1) - P_h(y + x) = 0$$

for all $y$. Specialising to $y = 0$ and applying (19) we conclude that

$$P_h(x) = Q(x + h) - Q(x) = Q_h(x) - Q(x) \quad (20)$$

for all $x \in \mathbf{F}_2^n$ and $h \in H$; thus we have successfully "integrated" $P_h(x)$. We can then extend $P_h(x)$ to all $h \in \mathbf{F}_2^n$ (not just $h \in H$) by viewing (20) as a definition. Observe that if $h \in \mathbf{F}_2^n$, then $h = h_1 + h_2$ for some $h_1, h_2 \in H$, and from (20) we have

$$P_h(x) = P_{h_1}(x) + P_{h_2}(x + h_1).$$

In particular, since the right-hand side is a polynomial of degree at most $d-2$, the left-hand side is also. Thus we see that $Q_h - Q$ is a polynomial of degree at most $d-2$ for all $h$, which easily implies that $Q$ itself is a polynomial of degree at most $d-1$. If we then set $g(x) := f(x) (-1)^{Q(x)}$, then from (16), (20) we see that for every $h \in H$ we have

$$g(x + h) = g(x) + o(1)$$

for most $x$. From Fubini's theorem, we thus conclude that there exists an $x$ such that $g(x + h) = g(x) + o(1)$ for most $h$; thus $g$ is "almost constant". Since $|g(x)| = 1 + o(1)$ for most $x$, we conclude the existence of a sign $\theta \in \{-1, +1\}$ such that $g(x) = \theta + o(1)$ for most $x$. We conclude that

$$f(x) = \theta (-1)^{Q(x)} + o(1)$$

for most $x$, and the claim then follows (assuming $\varepsilon$ small enough).

Remark 4.7. The above argument requires $\|f\|_{U^d(\mathbf{F}_2^n)}$ to be very close to 1 for two reasons. Firstly, one wishes to exploit the rigidity property; and secondly, we implicitly used on many occasions the fact that if two properties each hold $1 - o(1)$ of the time, then they jointly hold $1 - o(1)$ of the time as well. These two facts break down once we leave the "99%-structured" world and instead work in a "1%-structured" world, in which various statements are only true for a proportion at least $\delta$ for some small $\delta$. Nevertheless, the proof of the Gowers inverse conjecture for $d = 3$ in [23] has some features in common with the above argument, giving one hope that the full conjecture could be settled by some extension of these methods.

Remark 4.8. The above result was essentially proven in [2] (extending an argument in [4] for the linear case $d = 2$), using a "majority vote" version of the dual function (13).

    5. Concluding remarks

Despite the above results, we still do not have a systematic theory of structure and randomness which covers all possible applications (particularly for sparse objects). For instance, there seem to be analogous structure theorems for random variables, in which one uses Shannon entropy instead of $L^2$-based energies in order to measure complexity; see [25]. In analogy with the ergodic theory literature (e.g. [7]), there may also be some advantage in pursuing relative structure theorems, in which the notions of structure and randomness are all relative to some existing known structure, such as a reference factor $Y_0$ of a probability space $(X, \mathcal{X}, \mu)$. Finally, in the iterative algorithms used above to prove the structure theorems, the additional structures used at each stage of the iteration were drawn from a fixed stock of structures ($S$ in the Hilbert space case, $\mathcal{S}$ in the measure space case). In some applications it may be more effective to adopt a more adaptive approach, in which the stock of structures one is using varies after each iteration. A simple example of this approach is in [32], in which the structures used at each stage of the iteration are adapted to a certain spatial scale which decreases rapidly with the iteration. I expect to see several more permutations and refinements of these sorts of structure theorems developed for future applications.

    6. Acknowledgements

The author is supported by a grant from the MacArthur Foundation, and by NSF grant CCF-0649473. The author is also indebted to Ben Green for helpful comments and references.


[33] T. Tao and V. Vu, Additive Combinatorics, Cambridge Univ. Press, 2006.

[34] T. Tao, T. Ziegler, The primes contain arbitrarily long polynomial progressions, preprint.