
BRANCHING PROGRAM LOWER BOUNDS

by

Venkatesh Medabalimi

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy

Department of Computer Science
University of Toronto

© Copyright 2018 by Venkatesh Medabalimi


Abstract

Branching Program Lower Bounds

Venkatesh Medabalimi
Doctor of Philosophy

Department of Computer Science
University of Toronto

2018

A longstanding open problem in complexity theory is whether the class Polytime (P) is the same as LogSpace (L) or nondeterministic LogSpace (NL). In this thesis, we explore this problem by studying time/space tradeoffs for problems in P. That is, for some natural problem in P, does adding a space restriction prevent a polynomial-time solution?

To begin, we prove exponential lower bounds on the size of a restricted model of branching programs called semantic read-once 3-ary nondeterministic branching programs solving a polytime computable function, in particular the polynomial evaluation problem. In the second part, we prove lower bounds against branching programs solving Function Composition. In particular, we show that the amount of space required for computing the composed function grows as in the straightforward algorithm (where space grows additively). If shown true for general branching programs, this would separate L and P. We show this is indeed true in the restricted setting of nondeterministic read-once branching programs, for the syntactic model as well as the stronger semantic model. We prove that such branching programs solving the tree evaluation problem over an alphabet of size k require size roughly $k^{\Omega(h)}$, i.e., space $\Omega(h \log k)$, where h is the number of compositions.

Then we focus entirely on general branching programs, sticking to the theme of lower bounds against function composition. We give a better lower bound than is possible by using Nechiporuk's method for k-way branching programs solving a specific composition problem. Using essentially the same method we give a matching lower bound to that achievable by using Nechiporuk's method for binary branching programs. Any marginal improvement here would be consequential towards beating Nechiporuk's method for binary branching programs, a longstanding open problem. We then proceed to give some surprising upper bounds based on communication complexity protocols that are different from naive upper bounds. Our aim here is to improve our understanding of a possible approach to prove the suspected lower bound just mentioned, but the connections to communication complexity therein might themselves be of independent interest.


Contents

1 Introduction
   1.1 Motivation
   1.2 Branching Programs
       1.2.1 Branching Programs and Other Computational models
       1.2.2 Lower bounds for Unrestricted Branching Programs, Nechiporuk's Method
   1.3 Restricted Branching Programs
       1.3.1 Width Restricted Branching Programs
       1.3.2 Bounded Read
       1.3.3 Time Space tradeoffs
   1.4 Function Composition and The Tree Evaluation Problem
       1.4.1 The KRW conjecture: Understanding composition as a way of separating complexity classes
       1.4.2 The Tree Evaluation Problem
       1.4.3 k-way Branching Programs Solving Tree Evaluation Problem
       1.4.4 NBP Upper-bounds via pebbling schemes
   1.5 Outline

2 Lower Bound for Ternary Functions
   2.1 Introduction
   2.2 Definitions
       2.2.1 Nondeterministic Semantic Read-Once Branching Programs
       2.2.2 Polynomial Evaluation Problem
       2.2.3 Rectangles and Embedded Rectangles
   2.3 Lower Bound for |D| = 3
   2.4 Conclusion
   2.5 Semantic Branching Programs with |D| = 2 can evade Large Bottleneck Rectangles
       2.5.1 Candidate Problems for a Boolean function lower bound

3 Hardness of Function Composition for Semantic Read once Branching Programs
   3.1 Introduction
       3.1.1 History and Related Work
   3.2 Definitions and Statement of Results
       3.2.1 Black/White pebbling, A natural upper bound
   3.3 Proof Overview
   3.4 Most $\vec{F}$ have a lot of accepting instances
   3.5 Finding an Embedded Rectangle
       3.5.1 Finding a rectangle over the leaves
       3.5.2 Refining the Rectangle
   3.6 The Encoding
   3.7 Conclusion
   3.8 Proofs
       3.8.1 Nechiporuk via Function Composition
       3.8.2 The lower bound holds for most $\vec{F}$

4 General Branching Programs
   4.1 Nechiporuk's method and its limitations
   4.2 Our Results
   4.3 Lower bounds via communication complexity improving on Nechiporuk for k-way BPs
       4.3.1 Lower Bound on the Number of Leaf Reading States
       4.3.2 A Communication Game
   4.4 Technique doesn't give bounds that grow with h
   4.5 Lower Bound for Binary Branching Programs
       4.5.1 A Conjecture
       4.5.2 Composition at different parameters
   4.6 Surprising Upper bounds
       4.6.1 Upper Bounds
       4.6.2 Generic Upper bounds
   4.7 Acknowledgements

Bibliography


Chapter 1

Introduction

1.1 Motivation

One of the central interests of complexity theory is to understand what can or cannot be computed given a limited amount of computational resources. The most widely studied resources in this context are computation time and computation space, or memory. The class P is the class of polynomial-time computable functions and is commonly regarded as synonymous with being efficiently computable. The space complexity class L, on the other hand, contains problems solvable in logarithmic space. A major question in complexity theory is whether polynomial time is the same as log space. Consider the sequence of complexity classes:

$$AC^0(6) \subseteq NC^1 \subseteq L \subseteq NL \subseteq LogCFL \subseteq AC^1 \subseteq NC^2 \subseteq P \subseteq NP \subseteq PH$$

As of today, we do not know if even one among the above sequence of containments is strict. In fact, it is open whether $AC^0(6) = PH$! The problem of whether L is strictly contained in P is nestled in the above chain. In this work we shall be interested in exploring this containment (and consequently those in between). The common belief is that $L \subsetneq P$.

Typically, algorithm design focuses on the goal of minimizing one of the resources, time or space. It is very natural to study the relationship between the goals of minimizing the amount of space and time used. It is well known that these goals go hand in hand to some extent: if we have an upper bound of S(n) on the amount of space used by an algorithm on an input of size n, then that algorithm has at most $2^{S(n)}$ distinct memory configurations and therefore runs in time at most $2^{S(n)}$. This observation shows that a very space-efficient algorithm is at least somewhat time-efficient. Nevertheless, we often observe that allowing algorithms to use more memory allows for a decrease in the amount of time required to solve the problem, and we hold the belief that there are polytime computable functions which require a super-logarithmic amount of space.

It helps to introduce our favoured model of computation that we shall use throughout this work: branching programs. Branching programs are a non-uniform model of computation that give us a clean way of talking about both time and space at the same time. They were first introduced by Lee [46] as an alternative to circuits and later studied by Masek [50] under the name of 'decision graphs'.


1.2 Branching Programs

Definition 1 (Deterministic Branching Program). A k-way deterministic branching program (BP) is a directed acyclic graph G with a source node and k sinks. The sinks are labelled by elements of [k] and have out-degree 0. We also refer to nodes as states. Each non-sink state is labelled by some variable $x_i$ and has out-degree k. Its outgoing edges are labelled by tests $x_i = b$ for some $i \in \{1, 2, \ldots, n\}$ and $b \in [k]$, with one edge for each b. We say such a program computes a function $f_n : [k]^n \to [k]$ in the following way. Given an input $\vec{\xi} \in [k]^n$ we start at the source node and traverse the unique path in the graph which is consistent with the input $\vec{\xi}$. This way we reach a sink whose label is taken to be the output of the function $f_n$.

The size of a deterministic branching program is the number of nodes in it. When k = 2 we obtain deterministic binary branching programs.

Definition 2 (Non-deterministic Branching Program). A k-way non-deterministic branching program (NBP) is a directed acyclic graph G with a source node $q_{start}$ and a sink node (the accept node) $q_{accept}$. We refer to nodes as states. Each non-sink state is labeled with some input variable $x_i$, and each edge directed out of a state is labeled with some value $b \in [k]$ for $x_i$. We say such a program computes a function $f_n : [k]^n \to \{0,1\}$ in the following way. For each $\vec{\xi} \in [k]^n$, the branching program accepts $\vec{\xi}$ if and only if there exists at least one directed path starting at $q_{start}$ and leading to the accepting state $q_{accept}$ such that all labels along this path are consistent with $\vec{\xi}$.

The size of a non-deterministic branching program is the number of nodes in the graph. When k = 2 we obtain non-deterministic binary branching programs. Unless we state otherwise, by a branching program we simply mean a deterministic binary branching program.
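As a concrete illustration of these two definitions, the following Python sketch evaluates a k-way deterministic branching program and checks acceptance for a nondeterministic one. The graph encoding (dictionaries keyed by state names) and the example program are our own illustrative choices, not notation fixed by the text.

```python
# A minimal sketch of Definitions 1 and 2. States and the graph encoding
# (dicts keyed by state name) are illustrative choices, not fixed by the text.

def eval_deterministic_bp(states, sinks, source, x):
    """states: state -> (variable index i, {b: next state for each b in [k]}).
    sinks:  state -> output label in [k].  Follows the unique consistent path."""
    state = source
    while state not in sinks:
        i, out_edges = states[state]
        state = out_edges[x[i]]          # take the edge labelled x_i = b
    return sinks[state]

def accepts_nondeterministic_bp(states, q_start, q_accept, x):
    """states: state -> list of (variable index i, value b, next state).
    Accepts iff some path of edges consistent with x reaches q_accept."""
    stack, seen = [q_start], set()
    while stack:
        state = stack.pop()
        if state == q_accept:
            return True
        if state in seen:
            continue
        seen.add(state)
        for i, b, nxt in states.get(state, []):
            if x[i] == b:                # edge label consistent with the input
                stack.append(nxt)
    return False

# Example: a 2-way (binary) BP computing x0 AND x1.
and_states = {"s": (0, {0: "zero", 1: "t"}), "t": (1, {0: "zero", 1: "one"})}
and_sinks = {"zero": 0, "one": 1}
assert eval_deterministic_bp(and_states, and_sinks, "s", [1, 1]) == 1
assert eval_deterministic_bp(and_states, and_sinks, "s", [1, 0]) == 0
```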

Branching programs have been used as a model to understand space and time complexity lower bounds and also to obtain space-time tradeoff lower bounds for computing specific functions. For a boolean function $f_n$, let $BP(f_n)$ and $NBP(f_n)$ denote the minimal size of a deterministic and a non-deterministic branching program computing $f_n$, respectively.

The length of a branching program is the number of edges in a longest path. It is clear that in deterministic branching programs the length of the BP can be seen as a measure of computation time. In the case of non-deterministic branching programs, the maximum over inputs of the minimum length of a computation path accepting that input,

$$\max_{\vec{\xi} \in f^{-1}(1)} \;\; \min_{\pi \in \mathrm{AcceptingPaths}(\vec{\xi})} \;\; \mathrm{length}(\pi),$$

can be taken as a measure of time. What does the size of a branching program tell us? Branching program size $BP(f_n)$ and the space complexity $S(f_n)$ of a non-uniform Turing machine computing $f_n$ are tightly related. The following theorem summarizes this important motivation for studying branching programs: to analyze the space complexity of computing a function.

Theorem 3 ([17], [54]). For a boolean function $f_n : \{0,1\}^n \to \{0,1\}$, $S(f_n) = O(\log(\max\{BP(f_n), n\}))$ and $BP(f_n) = 2^{O(\max\{S(f_n), \log n\})}$.

Proof. Let $G_n$ be a BP for $f_n$ of size $BP(f_n)$. We can simulate $G_n$ by a non-uniform Turing machine with an encoding of $G_n$ on its read-only oracle tape. We index the nodes of $G_n$ using $O(\log |G_n|)$ bits, the succeeding nodes using $O(\log |G_n|)$ bits, and the label using $O(\log n)$ bits. At any time this information is sufficient to keep track of the location in the computation path and proceed to a new node. The encoding of $G_n$ has length $O(BP(f_n) \log BP(f_n))$, and so a pointer into the read-only oracle tape costs $O(\log BP(f_n))$ bits. A pointer to the input variable corresponding to the label of the node costs $O(\log n)$ bits. So the space complexity $S(f_n)$ is $O(\log(\max\{BP(f_n), n\}))$. The other direction is straightforward.

1.2.1 Branching Programs and Other Computational models

To place the branching program model and its complexity measures in perspective, let us compare it with other computational models like circuits and formulas defined over the basis $\{\vee, \wedge, \neg\}$.

Definition 4 (Circuit). Let $x_1, x_2, \ldots, x_n$ be a set of variables. A circuit is a directed acyclic graph with two types of nodes: 1) nodes with in-degree 0, called the input nodes, each labelled by either a variable $x_i$ or a negated variable $\bar{x}_i$, and 2) nodes with in-degree 2, called gates, each labelled by a Boolean binary operation, either $\wedge$ (AND) or $\vee$ (OR). There is a single node with out-degree 0 called the output node.

The depth of a circuit C, denoted d(C), is the length of the longest path from the output node to an input node. The size of a circuit is the number of nodes in it.

A circuit in which each node except for the output node has out-degree 1 is called a formula. The size of a formula F, denoted L(F), is the number of input nodes.

For a boolean function f, the depth complexity d(f) is the minimum depth of a circuit computing f. The circuit complexity C(f) is the minimum size of a circuit computing f. The formula complexity L(f) is the minimum size of a formula computing f.

For a boolean function f, the circuit complexity C(f), the formula complexity L(f) and the branching program complexity BP(f) obey the following relations [61]:

• $BP(f) \geq \frac{1}{3}\,C(f)$

• $BP(f) \leq L(f)$

As a consequence, any lower bound on branching program complexity immediately yields a formula complexity lower bound, which in itself has been an interesting pursuit for researchers. It is easy to show that for almost all boolean functions f, $BP(f) \geq \frac{1}{3}\frac{2^n}{n}$, and that $BP(f) = O(\frac{2^n}{n})$ for all boolean functions. While we can prove the lower bound for a random function easily, an interesting question is what we can say for functions which are special in some sense, say polytime computable. Our main endeavor in this work shall be to further our understanding of possible ways to show that there are in fact polytime computable functions which cannot be computed by polysized branching programs. For the reader more familiar with these other computational models, this task is arguably challenging given that, for example, the best known formula complexity lower bound [24, 43, 57] for a polytime computable function is only cubic in the size of the input.

1.2.2 Lower bounds for Unrestricted Branching Programs, Nechiporuk's Method

Nechiporuk's method [51] gives a lower bound for the BP and NBP size of a general function f. Fix a partition of the variable set X into m disjoint sets $Y_1, Y_2, \ldots, Y_m$. For each $Y_i$ let $c_i(f)$ denote the number of possible sub-functions on $Y_i$ obtained by fixing the variables outside $Y_i$ to all possible values. One can use this information to get a lower bound on BP(f) and NBP(f).

Theorem 5 (Nechiporuk's Method). There exists a constant $\varepsilon > 0$ such that for every boolean function f that depends on all its inputs and for every partition of its variable set X into m sets,

$$BP(f) \geq \varepsilon \sum_{i=1}^{m} \frac{\log c_i(f)}{\log\log c_i(f)}, \qquad NBP(f) \geq \varepsilon \sum_{i=1}^{m} \sqrt{\log c_i(f)}.$$

Proof. For a subset of variables $Y_i$, consider the branching program obtained by fixing the remaining variables in $X \setminus Y_i$: if a node v is labelled with $y \in X \setminus Y_i$ in the original program and y is set to 1, connect all the edges incoming to v to the destination of v's outgoing 1-edge, and likewise if y is set to 0. Let the number of nodes left, including the sink nodes, be $h_i$. The number of branching programs possible on these $h_i$ nodes is at most $h_i \cdot |Y_i|^{h_i} \cdot h_i^{2h_i}$, since there are $h_i$ choices for the start node, each of the nodes can be labelled in $|Y_i|$ ways, and the two outgoing edges from each of them have $h_i$ possible destinations. (In this estimate we do not bother about whether cycles appear.) This number should be at least $c_i(f)$, the total number of sub-functions on the variables in $Y_i$ over all possible fixed values given to the remaining variables. If the function depends on all its variables we know $h_i \geq |Y_i|$, so

$$h_i\, |Y_i|^{h_i}\, h_i^{2h_i} \geq c_i(f) \;\implies\; h_i^{4h_i} \geq c_i(f) \;\implies\; \exists\, \varepsilon > 0 \text{ s.t. } h_i \geq \varepsilon \frac{\log c_i(f)}{\log\log c_i(f)}.$$

And so we have

$$BP(f) \geq \sum_{i=1}^{m} h_i \geq \varepsilon \sum_{i=1}^{m} \frac{\log c_i(f)}{\log\log c_i(f)}.$$

Similarly, for a non-deterministic branching program computing f, consider the program obtained by fixing the nodes labelled by variables in $X \setminus Y_i$. Let the number of states left be $V_i$. The number of possible non-deterministic branching programs on $V_i$ nodes (which are prelabelled) is then at most the number of ways these nodes can end up being connected by edges labelled 1 or 0. This can happen in at most $2^{2V_i^2}$ ways. Consequently,

$$2^{2V_i^2} \geq c_i(f) \;\implies\; V_i \geq \frac{1}{\sqrt{2}} \sqrt{\log c_i(f)}.$$

The size of the branching program is $\sum_{i=1}^{m} V_i$ and so the bound on NBP(f) follows.

Nechiporuk's method remains the strongest known lower bound technique for dealing with general branching programs computing some function in NP. The element distinctness function is a boolean function on $n = 2m\log m$ variables divided into m consecutive blocks with $2\log m$ variables in each of them. Each of these blocks encodes a number in $[m^2]$. The function evaluates to 1 if and only if all these numbers are distinct. Nechiporuk's method [51] can be used to show that element distinctness requires deterministic branching programs of size $\Omega(\frac{n^2}{\log^2 n})$.
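For concreteness, the short Python sketch below (our own illustration; the block decoding is the natural one just described and the helper name is ours) computes the element distinctness function from its bit representation.

```python
# A minimal sketch of the element distinctness function, assuming the natural
# encoding above: n = 2m*log m bits split into m blocks of 2*log m bits, each
# block read as a number in [m^2].

def element_distinctness(bits, m, block_len):
    """bits: list of 0/1 of length m * block_len. Returns 1 iff the m encoded
    numbers are pairwise distinct."""
    numbers = []
    for j in range(m):
        block = bits[j * block_len:(j + 1) * block_len]
        value = int("".join(map(str, block)), 2)   # read the block in binary
        numbers.append(value)
    return 1 if len(set(numbers)) == m else 0

# Example with m = 4 blocks of 2*log2(4) = 4 bits each.
x = [0,0,0,1, 0,0,1,0, 0,0,1,1, 0,1,0,0]           # encodes 1, 2, 3, 4
assert element_distinctness(x, m=4, block_len=4) == 1
```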

But the method has inherent limitations on how large a lower bound it can yield (more on this in Chapter 4). In the next section we shift our attention to certain ways of restricting branching programs that have been considered, primarily so as to make them amenable to analysis, and then see whether imposing such restrictions can yield more interesting lower bounds.

1.3 Restricted Branching Programs

Until we mention otherwise we shall be talking about deterministic branching programs for the time being. A path in a branching program is inconsistent if it contains two contradicting queries $x_i = 0$ and $x_i = 1$. If the branching program can be arranged in levels with edges from one level going to the next, the width of the program is the number of nodes in a largest level. We can make a branching program levelled by adding new nodes while keeping its length the same and at most squaring the size of the BP in the process. A levelled branching program is oblivious if at each level all the nodes query the same variable.

1.3.1 Width Restricted Branching Programs

One of the early aspects that researchers of branching programs were curious about was how the restriction of constant width affects the power of branching programs. First note that any boolean function can be computed by a width-3 branching program. So the minimum size (or length) required of a width-w branching program to compute a boolean function $f_n$ is a well-defined notion for $w \geq 3$, but no large lower bound on the size of width-w BPs for polytime computable functions is known. If one thinks of computation as a sequential process, one might imagine that when $f_n$ depends on a lot of its inputs, restricting the width might force the size to be large. Perhaps in this spirit it was conjectured that Majority cannot be computed by BPs of constant width and polynomial size. Refuting this, it was shown by Barrington [7] that polynomial-size width-5 branching programs are in fact equivalent to $NC^1$. The branching programs constructed by Barrington are highly non-sequential and it helps to think of them as performing the job of separating the inputs that map to 1 and 0 rather than computing them in the conventional sense. We give a very short summary of his cornerstone result.

A permutation branching program is an oblivious width-w branching program where between any two levels the 0-edges and the 1-edges each form a permutation: the w 0-edges map the w nodes in one level bijectively to the w nodes in the next level, and similarly the w 1-edges.

We say that a permutation branching program P $\sigma$-computes f if for every input x

$$P(x) = \begin{cases} \sigma, & \text{if } f(x) = 1,\\ e, & \text{if } f(x) = 0,\end{cases}$$

where e is the identity permutation. We shall use only cyclic permutations $\sigma$, and the following fact will prove very useful.

Fact 1.3.1. There exist cyclic permutations such that their commutator is a cyclic permutation as well. Consider $\gamma = (1\,2\,3\,4\,5)$ and $\delta = (1\,3\,5\,4\,2)$. Their commutator is $\gamma\delta\gamma^{-1}\delta^{-1} = (1\,2\,3\,4\,5)(1\,3\,5\,4\,2)(5\,4\,3\,2\,1)(2\,4\,5\,3\,1) = (1\,3\,2\,5\,4)$ (composing left to right).
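The commutator computation in Fact 1.3.1 can be checked mechanically; the short Python sketch below (our own illustration, using the left-to-right composition convention of the displayed product) verifies it.

```python
# Verify Fact 1.3.1: gamma = (1 2 3 4 5), delta = (1 3 5 4 2), and their
# commutator gamma*delta*gamma^-1*delta^-1 equals (1 3 2 5 4).
# Permutations on {1,...,5} are dicts; composition is applied left to right.

def from_cycle(*cyc):
    perm = {i: i for i in range(1, 6)}
    for a, b in zip(cyc, cyc[1:] + (cyc[0],)):
        perm[a] = b
    return perm

def compose(p, q):                     # apply p first, then q
    return {i: q[p[i]] for i in p}

def inverse(p):
    return {v: k for k, v in p.items()}

gamma = from_cycle(1, 2, 3, 4, 5)
delta = from_cycle(1, 3, 5, 4, 2)
comm = compose(compose(compose(gamma, delta), inverse(gamma)), inverse(delta))
assert comm == from_cycle(1, 3, 2, 5, 4)   # the commutator is again a 5-cycle
```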


Let $\sigma, \tau, \delta$ be cyclic permutations. We have the following three lemmas.

Lemma 6 (Changing Output). If P $\sigma$-computes f then there is a permutation branching program of the same size that $\tau$-computes f.

Proof. Since $\sigma, \tau$ are cyclic permutations there exists a permutation $\theta$ such that $\sigma = \theta\tau\theta^{-1}$. So we can simply left-compose the permutations computed in the first level (both the 0-edge and 1-edge permutations) with $\theta$ and right-compose the permutations computed in the last level with $\theta^{-1}$. The resulting program $\tau$-computes f.

Lemma 7 (Negating Output). If P $\sigma$-computes f then there is a permutation branching program of the same size that $\sigma$-computes $\neg f$.

Proof. By right-composing the last level of P with $\sigma^{-1}$ one obtains a permutation BP that $\sigma^{-1}$-computes $\neg f$. The result then follows by the previous lemma.

Lemma 8 (Computing AND). If P $\sigma$-computes f and Q $\tau$-computes g, then there is a permutation branching program $\sigma\tau\sigma^{-1}\tau^{-1}$-computing $f \wedge g$ of size $2(\mathrm{size}(P) + \mathrm{size}(Q))$.

Proof. By Lemma 6 there exist permutation BPs R and S that $\sigma^{-1}$-compute f and $\tau^{-1}$-compute g respectively. Now compose P, Q, R and S in that order to get the required permutation BP T that $\sigma\tau\sigma^{-1}\tau^{-1}$-computes $f \wedge g$. Note that when f(x) = 0, P and R compute e, and when g(x) = 0, Q and S compute e, so in either case T computes e.

From Fact 1.3.1, for w = 5 there exist cyclic permutations $\gamma$ and $\delta$ such that their commutator $\gamma\delta\gamma^{-1}\delta^{-1}$ is cyclic as well.

Theorem 9 (Barrington's Theorem [7]). Suppose that a boolean function f can be computed by a De Morgan circuit of depth d. Then f is also computable by a width-5 permutation branching program of length at most $4^d$.

Proof. The proof follows by induction on the depth of the De Morgan circuit. Since negation does not affect the size of the BP required (Lemma 7), assume $f = g \wedge h$, where g, h are computed by De Morgan circuits of depth at most $d-1$. By the induction hypothesis and Lemma 6 they have permutation BPs that $\gamma$-compute g and $\delta$-compute h respectively, each of size at most $4^{d-1}$, where $\gamma$ and $\delta$ are the cyclic permutations from Fact 1.3.1 with a cyclic commutator. By Lemma 8 followed by an application of Lemma 6, one can construct a BP T of size at most $2(4^{d-1} + 4^{d-1}) = 4^d$ that $\sigma$-computes f for any cyclic 5-permutation $\sigma$.
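To make the recursive construction concrete, here is a small Python sketch (our own illustration, not from the thesis) that builds a width-5 permutation branching program for a formula over AND and NOT following Lemmas 6-8, and checks it on a small formula. Permutations act on {0,...,4}, programs are lists of instructions (variable, permutation if the bit is 0, permutation if the bit is 1), and products are taken left to right; all names and encodings are ours.

```python
# A sketch of Barrington's construction (Lemmas 6-8) for formulas over AND/NOT.
# A program "sigma-computes" f if its product is sigma when f(x)=1 and the
# identity when f(x)=0. Permutations are tuples p with p[i] the image of i.

from itertools import permutations

ID = (0, 1, 2, 3, 4)

def mul(p, q):                                   # apply p first, then q
    return tuple(q[p[i]] for i in range(5))

def inv(p):
    out = [0] * 5
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)

def cyc(*c):                                     # permutation from one cycle
    p = list(range(5))
    for a, b in zip(c, c[1:] + (c[0],)):
        p[a] = b
    return tuple(p)

GAMMA, DELTA = cyc(0, 1, 2, 3, 4), cyc(0, 2, 4, 3, 1)   # Fact 1.3.1, 0-based

def retarget(prog, sigma, tau):                  # Lemma 6: sigma -> tau
    theta = next(t for t in permutations(range(5))
                 if mul(mul(t, sigma), inv(t)) == tau)
    prog = [list(ins) for ins in prog]
    prog[0][1:] = [mul(theta, p) for p in prog[0][1:]]
    prog[-1][1:] = [mul(p, inv(theta)) for p in prog[-1][1:]]
    return [tuple(ins) for ins in prog]

def build(formula, sigma):                       # program sigma-computing formula
    op = formula[0]
    if op == "var":
        return [(formula[1], ID, sigma)]
    if op == "not":                              # Lemma 7
        prog = build(formula[1], sigma)
        last = (prog[-1][0],) + tuple(mul(p, inv(sigma)) for p in prog[-1][1:])
        return retarget(prog[:-1] + [last], inv(sigma), sigma)
    if op == "and":                              # Lemma 8
        comm = mul(mul(mul(GAMMA, DELTA), inv(GAMMA)), inv(DELTA))
        prog = (build(formula[1], GAMMA) + build(formula[2], DELTA)
                + build(formula[1], inv(GAMMA)) + build(formula[2], inv(DELTA)))
        return retarget(prog, comm, sigma)
    raise ValueError(op)

def run(prog, x):
    acc = ID
    for var, p0, p1 in prog:
        acc = mul(acc, p1 if x[var] else p0)
    return acc

f = ("and", ("var", 0), ("not", ("var", 1)))     # f(x) = x0 AND (NOT x1)
prog = build(f, GAMMA)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert (run(prog, x) == GAMMA) == (x[0] == 1 and x[1] == 0)
```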

1.3.2 Bounded Read

One of the conceivable ways to restrict the power of a branching program is to place a restriction on how many times any part of the input can be read. Bounded replication restricts the number of variables that are read more than once along a computation path. The replication number is the minimal number R such that the number of variables that are read more than once along any computation path is at most R. When R = 0 the resulting branching programs are called read-once programs. Read-once BPs can be seen as a generalization of decision trees. While in a decision tree we count the size to be the number of subtrees in it, in a read-once program we count the number of non-isomorphic subtrees as the size. Intuitively one expects that the number of non-isomorphic subtrees is large if f has many distinct sub-functions. This idea usually forms the basis of proofs showing an exponential lower bound on deterministic read-once BPs computing some well-chosen function. Exponential lower bounds for many polytime computable functions have long been known for read-once deterministic branching programs [2, 5, 53, 56, 62]. Gal [29] gave an exponential lower bound for deterministic read-once branching programs computing a function in $AC^0$. The function involves determining if a given input describing a set of points in a finite projective plane constitutes a blocking set of the projective plane. In the case of non-deterministic branching programs, one has a bit more liberty in how one can impose a read-once constraint (unlike in the case of deterministic BPs): the constraint can be imposed in a syntactic manner, along any source-to-sink path, or in a semantic manner, along only the consistent paths in the non-deterministic branching program.

Definition 10 (Non-deterministic read-once or syntactic read-once BPs (1-NBP)). A non-deterministic BP is called syntactic read-once if along any path from the source to the sink any variable appears at most once.

Functions f demonstrating a separation of P and NP ∩ coNP under the read-once restriction are known, i.e., deterministic read-once branching programs solving such an f are proven to require exponential size while there exist syntactic read-once NBPs of polynomial size solving f and $\neg f$. Note that the read-once restriction imposed in the above sense prevents the program from containing any inconsistent path. One can allow for a more general non-deterministic read-once model where one requires that the read-once restriction be satisfied only along consistent paths. It turns out that relaxing the restriction on 1-NBPs and allowing inconsistent paths with multiple reads can make the model exponentially more powerful.

Definition 11 (Weakly read-once BPs or Semantic Read-Once Non-deterministic branching programs). A non-deterministic branching program is weakly read-once if along any consistent source-to-sink path no variable is read more than once.

Separation of weakly read once or Semantic read once NBPs from syntactic read once BPs (1-NBP)

The Exact Perfect Matching function ($EPM_n$) accepts a graph G iff it is a perfect matching. $EPM_n$ takes as input an $n \times n$ boolean matrix and outputs 1 iff it is a permutation matrix. A matrix is a permutation matrix iff each row and each column has exactly one 1. The following observation is due to Jukna and Razborov [40].

Theorem 12. Every 1-NBP computing $EPM_n$ must have size $2^{\Omega(n)}$.

Proof. Consider a state s in a 1-NBP solving $EPM_n$, and let I, J be two accepting computation paths passing through s. Note that $EPM_n$ is a function sensitive to every bit of its input, so both these accepting paths query all the matrix entries, and do so exactly once along I and J since we are in a 1-NBP. Write $I = (I_{in}, I_{out})$ and $J = (J_{in}, J_{out})$, where $I_{in}$ and $I_{out}$ denote the incoming and outgoing portions of the path I with respect to s. The combined path $P = (I_{in}, J_{out})$ must also be read-once, so $I_{in}$ and $J_{out}$ query disjoint sets of entries. Since P leads to the accept node and $EPM_n$ is a sensitive function, P must query all the entries, and hence the matrix entries queried in $I_{in}$ and $J_{in}$ are the same, and likewise for $I_{out}$ and $J_{out}$.

Any accepting matrix instance has exactly n 1s queried along a computation path. Say the number of 1s observed along I before reaching s is t and the number of 1s from s onwards is $n - t$. Since P is accepting, these numbers are the same for J as well. Consider any outgoing path OP consistent with $I_{in}$: since $I_{in}$ contains a matching of t edges that constitutes a permutation matrix on a subset of rows and columns, the remaining entries can only form a matching on the remaining $n - t$ rows and columns. There are at most $(n-t)!$ such outgoing paths OP consistent with $I_{in}$. Likewise, for any outgoing path $I_{out}$ from the state s there are at most $t!$ consistent incoming paths.

[Figure 1.1: An $O(n^3)$ semantic read-once branching program solving Exact Perfect Matching. Consistent paths, like the highlighted one, are all read-once.]

The number of perfect matchings that can be handled by a single state with t 1s on its incoming paths and $n - t$ 1s on its outgoing paths is therefore at most $t!(n-t)!$. So there are at least $\frac{n!}{t!\,(n-t)!}$ states at stage t of the 1-NBP. Taking $t = n/2$ we get that the number of states is at least $2^{\Omega(n)}$.
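The last step uses a standard estimate for the central binomial coefficient (spelled out here for completeness):

$$\frac{n!}{(n/2)!\,(n/2)!} \;=\; \binom{n}{n/2} \;\ge\; \frac{2^n}{n+1} \;=\; 2^{\Omega(n)}.$$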

The following observation is due to Jukna [37].

Theorem 13. $EPM_n$ can be solved by a weakly read-once NBP of size $O(n^3)$.

Proof. We can achieve this in two parts. To test whether a given input matrix X is a permutation matrix we verify:

• that each row has at least one 1, and

• that each column has at least $n - 1$ 0s.

To do this using a weakly read-once BP we guess which column in each row has a 1, and then we guess which combination of $n - 1$ rows in each column is all 0s. Observe that for any such pair of guesses that turn out to be true, no part of the input needs to be read more than once. To perform the first task we can use an NBP $P_1$ that computes the formula

$$P_1(X) = \bigwedge_{i=1}^{n} \bigvee_{j=1}^{n} x_{i,j}.$$

For the second task we can use an NBP $P_2$ that computes the formula

$$P_2(X) = \bigwedge_{j=1}^{n} \bigvee_{k=1}^{n} \bigwedge_{i \neq k} \neg x_{i,j}.$$

Compute the AND of these programs by connecting the sink of $P_1$ to the source of $P_2$. The size of the BP is $O(n^3)$, and it is clearly weakly read-once because all the labels in $P_1$ are positive and all the labels appearing in $P_2$ are negated, so a path in the BP is either read-once or inconsistent.
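The two formulas can be sanity-checked directly; the Python sketch below (our own illustration, with our own helper names) evaluates $P_1$ and $P_2$ on all small 0/1 matrices and confirms that their conjunction accepts exactly the permutation matrices.

```python
# Sanity check for the proof of Theorem 13: P1 says every row has a 1,
# P2 says every column has at least n-1 zeros; together they accept
# exactly the n x n permutation matrices.
from itertools import product

def P1(X):
    return all(any(row) for row in X)

def P2(X):
    n = len(X)
    return all(any(all(X[i][j] == 0 for i in range(n) if i != k)
                   for k in range(n))
               for j in range(n))

def is_permutation_matrix(X):
    n = len(X)
    return all(sum(row) == 1 for row in X) and \
           all(sum(X[i][j] for i in range(n)) == 1 for j in range(n))

n = 3
for bits in product([0, 1], repeat=n * n):
    X = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
    assert (P1(X) and P2(X)) == is_permutation_matrix(X)
```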

Another direction in which one can relax syntactic read-once NBPs (apart from the weakly read-once model we just saw) is to consider syntactic read-k models.

Definition 14. A nondeterministic branching program is a syntactic read-k times program (read-k NBP) if along each of its paths from the source to a sink node each variable appears at most k times.

Exponential lower bounds are known for this model as well. One polytime computable function for which this holds is the characteristic function of Bose–Chaudhuri–Hocquenghem (BCH) codes, a linear (n, m, d) code with $m \leq d\log_2(n+1)$ [37]. Exponential lower bounds for k-NBPs computing a different function (in NP) were shown by Borodin, Razborov and Smolensky [13].

Theorem 15 ([37] and Exercise 17.4 in [39]). For every integer $k \geq 1$, the characteristic function of BCH codes of minimal distance $d = \Omega(n)$ requires k-NBPs of size $2^{\Omega(\sqrt{n})}$.

Recall that in a BP with replication number R, at most R variables are read more than once along any computation path. So R = n gives the unrestricted model. In what seems to be the most interesting result known for large R, Jukna showed the following in 2008 [38].

Theorem 16. Let $G_n = (V_n, E_n)$, $|V_n| = n$, be a sequence of expander or Ramanujan graphs. Consider the boolean function

$$f_{G_n}(x_1, x_2, \ldots, x_n) = \sum_{\{i,j\} \in E_n} x_i x_j \pmod 2.$$

Given a subset of vertices as input, $f_{G_n}$ computes the parity of the number of edges of $E_n$ that lie entirely in this subset. Consider

$$f_n = f_{G_n} \wedge (x_1 \oplus x_2 \oplus \cdots \oplus x_n \oplus 1).$$

There exists a constant $\varepsilon > 0$ such that any deterministic branching program computing $f_n$ with replication number $R \leq \varepsilon n$ requires size $2^{\Omega(n)}$.

The interested reader is referred to [39] for more on this.

1.3.3 Time Space tradeoffs

While placing a bounded-read restriction indirectly imposes a bound on time, time-bounded branching programs, though more general, arguably have a more appealing motivation to be studied. Bounded-time models require that for every input there is a consistent computation path of some bounded length, say cn for some constant c.

The state of the art time/space tradeoffs for deterministic branching programs were proven in the remarkable papers by Ajtai [1] and Beame et al. [9]. In the first paper, Ajtai exhibited a polynomial-time computable Boolean function such that any sub-exponential size deterministic branching program requires super-linear length. This result was significantly improved and extended by Beame et al., who showed that any sub-exponential size randomized branching program requires length $\Omega\!\left(\frac{n\log n}{\log\log n}\right)$.

Lower bounds for nondeterministic branching programs have been more difficult to obtain. Once again our emphasis here is on the harder semantic model, where the restriction is on the length of consistent paths in the branching program. Obtaining an analog of Ajtai's result for non-deterministic branching programs is still open. Moreover, no exponential lower bounds are known even when the time is restricted to T = n, the number of input bits.

Open Problem 17. Prove an exponential lower bound on the size of non-deterministic boolean branching programs on n variables all of whose consistent paths have length at most n.

Note that the above open problem would be identical to the problem of showing lower bounds for weakly read-once or semantic read-once non-deterministic branching programs when the function involved is sensitive (i.e., any two accepted inputs differ in at least two positions), since, if the function is sensitive, length at most n for all consistent paths implies each variable is queried exactly once. (If a certain variable does not appear along some accepting computation path, you can flip that variable alone to get another accepted input, contradicting sensitivity.) We make some progress towards this problem in Chapter 2 by showing an exponential lower bound for a polytime computable ternary function $f : [D]^n \to \{0,1\}$ where $D = \{0,1,2\}$.

The best lower bound known prior to our work in Chapter 2 is an exponential lower bound (due to Jukna) for semantic read-once (nondeterministic) |D|-way branching programs, where $|D| = 2^{13}$ [36]. In fact this lower bound actually holds more generally for semantic read-k, but where |D| grows exponentially with k as $2^{3k+10}$. Jukna's result is an improvement over exponential lower bounds with a domain requirement of $2^{2^{ck}}$ obtained by Beame, Jayram and Saks [8]. The interested reader can find more about their results in Chapter 2.

1.4 Function Composition and The Tree Evaluation Problem

1.4.1 The KRW conjecture: Understanding composition as a way of separating complexity classes

In 1995 Karchmer, Raz and Wigderson [42] proposed a line of attack to prove a super-logarithmic lower bound on the depth of a circuit required to compute a specific function in P and thus separate $NC^1$ from P. The specific function, called the Iterated Multiplexor (coined in [25]), can be described as follows, using two parameters d and h. The input to the function is a complete ordered d-ary tree of height h, with the root thought of as being at height h. Each of the $d^{h-1}$ leaves is labelled by an input bit, giving its value. The internal nodes are labelled by a string from $\{0,1\}^{2^d}$ describing a boolean function on d bits. This labelling induces in a natural way a corresponding bit value at every node in the tree: the value of a leaf is its label, and the value of an internal node u is the value that the function at that node, $f_u$, takes on the bit values $b_1, b_2, \ldots, b_d$ realized by its children, i.e., $f_u(b_1, b_2, \ldots, b_d)$. The output of the Iterated Multiplexor is the value of the root. The input thus consists of $d^{h-1}$ bits for the leaf labels plus $2^d$ bits for each internal node describing its function.

Suppose now that the internal functions in an Iterated Multiplexor are fixed, and also that for each level the functions at that level are all identical. Then the Iterated Multiplexor becomes a problem of evaluating a composition of functions. For example, when h = 3 it becomes a three-fold composition of functions, say $f_1$, $f_2$ and $f_3$: the root evaluates to

$$f_1\big(f_2(f_3(x_{111}, \ldots, x_{11d}), \ldots, f_3(x_{1d1}, \ldots, x_{1dd})), \;\ldots,\; f_2(f_3(x_{d11}, \ldots, x_{d1d}), \ldots, f_3(x_{dd1}, \ldots, x_{ddd}))\big).$$

In order to prove a super-logarithmic depth lower bound for the Iterated Multiplexor, it suffices to prove that the circuit depth of a composition of functions grows in proportion to the sum of the depths of the functions. If the functions are randomly chosen d-bit functions then they require depth $d - o(d)$. If there are $h = \frac{\log n}{\log\log n}$ levels of these and $d = \log n$, the necessary depth is $\Omega(\log^2 n / \log\log n)$, thus placing the function outside $NC^1$. Karchmer and Wigderson propose an alternate characterization of the depth of a circuit by building an isomorphism between boolean circuits computing a function f and communication protocols for a certain search problem $R_f$ associated with f. This isomorphism has the nice property that the depth of the circuit and the number of bits exchanged in the corresponding protocol are equal. As a result, proving sufficiently large communication lower bounds for the search problem associated with the composed functions is equivalent to proving that the circuit depth of the composition of functions grows with the sum of the depths of the functions involved.

We shall stick to this idea of exploring function composition, but instead of circuits we focus on understanding how branching program complexity behaves as you compose functions. In order to do this, we use the tree evaluation problem introduced by Cook et al. in [19] as a candidate problem in P against which super-logarithmic space lower bounds might potentially be shown. The definition of the problem is inspired by Michael Teitslin's [59] FOCS 2005 submission, an attempt to separate NL and P.

[Figure 1.2: A black pebbling of $T^3_2$ using 3 pebbles.]

1.4.2 The Tree Evaluation Problem

Let $T^h_d$ be the balanced rooted complete d-ary tree of height h, i.e., having h levels. Let $[k] = \{0, 1, \ldots, k-1\}$. We number the nodes of $T^h_d$ using heap numbering: the root is numbered one and, in general, the children of node i are numbered from $di + 2 - d$ to $di + 1$.

Definition 18. In the Tree Evaluation Problem (TEP) [19] on $T^h_d$ the input is a table with $k^d$ entries defining a function $f_i : [k]^d \to [k]$ for each internal node i of $T^h_d$, and a number in [k] for each leaf of $T^h_d$. The two versions of TEP which are of interest to us are:

• $FT^h_d(k)$: for a given input, find the value of the root of $T^h_d$.

• $BT^h_d(k)$: in this boolean problem, for a given input we need to determine whether the value of the root is 1.

One can easily see that TEP is in P. In fact it is known that TEP ∈ LogCFL [19]. We wish to show that TEP ∉ L (and TEP ∉ NL). Natural ways to solve the tree evaluation problem can be described using what we call pebbling algorithms. We use them to get our upper bounds for the tree evaluation problem.
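A direct (though not space-efficient) way to see that TEP is in P is to evaluate the tree bottom-up; the Python sketch below (our own illustration; the nested-table input encoding and function name are assumptions, not the bit-level encoding above) does this recursively using the heap numbering.

```python
# A minimal recursive evaluator for FT^h_d(k), assuming the heap numbering
# above: the root is node 1 and the children of node i are d*i+2-d, ..., d*i+1.
# 'leaves' maps leaf numbers to values in [k]; 'tables' maps internal node
# numbers to functions [k]^d -> [k] given as nested lists.

def evaluate(node, d, leaves, tables):
    if node in leaves:                       # a leaf: its value is its label
        return leaves[node]
    children = range(d * node + 2 - d, d * node + 2)      # the d children
    values = [evaluate(c, d, leaves, tables) for c in children]
    entry = tables[node]
    for v in values:                         # index the table entry f_i(v1,...,vd)
        entry = entry[v]
    return entry

# Example: T^2_2 with k = 3, root function f_1(a, b) = (a + b) mod 3.
f1 = [[(a + b) % 3 for b in range(3)] for a in range(3)]
assert evaluate(1, d=2, leaves={2: 1, 3: 2}, tables={1: f1}) == 0
```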

Definition 19 (Black Pebbling). The legal moves in a black pebbling game are as follows.

• Place a pebble on any leaf.

• If all the children of node i are pebbled, slide one of these pebbles to node i.

• Remove a pebble at any time.

The goal in a black pebbling game is to place a pebble on the root. Figure 1.2 gives an example of a black pebbling procedure to pebble $T^3_2$ using 3 pebbles.


Theorem 20. $(d-1)h$ pebbles are necessary and sufficient to black pebble $T^h_d$.

Proof. The upper bound is easily proved by induction on h. Pebble the leftmost sub-tree of the root using $(d-1)(h-1)$ pebbles and, leaving a single pebble at the root of that subtree, proceed to pebble the next sub-tree. When all the d subtrees are pebbled, pebble the root. For the lower bound, observe that during the process of pebbling there will be an instant when, for the first time, every path from the root to a leaf is blocked by a pebble. This forms a bottleneck pebbling configuration with at least h pebbles.
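The recursive strategy in the upper bound is easy to simulate; the Python sketch below (our own illustration, for the binary case d = 2) carries it out and records the maximum number of pebbles held at any time, which for $T^h_2$ is h.

```python
# Simulate the recursive black pebbling strategy from Theorem 20 for d = 2,
# tracking the maximum number of pebbles on the tree at any time.

def pebble(node, height, pebbled, stats):
    """Black-pebble `node`, the root of a subtree with `height` levels."""
    if height == 1:                              # a leaf: just place a pebble
        pebbled.add(node)
    else:
        pebble(2 * node, height - 1, pebbled, stats)       # left subtree
        pebble(2 * node + 1, height - 1, pebbled, stats)   # right subtree
        pebbled -= {2 * node, 2 * node + 1}      # slide a pebble to `node`,
        pebbled.add(node)                        # freeing the children's pebbles
    stats["max"] = max(stats["max"], len(pebbled))

stats = {"max": 0}
pebble(1, 4, set(), stats)
print(stats["max"])      # 4 pebbles suffice for the binary tree T^4_2
```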

For the rest of this introductory section we shall fix d = 2, that is, we talk about TEP on binary trees $T^h_2$.

1.4.3 k-way Branching Programs Solving Tree Evaluation Problem

Definition 21 (k-way Branching Programs). A k-way branching program solving $FT^h(k)$ is a directed acyclic multigraph B with one source node $\gamma_0$ (the start state) and k sink nodes (the output states) labeled $0, 1, \ldots, k-1$. Each non-output state $\gamma$ has a label $\langle i, a, b \rangle$, where i is a node in $T^h$ and $a, b \in [k]$ (if i is a leaf in $T^h$, then a, b are missing). (The intention is that state $\gamma$ queries the function value $f_i(a, b)$, or the value of the leaf node.) The state $\gamma$ has k out-edges labeled $0, 1, \ldots, k-1$ (multiple edges can go to the same node). The computation of B on input I (describing an instance of $FT^h(k)$) is a path in B, starting with the start state $\gamma_0$ and ending in an output state, where for each non-output state $\gamma$ querying $f_i(a, b) = c$, the next edge in the path is the one labeled with c.

It is easy to see that black pebbling can be implemented by a k-way branching program with $O(k^h)$ states; consequently $FT^h(k) \in DSPACE(h \log k)$. The input size of a TEP instance is $n \approx (2^h - 1)k^2 \log k$, so $\log n \approx h + \log k$. (So black pebbling does not give a log-space algorithm.) Observe that by Theorem 3 it follows that to show $L \neq P$ it suffices to show that any k-way BP solving $FT^h(k)$ requires $\Omega(k^{c_h})$ states for some unbounded sequence $c_h$. Given the significance of this observation we state it as a lemma.

Lemma 22. To show that $L \neq P$ it suffices to show that any k-way branching program solving $TEP^h_2(k)$ requires $\Omega(k^{c_h})$ states for some sequence $c_h$ that is unbounded in the height h.

Proof. Observe that a general TEP problem instance has input size $n \approx 2^h k^2 \log k$ bits, so $\log n \approx h + \log k$. Let $c_h$ be any unbounded sequence, indexed by h. Suppose we can show that any k-way BP solving $FT^h(k)$ requires at least $\Omega(k^{c_h})$ states as a function of k. Then the space required is $c_h \log k = \omega(\log n)$ if we take $h = O(\log k)$, as $k \to \infty$.

While it has been shown [19] that for h = 2, 3 the corresponding k-way BPs have size at least $k^2$ and $k^3$ respectively (that is, no better than suggested by the black pebbling upper bound), height 4 and beyond are wide open. We make some progress towards this for height h = 4 in Chapter 4 by showing a $k^{3.5}$ lower bound. In the process we manage to improve on Nechiporuk's method in the context of the lower bound achievable against k-way branching programs.

Given this, it is interesting to study restricted branching programs solving composition and show that their size is indeed required to grow with the number of compositions h, as desired in Lemma 22. A deterministic read-once k-way branching program is defined as one in which no input variable is queried more than once along any path in it. The following lower bound is known in such a model.

Theorem 23 (James Cook / Siu Man Chan; available on Stephen Cook's webpage¹). Any deterministic read-once branching program solving $FT^h_2$ has $\Omega(k^{h-1})$ states which query the leaves.

¹In the manuscript titled "New Results for Tree Evaluation" at http://www.cs.toronto.edu/~sacook/


We now define non-deterministic k-way BPs solving TEP.

Definition 24 (Nondeterministic Branching Program). A nondeterministic k-way branching program solving $FT^h$ is a directed rooted multigraph with one source node $\gamma_0$ (the start state) and k sink nodes (the output states) labeled $0, 1, \ldots, k-1$. Each non-output state $\gamma$ has a label $\langle i, a, b \rangle$, where i is a node in $T^h$ and $a, b \in [k]$ (if i is a leaf in $T^h$, then a, b are missing). (The intention is that state $\gamma$ queries the function value $f_i(a, b)$.) The state $\gamma$ has out-edges with labels in [k] (multiple edges can go to the same node and some entries in [k] may not be used as labels). A computation on input I (describing an instance of $FT^h(k)$) is a path in B, starting with the start state $\gamma_0$ and proceeding such that for each non-output state $\gamma$ querying $f_i(a, b) = c$ (or a leaf v = c), the next edge in the path is any edge labeled c. A computation path on input I either ends in a final state labeled $FT^h(I)$, or it ends in a non-final state querying $f_i(a, b) = c$ (or a leaf v = c) with no out-edge labeled c (in this case we say the computation aborts). For every input I at least one such computation must end in a final state.

The size of such a non-deterministic k-way BP is the number of nodes in it.²

Just as black pebbling helped us describe upper bounds for deterministic branching programs, a notion called black-white pebbling [19] provides a way to describe non-deterministic branching program upper bounds for solving $TEP^h_2$.

1.4.4 NBP Upper-bounds via pebbling schemes

In this section we describe a pebbling scheme that corresponds to non-deterministic read-once BPs solving TEP. While pebbling schemes directly give us branching program size upper bounds, they might also give us some insight into proving size lower bounds.

Here we define what we call Black/White pebbling to describe one of the ways to obtain the upper bounds for the tree evaluation problem in the non-deterministic setting.

Definition 25 (Black/White pebbling). The legal moves in a black/white pebbling game are as follows.

• A white pebble can be placed at any node at any time.

• A white pebble can be removed if the node is a leaf or both its children have pebbles.

• A black pebble can be placed at any leaf.

• If both children of node i are pebbled, place a black pebble at i and remove any black pebbles at the children.

• Remove a black pebble at any time.

The goal of a black/white pebbling scheme is to start and end with no pebbles but to have a pebble at the root at some time. The minimum number of black/white pebbles needed, the black-white pebbling number, for $T^h$ is $\lceil h/2 \rceil + 1$ [19]. Figure 1.3 shows how $T^4$ can be black/white pebbled with 3 pebbles.

Corollary 26. A non-deterministic BP can solve $BT^h$ with $O(k^{\lceil h/2\rceil + 1})$ states.

²Note that, unlike here, an alternative definition in use elsewhere in the literature is to have NBPs with edges labelled by literals, with the size measured as the number of edges. One can see that a BP computing a function in one model can be transformed into a BP computing the same function in the other model while incurring only a cost that is polynomial in the size of the BP one starts with.

[Figure 1.3: A black/white pebbling of $T^4$ using 3 pebbles, starting by pebbling the root of the left subtree.]

One of the problems we consider is to prove lower bounds against restricted branching programs solving function composition, or the tree evaluation problem. Recall that restricted branching programs in the non-deterministic setting with a bound on the number of reads come in two variations. Syntactic read-once k-way NBPs are those in which no input variable is read more than once along any computation path. Weakly read-once or semantic read-once k-way NBPs are those in which the read-once restriction is imposed only along consistent source-to-sink paths. As mentioned earlier, the semantic model is strictly stronger than the syntactic one.

The following theorem, due to David Liu, gives the desired lower bound on syntactic read-once non-deterministic branching programs solving TEP.

Theorem 27 (David Liu [49]). Any syntactic read-once k-way non-deterministic branching program solving $TEP^h_2$ has at least $(k-1)^{\lceil h/2 \rceil + 1}$ states.

In Chapter 3 we shall prove that in the more challenging model of weakly read-once or semantic read-once NBPs as well, solving $TEP^h_3$ needs at least $\Omega(k^h)$ states. (Showing this for $TEP^h_3$ instead of $TEP^h_2$ just makes the argument easier to describe; note that the black-white pebbling number of a 3-ary tree of height h is ≈ h.) Our proof technique is different from, and more involved than, the proof arguments of James Cook / Siu Man Chan and David Liu for the weaker models of deterministic read-once and non-deterministic syntactic read-once branching programs respectively. In this context, we would like to mention that although we expect that the lower bound we show in Chapter 3 holds when the internal functions in the tree are fixed, as in the proof of Theorem 23, we do not know how to exploit the algebraic properties of the polynomials obtained by composing polynomials as simple-looking as $x^3 + y^3$ to our benefit in the non-deterministic read-once setting.

Nevertheless, it is interesting to note that when the functions in the tree evaluation problem are fixed to some constant-degree polynomials, there are in fact surprisingly space-efficient algorithms if we remove the read-once restriction. A result of Ben-Or and Cleve [12] shows that over any ring the functions computed by polynomial-size algebraic formulas are also computed by polynomial-length algebraic straight-line programs that use only 3 registers. This can be seen as an extension of Barrington's result, since boolean formulas are equivalent to algebraic formulas over the ring GF(2).

Theorem 28 (Ben-Or and Cleve [12]). If f is an algebraic formula of depth d over a ring, there exists a 3-register straight-line program of length $O(4^d)$ that computes f.

It follows from this that if the internal functions of TEP are fixed to some constant-degree polynomials, there exists a layered branching program with some number L(h) of layers and $O(k^3)$ states per layer that solves $FT^h_2$. The $k^3$ states per layer correspond to the different possible value configurations the registers can be in. Just as Barrington's branching programs use multiple reads, as in Lemma 8 for computing AND, these programs perform multiple reads while they compute the product of two formulas. Note that the number of layers required is independent of k, and consequently, for a fixed h, the size amounts to a surprisingly small $O(k^3)$!
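To give a flavour of why three registers suffice (Theorem 28), here is a small Python sketch of the standard Ben-Or–Cleve simulation (our own illustration, over the ring of integers mod k; the formula and instruction encodings are ours). Every instruction has the form $R_j \mathrel{+}= \pm\, x \cdot R_i$ for a leaf value or constant x, and a formula of depth d compiles into at most $4^d$ such instructions.

```python
# A sketch of the Ben-Or-Cleve simulation (Theorem 28): compile an algebraic
# formula over +, * into straight-line code on 3 registers, where each
# instruction adds or subtracts (leaf value) * R_i into R_j.
# Formulas: ("var", name) | ("const", c) | ("add", f, g) | ("mul", f, g).

def compile_formula(f, i, j, sign, code):
    """Emit instructions whose net effect is R_j += sign * f * R_i,
    with every other register restored to its original value."""
    op = f[0]
    if op in ("var", "const"):
        code.append((j, sign, f, i))                 # R_j += sign * f * R_i
    elif op == "add":
        compile_formula(f[1], i, j, sign, code)
        compile_formula(f[2], i, j, sign, code)
    elif op == "mul":
        k = 3 - i - j                                # the third register
        compile_formula(f[1], i, k, +1, code)        # R_k += g * R_i
        compile_formula(f[2], k, j, sign, code)      # R_j += sign * h * R_k
        compile_formula(f[1], i, k, -1, code)        # R_k -= g * R_i
        compile_formula(f[2], k, j, -sign, code)     # R_j -= sign * h * R_k

def run(code, x, mod):
    R = [1, 0, 0]                                    # R_0 = 1, so R_2 ends as f(x)
    for j, sign, leaf, i in code:
        val = x[leaf[1]] if leaf[0] == "var" else leaf[1]
        R[j] = (R[j] + sign * val * R[i]) % mod
    return R[2]

# Example: f(x, y, z) = (x + y) * z over the integers mod 7.
formula = ("mul", ("add", ("var", "x"), ("var", "y")), ("var", "z"))
code = []
compile_formula(formula, 0, 2, +1, code)
assert run(code, {"x": 3, "y": 5, "z": 2}, mod=7) == ((3 + 5) * 2) % 7
```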

1.5 Outline

The remaining chapters are organized as follows.

• In Chapter 2 we prove exponential lower bounds on the size of semantic read-once 3-ary nondeterministic branching programs solving a polytime computable function.


The content of this chapter is joint work with Stephen Cook, Jeff Edmonds and Toniann Pitassi.

• In Chapter 3 we focus on proving lower bounds against restricted branching programs solving function composition. In particular, we show that weakly read-once or semantic read-once NBPs solving the tree evaluation problem ($TEP^h_3$) need at least $\Omega(k^h)$ states.

The content of this chapter is joint work with Jeff Edmonds and Toniann Pitassi, resulting from a question posed by Stephen Cook.

• In Chapter 4 we focus entirely on unrestricted or general branching programs. We give a better lower bound than is possible by using Nechiporuk's method for k-way branching programs solving $TEP^4_d$. Using essentially the same method we give a matching lower bound to that achievable by using Nechiporuk's method for binary branching programs. The interesting aspect of this method is that it seems plausible to improve on one of the parts of the argument; any marginal improvement here would be consequential towards beating Nechiporuk's method for binary branching programs. We then proceed to give some surprising branching program upper bounds that differ from the naive upper bounds and are based on some surprising communication complexity protocols. Our aim here is to improve our understanding of a possible approach to prove the suspected lower bound, but the upper bounds and the connections to communication complexity therein might themselves be of independent interest.

The content of this chapter is joint work with Jeff Edmonds.


Chapter 2

Lower Bound for Ternary Functions


2.1 Introduction

One approach to explore the problem of whether polynomial time is the same as log-space or nondeterministic log-space is to study time/space tradeoffs for problems in P. For example, for natural problems in P, does the addition of a space restriction prevent a polynomial time solution? In the uniform setting, time-space tradeoffs for SAT were achieved in a series of papers [26–28, 48]. Fortnow-Lipton-Viglas-Van Melkebeek [28] show that any algorithm for SAT running in space n^{o(1)} requires time at least Ω(n^{φ−ε}), where φ is the golden ratio (√5 + 1)/2 and ε > 0. Subsequent works [23, 63] improved the time lower bound to greater than n^{1.759}.

In the nonuniform setting, the standard model for studying time/space tradeoffs is the branching program. In this model, which we described in chapter 1, the length of the branching program is the number of edges in the longest path and can be seen as a measure of computation time. The size of a branching program is the number of nodes in the program. For a boolean function f_n of n variables, let BP(f_n) denote the minimum size of a branching program computing f_n. As discussed in chapter 1, BP(f_n) is tightly related to the space complexity S(f_n) of a non-uniform Turing machine computing f_n. This motivates the study of branching program size lower bounds. In particular, size lower bounds on length restricted branching programs translate to time-space tradeoffs.

The state of the art time/space tradeoffs for branching programs were proven in the remarkable papers by Ajtai [1] and Beame et al. [9]. In the first paper, Ajtai exhibited a polynomial-time computable Boolean function such that any sub-exponential size deterministic branching program requires superlinear length. This result was significantly improved and extended by Beame et al., who showed that any sub-exponential size randomized branching program requires length Ω(n log n / log log n).

Lower bounds for nondeterministic branching programs have been more difficult to obtain. In this model, there can be several arcs (or no arcs) out of a node with the same value for the variable associated with the node. An input is accepted if there exists at least one path consistent with the input from the source to the 1-node. A nondeterministic branching program computes a function f if its accepted inputs are exactly f^{-1}(1). From here on, we shall restrict our attention to non-deterministic branching programs.

Length-restricted nondeterministic branching programs come in two flavors: syntactic and semantic. A length l syntactic model requires that every path in the branching program has length at most l, and similarly a read-k syntactic model requires that every path in the branching program reads every variable at most k times. In the less restricted semantic model, the requirement is only for consistent accepting paths from the source to the 1-node; that is, accepting paths along which no two tests x_i = d_1 and x_i = d_2 with d_1 ≠ d_2 are made. This is equivalent to requiring that for every accepting path, each variable is read at most k times. Thus for a nondeterministic read-k semantic branching program, the overall length of the program can be unbounded.

Note that any syntactic read-once branching program is also a semantic read-once branching program, but the opposite direction does not hold. In fact, Jukna [37] proved that semantic read-once branching programs are exponentially more powerful than syntactic read-once branching programs, via the "Exact Perfect Matching" (EPM) problem. The input is a (Boolean) matrix A, and A is accepted if and only if every row and column of A has exactly one 1 and the rest of the entries are 0's, i.e., if it is a permutation matrix. Jukna gave a polynomial-size semantic read-once branching program for EPM, while it was known that syntactic read-once branching programs require exponential size [41, 45].

Lower bounds for syntactic read-k (nondeterministic) branching programs have been known for some time [13, 52]. However, for semantic nondeterministic branching programs, even for read-once, no lower bounds are known for polynomial time computable functions for the |D| = 2 case. The best lower bound known prior to our work is an exponential lower bound for semantic read-once (nondeterministic) |D|-way branching programs, where |D| = 2^{13} [36]. In fact this lower bound actually holds more generally for semantic read-k, but where |D| = 2^{3k+10}.

Jukna obtains his result by showing that any time restricted semantic branching program of small size has a large rectangle in f^{-1}(1). He uses the polytime computable function of computing the characteristic function of a linear code having minimum distance m + 1 defined over GF(q). Given a parity check matrix Y, the function g(Y, x) = 1 iff x is a codeword. Since codewords in a linear code of minimum distance m + 1 can only have an m-rectangle of size 1, he argues that a time restricted branching program of length kn computing g requires size 2^{Ω(n/k^2 4^k)}. This exponential lower bound can be obtained whenever D is sufficiently large in comparison to k, specifically for |D| = q ≥ 2^{3k+10}.

Jukna’s result is an improvement over exponential lower bounds with a domain requirement of 22ck obtained in [8].Beame et.al [8] obtain their result by characterizing the function computed by a time restricted branching program ofsmall size as a union of shallow decision forests where the size of the union depends on the size of the branchingprogram. Each shallow forest is then shown to be representable by a collection of small number of βn-pseudo-rectangles in f−1(1). (Pseudo-rectangles are a generalization of what we call embedded rectangles later). This givesa representation of the branching program as a union of small (in the size ‘s’) number of βn-pseudo-rectangles. Now,if for some function f the maximum size of a βn-pseudo-rectangle is |D|(1−ψf (β))n and the number of yes-instances|f−1(1)| ≥ |D|(1−η(f))n then the number of βn-pseudo-rectangles will be at least |D|(ψf (β)−η(f))n. This yields anexponential lower bound on s for sufficiently large |D| whenever (ψf (β) − η(f)) is bounded away from 0 by someε > 0. They then exhibit a polytime computable function with this property. Their function QFM : GF (qn)→ 0, 1is based on quadratic forms using a modified Generalized Fourier Transform matrix. They show that there exists aconstant c > 0 such that for all k and ε ∈ (0, 1), if D ≥ 22

cεk

then a non-deterministic BP of length kn computingQFM needs size at least S = 2n log1−ε |D|. For the specific case of k = 1, it can be shown that if their analysis ofmaximum size of βn-pseudo-rectangles in QFM is tight, a domain size of at least |D| ≥ 264 is needed.

Our main result is an exponential lower bound on the size of semantic read-once nondeterministic branching programs for a polynomial time decision problem f with 3-ary inputs. Similar in spirit to these previous results [8, 36], we show that a small sized semantic read once branching program is bound to have a large rectangle in f^{-1}(1).

In addition, we show that one can always find a balanced rectangle in f^{-1}(1) of size r^2, where r is some large constant.

A balanced rectangle is one which is reasonably close to being a square.

The particular polynomial time decision problem we use to prove the lower bound is: to decide if a polynomial over a finite field K evaluates to a value less than a certain threshold at a given input. The input is a pair (u, x) where u is the description of a degree d−1 polynomial over [K] and x ∈ [K], and we want to accept if and only if u(x) < K^{1−δ}. We actually prove a stronger theorem: with high probability over all polynomials u, any nondeterministic semantic read-once branching program for what we shall call Poly_u (along with a hyperplane constraint) requires exponential size. That is, even if the branching program knows the polynomial u, for a typical u it cannot efficiently do polynomial evaluation. The main properties of polynomials over finite fields we use are polynomial interpolation, and Lemma 2.3.7, which can be interpreted as saying that the values of a typical random polynomial of degree d over a field K are spread roughly uniformly over K, provided K is sufficiently large.

Continuing with the above observation that we can find a balanced rectangle in f^{-1}(1) for a function with a small semantic read once branching program: since the number of balanced rectangles of a certain size d = r^2 is small, and since each one of them can be a rectangle in f^{-1}(1) for a relatively small number of degree d polynomials over K as a consequence of polynomial interpolation, we argue that there must be a polynomial with no balanced rectangle of this size in f^{-1}(1), and hence the branching program computing it must be large. A key idea of this argument is that for a balanced rectangle the sum of the side lengths of the rectangle can be at most a small fraction of its area.

By a simple padding argument, we can modify our problem Poly_u and actually achieve the lower bound for domain size 2 + ε for arbitrarily small ε > 0. In this model, we can define the problem to have N = n + M variables, M = Θ(N) of them with domain size 3 and the rest, with domain size 2, not affecting the output. In Section 2.5, we show why it might be harder to prove lower bounds for semantic read-once branching programs when |D| = 2, by showing how these branching programs can altogether evade having an exponential number of states in many purported choices of bottleneck layer, by giving polynomial upper bounds.

2.2 Definitions

Throughout this chapter, D denotes a finite set. For a finite set N, D^N is the set of maps from N to D. An element of N is called a variable index or simply an index. We normally take N to be [n] for some integer n, and write D^N for D^{[n]}. If A ⊆ N, a point σ ∈ D^A is a partial input on A. For a partial input σ, fixed(σ) denotes the index set A on which it is defined and unfixed(σ) denotes the set N − A. If σ and π are partial inputs with fixed(σ) ∩ fixed(π) = ∅, then σπ denotes the partial input on fixed(σ) ∪ fixed(π) that agrees with σ on fixed(σ) and with π on fixed(π).

For x ∈ D^N and A ⊆ N, the projection x_A of x onto A is the partial input on A that agrees with x. For S ⊆ D^N, S_A = {x_A | x ∈ S}.

Let us recall the following definition from the first chapter.

2.2.1 Nondeterministic Semantic Read-Once Branching Programs

Let f : D^N → {0, 1} be a boolean function whose input is given in |D|-ary. Let the input variables be x_1, . . . , x_n where x_i ∈ D for all i ≤ n. A |D|-way nondeterministic branching program (for f) is an acyclic directed graph G with a distinguished source node q_start and a distinguished sink node (the accept node) q_accept. We refer to nodes as states. Each non-sink state is labeled with some input variable x_i, and each edge directed out of a state is labeled with some value b ∈ D for x_i. For each ~ξ ∈ D^N, the branching program accepts ~ξ if and only if there exists at least one (directed) path starting at q_start and leading to the accepting state q_accept, such that all labels along this path are consistent with ~ξ. The size of a branching program is the number of states (i.e. nodes) in the graph.
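As a concrete illustration of this acceptance condition, the following Python sketch (illustrative only, with a hypothetical edge-list encoding rather than anything defined in this chapter) checks whether some source-to-accept path is consistent with a given input.

```python
# A minimal sketch, assuming states are integers and edges[q] lists tuples
# (variable_index, value, next_state) for the out-edges of state q.

def accepts(edges, q_start, q_accept, xi):
    """True iff some directed path from q_start to q_accept is consistent with input xi."""
    dead = set()                                   # states that provably cannot reach q_accept

    def dfs(q):
        if q == q_accept:
            return True
        if q in dead:
            return False
        for var, val, nxt in edges.get(q, []):
            if xi[var] == val and dfs(nxt):        # follow only edges consistent with xi
                return True
        dead.add(q)
        return False

    return dfs(q_start)

# Tiny 3-way example: accept exactly when x0 == x1.
edges = {0: [(0, b, 1 + b) for b in range(3)],
         1: [(1, 0, 4)], 2: [(1, 1, 4)], 3: [(1, 2, 4)]}
print(accepts(edges, 0, 4, [2, 2]), accepts(edges, 0, 4, [2, 1]))   # True False
```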

A branching program is semantic read-k if for every path from q_start to q_accept that is consistent with some input, each variable occurs at most k times along the path. In particular, for the read-once case, a semantic branching program allows variables to be read more than once, but each accepting path may query each variable at most once.

2.2.2 Polynomial Evaluation Problem

Our hard computational problem is the polynomial evaluation problem, Poly, with parameters K, d, δ, where 0 < δ < 1. The input is a pair (u, x) where u ∈ [K]^d specifies a degree d − 1 polynomial over the field [K] (K a prime power), and x ∈ [K] specifies a field value. Poly(u, x) = 1 if and only if the polynomial specified by u on input x evaluates to a number less than K^{1−δ}. (We compare two field elements using the natural ordering on ternary strings.)

We will work with |D|-ary branching programs (with |D| prime), and let K = |D|^n. The input will be given as a vector in D^{(d+1)n}. The first dn coordinates specify u and the last n coordinates specify x. Thus the total input length is (d + 1)n. In the remainder of this chapter, |D| = 3, and thus the parameters of Poly are d, δ, n. Both d and δ will be fixed constants. Let Poly_u denote the polynomial evaluation problem with parameters d, δ, n where the polynomial u is fixed.

The actual lower bounds we show will be for a sensitive function f_u obtained from Poly_u as follows. Let a ∈ GF(q) where q = |D| is a prime number. Let h_a : D^n → {0, 1} be the characteristic function of the hyperplane at a:

h_a(x) = 1 iff x_1 + x_2 + ... + x_n ≡ a (mod q)

Fix an element a(u) for which h_a accepts the largest number of vectors accepted by Poly_u and define the function

f_u(x) = Poly_u(x) ∧ h_{a(u)}(x)

We call f_u sensitive because it has the property that changing the value of exactly one variable in a yes input always gives an input vector that is a no instance. As a result, any two accepted inputs differ in the value of at least two variables. Similarly for the polynomial evaluation problem Poly, where the coefficient vector u is part of the input, we define f(u, x) = Poly(u, x) ∧ h_{a(u)}(x), which is sensitive in x.
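Evaluating Poly_u and f_u directly is of course easy without a space bound; the sketch below is only meant to pin down the decision problem. It is a minimal sketch under simplifying assumptions: K is taken to be prime (the construction above uses K = |D|^n, a prime power), x is given by an explicit base-q digit encoding, and the hyperplane value a is passed in rather than chosen as a(u).

```python
# A minimal sketch, assuming K prime and x encoded as n digits in base q = |D|.

def poly_u(u, x, K, delta):
    """Poly_u(x) = 1 iff the degree d-1 polynomial with coefficients u has u(x) < K^(1-delta)."""
    value = sum(c * pow(x, i, K) for i, c in enumerate(u)) % K
    return value < K ** (1 - delta)

def f_u(u, x_digits, q, a, K, delta):
    """f_u(x) = Poly_u(x) AND h_a(x), where h_a checks x_1 + ... + x_n = a (mod q)."""
    x = 0
    for digit in x_digits:
        x = x * q + digit
    return poly_u(u, x, K, delta) and (sum(x_digits) % q == a)

# Toy parameters: q = 3, n = 2, K = 7 (prime, for illustration), u(x) = 1, delta = 0.5.
print(f_u([1, 0], [0, 2], q=3, a=2, K=7, delta=0.5))   # True: u(2) = 1 < 7^0.5 and 0+2 = 2 (mod 3)
```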

2.2.3 Rectangles and Embedded Rectangles

We use the same definitions and conventions as in [9]. A product U × V is called a (combinatorial) rectangle. If A ⊆ N is an index subset and Y ⊆ D^A and Z ⊆ D^{N−A}, then the product set Y × Z is naturally identified with the subset R = {σρ | σ ∈ Y, ρ ∈ Z} of D^N, and a set of this form is called a rectangle in D^N.

An embedded rectangle R in D^N is a triple (π_red, π_white, C) where π_red, π_white are disjoint subsets of N, and C ⊆ D^N satisfies: (i) the projection C_{N−π_red−π_white} consists of a single partial input w; (ii) if τ_1 ∈ C_{π_red} and τ_2 ∈ C_{π_white}, then the point τ_1τ_2w ∈ C. C is called the body of R. The sets π_red, π_white are called the feet of the rectangle; the sets C_{π_red} and C_{π_white} are the legs, and w is the spine. We can also specify an embedded rectangle by its feet, legs and spine: (π_red, π_white, A, B, w) where π_red, π_white are the feet, A = C_{π_red}, B = C_{π_white} are the legs, and w is the spine.

We will sometimes refer to A as the red side of the rectangle and to B as the white side of the rectangle. The size of the rectangle is |A| · |B|, and the dimension of the rectangle is m_r-by-m_w where m_r = |π_red| and m_w = |π_white|.
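To make the triple (π_red, π_white, C) concrete, the following sketch (a hypothetical dict-based encoding, not used elsewhere in this chapter) assembles the body of an embedded rectangle from its feet, legs and spine; the resulting set has size |A| · |B| as defined above.

```python
# A minimal sketch: legs are lists of partial inputs (dicts index -> value) on the feet,
# and the spine w fixes every remaining coordinate.
from itertools import product

def rectangle_body(n, pi_red, pi_white, A, B, w):
    assert all(set(tau) == set(pi_red) for tau in A)
    assert all(set(tau) == set(pi_white) for tau in B)
    body = []
    for tau1, tau2 in product(A, B):
        point = {**w, **tau1, **tau2}                 # the point tau1 tau2 w
        body.append(tuple(point[i] for i in range(n)))
    return body

# A 2-by-1 dimensional rectangle in D^4 with legs of size 2 each (so size 4).
A = [{0: 0, 1: 1}, {0: 2, 1: 2}]
B = [{2: 0}, {2: 1}]
print(rectangle_body(4, {0, 1}, {2}, A, B, {3: 2}))
```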

2.3 Lower Bound for |D| = 3

Theorem 2.3.1. There exist constants d, δ such that for sufficiently large n, for a random u, with probability greater than 1/4, any 3-ary nondeterministic semantic read-once branching program for f_u requires size at least 2^{Ω(n)}.

Corollary 2.3.2. There exist constants d, δ such that for sufficiently large n, any 3-ary nondeterministic semantic read-once branching program for f(u, x) with parameters d, δ, n requires size at least 2^{Ω(n)}.


Overview of Proof. Call a degree d−1 polynomial "good" if the fraction of accepting instances is roughly what you would expect from a random function; that is, the fraction of yes instances is at least (1/2)K^{−δ}. Lemma 2.3.7 shows that at least half of all degree d−1 polynomials are good.

The main lemma (Lemma 2.3.3) shows that for all good polynomials Poly_u and their corresponding sensitive functions f_u, we can associate with every size s = 2^{o(n)} branching program P computing f_u an m_r-by-m_w embedded rectangle R_P of size r^2, where r will be a large constant, and m_r and m_w will be roughly equal, and will each be a constant fraction of n. For simplicity of calculations for now, assume that m_r = m_w = m. The rectangle will have the property that P accepts every input in R_P; in other words, R_P is a 1-rectangle of P. Choosing d = r^2, each rectangle of size r^2 can be a 1-rectangle for very few degree d−1 polynomials – at most a |D|^{−nδr^2} fraction of all degree d−1 polynomials. (This is Lemma 2.3.6.) Secondly, the total number of such rectangles is fairly small – of size roughly |D|^{O(rm)} (Lemma 2.3.5). The key point is that the number of rectangles is roughly |D|^{2rm} – the exponent grows linearly in r. (More precisely, it grows linearly in the sum of the lengths of the sides of the rectangle, |A| + |B|.) But on the other hand, the probability that a degree d = r^2 polynomial takes on values less than K^{1−δ} within the rectangle is roughly |D|^{−mr^2} – that is, the exponent grows quadratically with r. (More precisely, it grows linearly in the size of the rectangle |A| · |B|.) Because |D|^{−nδr^2} |D|^{O(rn)} is less than 1/4, this implies that many good degree d−1 polynomials have no size r^2 1-rectangle, thus proving the theorem.

Note that we set our parameters so that the area of the rectangle R_P is at least the degree d of the polynomial u. (Thus r^2 ≥ d.) A crucial point in the above argument is that the sum of the lengths of the sides of R_P must be at most a fraction of its area. This requires that the rectangle is reasonably close to being square. We put extra effort into making sure that the rectangle is square (without compromising too much of its size in order to make it square). This enables us to achieve domain size 3; a somewhat simpler argument achieves domain size 5.

Lemma 2.3.3 (Main Lemma). Let f : D^n → {0, 1} be any sensitive boolean function such that the density of 1's is at least (1/(2|D|)) K^{−δ}. Suppose that the following inequalities are satisfied for our parameters: (1) m_w = 4m_r = γn; (2) |D|^{m_r} ≤ |D|^{m_w}(1/2 − 2γ)^{m_w}; (3) r ≤ (1/(4|D|s)) (1/2 − γ)^{m_r} |D|^{m_r−δn}. Then if P is a |D|-way nondeterministic semantic read-once branching program of size s for f, there is an m_r-by-m_w embedded rectangle R = (π_red, π_white, A, B, w) such that every input in R is accepted by P, and where |A| = |B| = r.

Proof. Let f be a sensitive function such that the density of 1's is at least (1/(2|D|)) K^{−δ}. Suppose there is a size s nondeterministic semantic read-once branching program P for f. Let S_0 be the set of inputs that are accepted by P; since P is assumed to be correct for all inputs of f, we have |S_0| ≥ (1/(2|D|)) K^{−δ}|D|^n. For each accepted instance I ∈ S_0, fix one accepting path, p_I, in the branching program. Since the function is sensitive, each of the n variables must be read along any accepting path. For if some variable is not read along a computation path, then changing the value of that variable alone would produce another accepted instance. However, this cannot happen for a sensitive function, since any two accepted inputs must differ in at least two positions. So each of the n variables must be read along this path exactly once, and thus each accepting instance I has an associated permutation π_I of the n variables associated with its accepting path p_I. Designate state q_I as the state in p_I which occurs just after the first half of the variables in π_I. Now define q to be the most common designated state (over all accepting inputs I ∈ S_0), and let S_1 ⊆ S_0 denote the corresponding set of inputs whose designated state is q. Thus for each input I in S_1, there is an accepting path p_I that passes through state q. Because P has size s, it follows that

|S_1| ≥ |S_0|/s ≥ (1/(2|D|s)) K^{−δ}|D|^n = (1/(2|D|s)) |D|^{n−δn}     (2.1)


We now want to pick two subsets of coordinates π_red ⊆ N and π_white ⊆ N, of size m_r and m_w respectively, and a set S* ⊆ S_1 of inputs with the property that for every input I ∈ S*, and associated accepting path p_I, not only does it pass through state q, but every coordinate in π_red is read before state q, and every coordinate in π_white is read at or after state q. We will first pick π_red greedily. For each I ∈ S_1, at least n/2 of the n coordinates in p_I occur in π_I before reaching state q, and thus there is some coordinate i such that for at least half of the inputs I ∈ S_1, i occurs in π_I before reaching state q. After choosing the first coordinate, there are at least |S_1|/2 inputs remaining. Continue greedily until we pick m_r coordinates, π_red, always choosing the most popular coordinate that occurs in π_I before reaching state q. By averaging, when the ith coordinate, i ≤ m_r < γn, is chosen, the fraction of inputs that remain is at least (n/2 − i)/(n − i) ≥ (n/2 − γn)/(n − γn) ≥ (n/2 − γn)/n = 1/2 − γ. Let S_2 ⊆ S_1 denote the set of inputs such that all coordinates in π_red are read before reaching q. It follows that

|S_2| ≥ (1/2 − γ)^{m_r} |S_1|     (2.2)

By assumption (3) in the statement of the Lemma, we have

r ≤ (1/(4|D|s)) (1/2 − γ)^{m_r} |D|^{m_r−δn}     (2.3)

Then from (2.1), (2.2), and (2.3) we have

|S_2| ≥ 2r|D|^{n−m_r}     (2.4)

For each w ∈ D^{N−π_red}, the average number of extensions of w in S_2 is at least 2r. We want to prune S_2 such that every w ∈ D^{N−π_red} has at least r extensions. To do this, define S_3 ⊆ S_2, where we remove all inputs (w, ∗) from S_2 such that w has fewer than r extensions in S_2. Since we delete at most r|D|^{n−m_r} elements from S_2, and |S_2| ≥ 2r|D|^{n−m_r}, it follows that

|S_3| ≥ r|D|^{n−m_r}     (2.5)

Next we will choose m_w coordinates, π_white, in the same greedy fashion, and let S_4 denote the set of all inputs in S_3 such that all coordinates in π_white are read after reaching q. Again by averaging,

|S_4| ≥ (1/2 − 2γ)^{m_w} |S_3|     (2.6)

We will express S_4 as the disjoint union of sets R_w: choose a value w for the coordinates outside of π_red ∪ π_white. The corresponding set R_w ⊆ S_4 consists of all inputs (α, w, β) such that α is an assignment to the variables in π_red, β is an assignment to the variables in π_white, and (α, w, β) ∈ S_4.

Lemma 2.3.4. For each w: (i) R_w is an embedded rectangle, and (ii) as long as R_w is not empty, the size of its red leg is at least r.

Proof. We will first show that R_w is an embedded rectangle. Let S_red ⊆ D^{π_red} be the projection of R_w onto the coordinates of π_red and let S_white ⊆ D^{π_white} be the projection of R_w onto the coordinates of π_white. Setting A = S_red, B = S_white and w = w, we claim that R_w is equal to the embedded rectangle defined by (π_red, π_white, A, B, w). To see this, consider x, x′ ∈ A and y, y′ ∈ B such that xyw ∈ R_w and x′y′w ∈ R_w. Let I be the input corresponding to xyw and let p_I be the corresponding path going through state q. Note that in p_I the x-variables are all read prior to reaching q, the y-variables are read after reaching q, and there is some split of the w variables into w_1, w_2 where the w_1 variables are read prior to q and the w_2 variables are read after q. Similarly, let I′ be the input corresponding to x′y′w and let p_{I′} be the corresponding path. There is now a possibly different split of w into w′_1, w′_2, so x′, w′_1 are read before q and y′, w′_2 are read after q. We claim that xy′w ∈ R_w: consider the path (x, w_1) (the first half of p_I) and (y′, w′_2) (the second half of p_{I′}). This path must be consistent since w_1 and w′_2 are consistent and x, y′ are on disjoint variables. Thus there is an input consistent with this path; it is an accepting path going through q and consistent with w; the variables in π_red are all read before q, and the variables in π_white are all read after q. Thus it is in R_w. An analogous argument shows that x′yw ∈ R_w. Thus R_w is an embedded rectangle.

[Figure 2.1: why R_w constitutes a rectangle: if (x, w, y) and (x′, w, y′) are accepted via paths through state q, then the cross inputs (x, w, y′) and (x′, w, y) are accepted as well.]

Secondly, we will show (ii): for each R_w ⊆ S_4, the size of the red leg is at least r. (That is, |A| ≥ r.) Consider a nonempty rectangle R_w with red leg A, white leg B and spine w. Recall that the inputs in S_3 consist of a partial input w^+ ∈ D^{N−π_red} together with a set A ⊆ D^{π_red} such that |A| ≥ r. We obtain S_4 from S_3 by selecting m_w coordinates from N − π_red, one at a time, choosing each coordinate greedily, where a coordinate is chosen if it is read after state q in the most inputs. Consider a block of inputs (A, w^+) ∈ S_3. If some input (α, w^+) ∈ (A, w^+) survives, then all coordinates in π_white that were chosen must all be read after state q on input (α, w^+). But this means that for every input (α′, w^+) ∈ (A, w^+), all coordinates in π_white are also read after q. (Otherwise, some coordinate would be read twice along this accepting input, violating the read-once condition.) Thus, either the entire block (A, w^+) is in S_4, or the entire block is removed from S_4.

Now let R_w = (π_red, π_white, A, B, w) ⊆ S_4 be a nonempty rectangle, w ∈ D^{N−π_red−π_white}. R_w is obtained by taking the union of (nonempty) blocks (A′, w^+) ∈ S_4, w^+ ∈ D^{N−π_white}. Since, as we argued above, for each such block |A′| ≥ r, it follows that |A| ≥ r as well.

Let r_avg denote the average size of the white leg of the rectangle over all rectangles R_w ⊆ S_4. It is easy to see that |D|^{n−m_w} r_avg ≥ r|D|^{n−m_r}(1/2 − 2γ)^{m_w}. It follows that r_avg ≥ r if |D|^{m_w−m_r}(1/2 − 2γ)^{m_w} ≥ 1. The latter inequality follows from condition (2). Thus, by condition (2) assumed in the hypothesis of Lemma 2.3.3, we can pick some setting w* of the remaining n − m_r − m_w uncolored coordinates (the coordinates that are not in π_red or π_white) such that the white leg of the rectangle R_{w*} has size at least r_avg ≥ r. Let S* equal R_{w*}. By construction, both the red leg of S* = R_{w*} and the white leg of R_{w*} have size at least r. Prune S* so that each leg has size exactly r, thus completing the proof of the lemma.

Lemma 2.3.5. Let R be the set of all m_r-by-m_w embedded rectangles over D^N such that |A| = |B| = r, where m_w = γn and m_r = m_w/4. Then |R| ≤ (e/γ)^{(5/4)m_w} |D|^{(5/4)r m_w + m_w/γ}.

Proof. The number of choices for π_red, the coordinates of A, is \binom{n}{m_r}. Given π_red, we choose r vectors from the |D|^{m_r} possible values for the elements of A. Thus the total number of possible sets A is at most \binom{n}{m_r}|D|^{r m_r}. Similarly the number of choices for the set B is at most \binom{n}{m_w}|D|^{r m_w}. The number of choices for w ∈ D^{N−π_red−π_white} is |D|^{n−m_r−m_w}. Thus |R| is at most \binom{n}{m_r}\binom{n}{m_w}|D|^{r m_r}|D|^{r m_w}|D|^{n−(5/4)m_w}. Using the inequality \binom{n}{k} ≤ (en/k)^k we conclude that |R| is at most (en/m_w)^{m_w}(4en/m_w)^{(1/4)m_w}|D|^{n−(5/4)m_w}|D|^{(5/4)r m_w} ≤ (e/γ)^{(5/4)m_w}|D|^{(5/4)r m_w + m_w/γ}.

Lemma 2.3.6. Define the predicate Good(R, u) to be true if for every input x in the rectangle R, the polynomial u on input x is less than K^{1−δ} (i.e. Poly_u(x) is true). Then for all embedded rectangles R of size d, Pr_u[Good(R, u)] ≤ p where p = |D|^{−δnd}.

Proof. Assume Good(R, u). Suppose that |R| = d and let B′ ∈ [K^{1−δ}]^d specify a vector of d accepting values. Let Good_{B′}(R, u) be the event that for all x ∈ R, Poly_u(x) = B′(x). Then Pr_u[Good(R, u)] = K^{(1−δ)d} · Pr_u[Good_{B′}(R, u)].

To bound Pr_u[Good_{B′}(R, u)], suppose that it is true that ∀x ∈ R, F_u(x) = Σ_{i<d} u_i x^i = B′(x). Note that this fixes the output of the degree d − 1 polynomial for d values of x. By interpolation, this uniquely determines the polynomial, u′. Thus, Pr_u[Good_{B′}(R, u)] = Pr_u[u = u′] = K^{−d} = |D|^{−nd}. Overall, Pr_u[Good(R, u)] ≤ K^{(1−δ)d}|D|^{−nd} = |D|^{n(1−δ)d}|D|^{−nd} = |D|^{−δnd}. This completes the proof of Lemma 2.3.6.
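The interpolation step is the crux: two distinct polynomials of degree at most d−1 can agree on at most d−1 points, so specifying d values pins u down completely. The following quick numeric check (a tiny field, purely illustrative parameters, not part of the argument) confirms this.

```python
# A small illustration over GF(5) with d = 3: distinct polynomials of degree <= 2
# agree on at most 2 points, so fixing 3 values determines the polynomial uniquely.
from itertools import product

p, d = 5, 3

def evaluate(u, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(u)) % p

polys = list(product(range(p), repeat=d))
worst = max(sum(evaluate(u, x) == evaluate(v, x) for x in range(p))
            for i, u in enumerate(polys) for v in polys[i + 1:])
print(worst)          # 2, i.e. at most d - 1 agreements between distinct polynomials
```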

Lemma 2.3.7. For a random u, for fixed parameters d, δ, the probability that Poly_u(x) does not accept a (1 ± o(1))K^{−δ} fraction of all the inputs is at most o(1). (Here both o(1) terms are K^{−(1−δ)/3}.)

Proof. Randomly choose the coefficients u ∈ [K]^d of the degree d−1 polynomial. For each instance x ∈ [K] (and value b ∈ [K]), let A_{⟨x,b⟩} denote the event that the output of this polynomial on input x is b. Let a_x denote the event that this value is less than K^{1−δ}, so that x is a yes instance. Let Y = Σ_{x∈[K]} a_x denote the number of yes instances for the chosen u. Note p = Pr_u[a_x] = K^{1−δ}/K = K^{−δ}, because just choosing the constant coefficient u_0 of the polynomial randomly makes the polynomial's output on x uniformly random in [K]. Hence, by linearity of expectation, Ȳ = E[Y] = K · Pr_u[a_x] = K^{1−δ}. We show that the A_{⟨x,b⟩} events for different x are d-wise independent as follows. Consider any subset {x_1, x_2, . . . , x_d} ⊂ [K] of the instances. Knowing the value of the polynomial at each of these instances, by interpolation, uniquely determines the coefficients u of the polynomial. Hence, if all you know about u is its values on d−1 of these instances, then the value on the remaining one is still uniformly random within [K]. Formally stated, Pr_u[A_{⟨x_d,b_d⟩} | A_{⟨x_1,b_1⟩}, . . . , A_{⟨x_{d−1},b_{d−1}⟩}] = Pr_u[A_{⟨x_d,b_d⟩}]. Not fully knowing the values at the first d−1 of the instances, but only that their values are small, gives you even less information. Hence, Pr_u[a_{x_d} | a_{x_1}, . . . , a_{x_{d−1}}] = Pr_u[a_{x_d}]. It follows that Pr_u[a_{x_1} ∧ . . . ∧ a_{x_d}] = Pr_u[a_{x_1}] · . . . · Pr_u[a_{x_d}]. Because the a_x events are d-wise independent, it follows that the dth order standard deviation of their sum Y is the same as it would be if they were completely independent events. We, however, only need to consider the variance. More formally, for each x, let a′_x be an independent event with probability K^{−δ} of success and Y′ = Σ_{x∈[K]} a′_x. The variance is Var[Y] = E_u[(Y − Ȳ)^2] = E_u[(Σ_x a_x − Ȳ)^2]. The non-linear part of this is E_u[(Σ_x a_x)^2] = Σ_{x,x′} E_u[a_x · a_{x′}], which we know from pair-wise independence is Σ_{x,x′} E_u[a_x] · E_u[a_{x′}] = Σ_{x,x′} E_u[a′_x] · E_u[a′_{x′}]. The same computation for the a′_x gives that σ^2 = Var[Y] = Var[Y′] = K · p(1−p) ≈ K · K^{−δ} = K^{1−δ} = Ȳ. By Chebyshev's inequality, ∀η > 0 we have Pr_u(|Y − Ȳ| ≥ ησ) < 1/η^2. Setting η = Ȳ^{1/6} gives Pr_u(Y ∉ (1 ± Ȳ^{−1/3})Ȳ) ≤ Ȳ^{−1/3}.

We are now ready to complete the proof of the theorem. Call a polynomial u "good" if Poly_u accepts at least a (1/2)K^{−δ} fraction of all inputs. By Lemma 2.3.7, we know that at least half of all u's are good. For each good u, the corresponding sensitive function f_u has density at least (1/(2|D|))K^{−δ}. Since f_u is sensitive and has sufficient density, Lemma 2.3.3 tells us that any small branching program for f_u implies that there exists an m_r-by-m_w embedded rectangle of size r^2 that is accepted (assuming that conditions (1), (2), and (3) are satisfied).

On the other hand, by a union bound, Lemmas 2.3.5 and 2.3.6 together tell us that at most a p|R| fraction of degree d−1 polynomials u have such m_r-by-m_w embedded rectangles of size r^2 that are accepted. Suppose we can choose a setting of the parameters so that p|R| < 1/4. It follows that for at least 1/4 of all good polynomials, the corresponding sensitive functions f_u do not have such m_r-by-m_w embedded rectangles of size r^2 that are accepted, since the hyperplane constraint h_{a(u)}(x) can only shrink an accepting rectangle. Then by Lemma 2.3.3 this implies that at least as many f_u cannot have small branching programs, and thus the theorem is proven.

It is left to show that we can set the parameters such that p|R| < 1/4, and properties (1), (2), and (3) of Lemma 2.3.3 are satisfied. We will set the parameters as follows: |D| = 3, m_w = 4m_r = γn, γ = .01, δ = γ/300, r = 3000, and d = r^2. To achieve p|R| < 1/4, we require |D|^{δ m_w r^2/γ − m_w/γ − (5/4) r m_w} > 4(e/γ)^{(5/4)m_w}. Using |D| = 3 and factoring out m_w, it is sufficient if we have 3^{δ r^2/γ − 1/γ − (5/4)r} > 4(e/γ)^{5/4}. With our choice of parameters, this is satisfied for r ≥ 3000.

For Lemma 2.3.3, we also require assumptions (2) and (3). First for (2): |D|^{m_r} ≤ |D|^{m_w}(1/2 − 2γ)^{m_w}. For |D| = 3 and m_w = 4m_r, this is satisfied. For (3) we require: r ≤ (1/(4|D|s)) (1/2 − γ)^{m_r} |D|^{m_r−δn} = (1/(4|D|s)) (1/2 − γ)^{m_r} |D|^{m_r(1−4δ/γ)}. For |D| = 3, γ = .01, δ = γ/300, we have (1/2 − γ)|D|^{(1−4δ/γ)} ≥ 1.44, and thus it suffices to show r ≤ (1/12)(1.44)^{m_r}/s. This holds for our choice r = 3000 when s ≤ 2^{cm_r} = 2^{cγn/4} for some c > 0 and sufficiently large n. Note that |D| > 2 helps us in ensuring that assumptions (2) and (3) hold.
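The three numerical requirements can be checked mechanically. The following short script (an external sanity check, not part of the argument) verifies them for the stated values |D| = 3, γ = .01, δ = γ/300, r = 3000.

```python
import math

gamma, r = 0.01, 3000
delta = gamma / 300

# p|R| < 1/4 reduces to 3^(delta*r^2/gamma - 1/gamma - (5/4)r) > 4*(e/gamma)^(5/4);
# compare logarithms to avoid astronomically large numbers.
lhs = (delta * r**2 / gamma - 1 / gamma - 1.25 * r) * math.log(3)
rhs = math.log(4) + 1.25 * math.log(math.e / gamma)
print("p|R| < 1/4   :", lhs > rhs)

# Condition (2): 3^{m_r} <= 3^{m_w}(1/2 - 2*gamma)^{m_w} with m_w = 4*m_r,
# i.e. 27 * (1/2 - 2*gamma)^4 >= 1 per unit of m_r.
print("condition (2):", 27 * (0.5 - 2 * gamma) ** 4 >= 1)

# Condition (3): (1/2 - gamma) * 3^(1 - 4*delta/gamma) >= 1.44.
print("condition (3):", (0.5 - gamma) * 3 ** (1 - 4 * delta / gamma) >= 1.44)
```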

2.4 Conclusion

We have proved an exponential lower bound on the size of non-deterministic semantic read once branching programs computing a polynomial time computable function f : D^n → {0, 1} when D = {0, 1, 2} has just three elements. Our contribution is that we bring down the size of the domain required to achieve this. Prior to our result, the best that was known was for D-ary branching programs with |D| ≥ 2^{13}. The polytime computable function f for which we show the lower bound is the decision problem of determining whether a certain degree d polynomial over a finite field K evaluates to a value less than a certain threshold at a given input (along with a hyperplane constraint). This result brings down the focus to the first non-boolean case, |D| = 3, versus the boolean case, |D| = 2, since, interestingly, the case where D is boolean {0, 1} still remains open and no non-trivial lower bounds are known for binary non-deterministic semantic read once branching programs [39]. In the next section we explore the Boolean case.


2.5 Semantic Branching Programs with |D| = 2 can evade Large Bottleneck Rectangles

In this section we show how binary non-deterministic semantic read once branching programs can behave differently by evading lower bounds in certain bottleneck layers, by having a small number of states in those layers irrespective of what function f : {0, 1}^n → {0, 1} they are computing. The example upper bounds we give demonstrate why it is likely harder to prove lower bounds for semantic read once branching programs with domain size |D| = 2.

When the domain size is |D| > 2, the technique is to prove that the set of yes instances handled by any one state of the branching program contains a rectangle, and then identify a computational problem that has no large rectangle of yes instances. Hence, the rectangle for each state must be small. Because there are exponentially many yes instances and each must be handled by at least one state at a selected bottleneck level of the branching program, there must be an exponential number of states at that level. We show here that for domain size |D| = 2, the set of yes instances handled by one state can be quite arbitrary and quite large. This does not mean that the total number of branching program states can necessarily be small. But it does mean that at the one level of the branching program that the prover is hoping to use for a bottleneck, the number of states might be quite small.

A lower bound that attempts to prove that a selected bottleneck level of the branching program must have many states must start by selecting which level of the branching program will be the level in question. It might do this by specifying how many or which variables have been read so far. Given any boolean computational problem with input from {0, 1}^n and a criterion for choosing the bottleneck level, chosen from a wide (but not exhaustive) range of possible choices, we now show how to fool such a lower bound by giving a branching program with a polynomial upper bound on the number of states at the selected layer.

The branching program is constructed as follows. For each yes instance A ∈ {0, 1}^n, we form an accepting branching program path ⟨C_1(A), q(A), C_2(A)⟩, where q(A) denotes the state A passes through at the bottleneck level, C_1(A) the path before this level, and C_2(A) the path after it. Note that to get a counterexample to a lower bound technique using some bottleneck layer, we do not need to give a full poly-size branching program. We only need the number of states q(A) to be small. The program can have an exponential number of states before and after this layer. Hence, we will have all of these paths C_1(A) for different yes instances A be completely disjoint from each other, and similarly for C_2(A). These paths only come together and interact at the special layer of states q(A). In order to make the properties of this level more arbitrary, let A_1, A_2 ⊂ [n] be any partition of the input variables into two parts. Let C_1(A) read all the ones in A_1 and all the zeros in A_2. Let C_2(A) read all the zeros in A_1 and all the ones in A_2. Let q(A) = ⟨u, v⟩ be the state, where u ∈ [n] is the number of ones of A in A_1 and v ∈ [n] is the number of zeros of A in A_2. Hence only n^2 states are needed in the layer. Note that because we have allowed the computational problem to be arbitrary, other than partitioning its yes instances based on their hamming weights, the sets of instances handled by a state q(A) are completely arbitrary.
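The only information the layer retains is this pair of counts, as the following sketch (a hypothetical list/partition encoding, illustrative only) makes explicit; since each count lies between 0 and n, the layer has O(n^2) states regardless of which yes instances are routed through it.

```python
# A minimal sketch of the bottleneck state q(A) for a yes instance A in {0,1}^n,
# given a fixed partition (A1, A2) of the index set.

def bottleneck_state(A, A1, A2):
    u = sum(A[i] for i in A1)          # ones of A inside A1, read before the layer
    v = sum(1 - A[i] for i in A2)      # zeros of A inside A2, read before the layer
    return (u, v)

A1, A2 = [0, 1, 2], [3, 4, 5]
print(bottleneck_state([1, 0, 1, 0, 1, 0], A1, A2))   # (2, 2)
```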

Note that as long as A_1, A_2 are comparable in size, for most of the inputs the incoming path C_1(A) and the outgoing path C_2(A) are of comparable length. However, the purported bottleneck layer for which we give the above upper bound is not identical to the one we use for our lower bound for |D| ≥ 3, in the sense that the bottleneck states like q(A) do not appear exactly midway through the accepting paths, at length n/2, on all the paths, as is required in Lemma 2.3.3. Nevertheless, the upper bound is interesting because for most inputs A the incoming and outgoing paths through a state in the layer are of comparable length.

We will now prove, in two ways, that this branching program solves the given computational problem. We will start with a communication game interpretation. Think about the algorithm as a game between two players C_1 and C_2, and Charlie, whom they do not trust. Charlie shows C_1 the ones in the first part A_1 of the input and the zeros in the second part A_2. Assuming he trusts Charlie, this lets C_1 know the entire input. Hence, he can answer any question about the input. The only way that Charlie can cheat is to not show all of the entries. In order to verify that he is not lying, C_1 sends to C_2 the number of ones in A_1 and the number of zeros in A_2. C_2 can then check that they have both been shown all of what they were supposed to see.

Now let us consider the branching program interpretation. Clearly, the branching program described accepts all yes instances of the given problem, because it has a separate accepting path ⟨C_1(A), q(A), C_2(A)⟩ for each yes instance A. What remains is to prove 1) that the branching program is semantic read once and 2) that no no-instances are accepted. We do this by showing, for every pair ⟨A, B⟩ of different yes instances that pass through the same bottleneck state q(A) = q(B) = ⟨u, v⟩ = q, that the cross path ⟨C_1(A), q, C_2(B)⟩ is inconsistent, in that it reads some variable twice with different values. Hence, by the definition of semantic, it does not matter that this path is not read once, and because it is inconsistent, it cannot accept a no-instance. Because A and B are different, either they differ on A_1 and/or they differ on A_2. Assume they differ on A_1. Because A and B have the same number u of ones in A_1, there is an element of A_1 that is one in A and zero in B. Hence C_1(A) and C_2(B) both read it. This element is read twice in ⟨C_1(A), q, C_2(B)⟩ with different values, and so the path is inconsistent.

So for |D| = 2, the presence of a small number of states in a supposed bottleneck layer of a branching program need not imply that there exists a balanced embedded rectangle of accepting instances. In particular, our lower bound finds a rectangle within the set of yes instances handled by narrowing the set down to a subset within which, for many variables, it is fixed whether the variable is read before or after the state. However, in this upper bound, whether a variable is read before or after the state q(A) is completely determined by whether its value is 0 or 1. Hence, fixing this fixes its value. If this is done for every variable, the set of inputs left in the eventual rectangle identified by this lower bound method is narrowed down to a singleton.

2.5.1 Candidate Problems for a Boolean function lower bound

Conjecture 29. Polytime computable functions with exponential lower bounds for weakly read once BPs are not yet known. We propose the following boolean functions as promising candidates for which such a bound might hold.

• st-Non-Connectivity: Does a given directed graph have no s to t path? We would also benefit if our guess is wrong. If by some means one can come up with a poly-sized read-once branching program for checking st-non-connectivity, it would give us a really surprising read once proof of the celebrated Immerman-Szelepcsenyi result that NL = coNL [35, 58].

• Does verifying whether a given input is a Latin square require exponential size weakly read once BPs? One way of presenting the input is as an n × n × n cube with {0, 1} entries, and the branching program has to verify that the given cube has exactly one 1 in each row, column and leg.


Chapter 3

Hardness of Function Composition for Semantic Read Once Branching Programs


3.1 Introduction

One of the most promising approaches to proving major separations in complexity theory is to understand the complexity of function composition. Given two Boolean functions, f : {0, 1}^m → {0, 1} and g : {0, 1}^n → {0, 1}, their composition is the function f ∘ g : {0, 1}^{mn} → {0, 1} defined by

(f ∘ g)(x_1, . . . , x_m) = f(g(x_1), . . . , g(x_m)).
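Concretely, each of the m blocks is an n-bit input to g and the m results feed into f, as in the following sketch (a generic illustration with hypothetical choices of f and g, not functions studied in this chapter).

```python
# A minimal sketch of (f o g) on block inputs.

def compose(f, g, blocks):
    """blocks: a list of m bit-tuples, each of length n."""
    return f(tuple(g(x) for x in blocks))

# Example: f = parity of m bits, g = AND of n bits.
f = lambda bits: sum(bits) % 2
g = lambda x: int(all(x))
print(compose(f, g, [(1, 1), (1, 0), (1, 1)]))   # g-values (1, 0, 1), parity 0
```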

The complexity of function composition is one of the most tantalizing and basic problems in complexity theory, and has been studied in a variety of models. There is essentially no general setting where function composition can be computed with substantially fewer resources than first computing each instance of g, followed by computing f on the outputs of the g's. Indeed, lower bounds for function composition are known to resolve several longstanding open problems in complexity theory.

The most famous conjecture about function composition in complexity theory is the Karchmer-Raz-Wigderson (KRW) conjecture [42], which asserts that the minimum depth of a circuit (of fan-in 2) that computes f ∘ g (over AND, OR and NOT gates) for non-constant functions f and g is the minimum depth of computing f plus the minimum depth of computing g. That is,

D(f ∘ g) ≈ D(f) + D(g).

Karchmer, Raz and Wigderson show that repeated applications of this conjecture imply super-logarithmic lower bounds on the depth complexity of an explicit function, thus resolving a major open problem in complexity theory (separating P from NC^1). In particular, the tree evaluation problem defines iterated function composition with parameters d and h as follows. The input is an ordered d-ary tree of depth h + 1. Each of the d^h leaf nodes of the tree is labelled with an input bit, and each non-leaf node of the tree is labelled by a 2^d-bit Boolean vector, which is the truth table of a Boolean function from {0, 1}^d → {0, 1}. This induces a 0/1 value for each intermediate node in the tree in the natural way: for a node v with corresponding function f_v, we label v with f_v applied to the bits that label the children of v. The output is the value of the root node. The basic idea is to apply h = O(log n / log log n) compositions of a random d = (log n)-ary function f : {0, 1}^{log n} → {0, 1} to obtain a new function over O(n^2) bits that is computable in polynomial time but that requires depth Ω(log^2 n) (ignoring lower order terms).

In communication complexity, lower bounds for function composition have been extremely successful in solving many open problems. Lifting theorems in communication complexity show how to lift lower bounds in query complexity to lower bounds in communication complexity via function composition. Raz and McKenzie's lower bound for function composition in the monotone setting [55] is (in hindsight) now viewed as a general lifting theorem for deterministic communication complexity; subsequent lifting theorems have been proven for many other models of computation [16, 32–34, 64]. These theorems have simplified and unified many results, and in addition have led to the resolution of important open problems in areas such as game theory, proof complexity, extension complexity, and communication complexity [15, 22, 31, 44, 47].

The complexity of function composition for space-bounded computation has also been studied since the 1960's. The classical result of Neciporuk [51] proves Ω(n^2/log^2 n) size lower bounds for deterministic branching programs for function composition.¹ Subsequently, Pudlak observed that Neciporuk's method can be extended to prove Ω(n^{3/2}/log n) size lower bounds for nondeterministic branching programs. These classical results are still the best unrestricted branching program size lower bounds known, and it is a longstanding open problem to break this barrier. Furthermore, it is known that Neciporuk's method cannot yield lower bounds better than those mentioned above [10, 11, 39].

¹While Neciporuk's result is not usually stated this way, it can be seen as a lower bound for function composition. We present this alternative proof in Section 3.8.1 in the last part of the chapter.

In this work, we study time/space tradeoffs for function composition. We prove asymptotically optimal lower bounds for function composition in the setting of nondeterministic read once branching programs, for the syntactic model as well as the stronger semantic model of read-once nondeterministic computation. We prove that such branching programs for solving the tree evaluation problem over an alphabet of size k require size roughly k^{Ω(h)}, i.e. space Ω(h log k). Our lower bound nearly matches the natural upper bound, which follows the best strategy for black-white pebbling [21] of the underlying tree. While previous super-polynomial lower bounds have been proven for read-once nondeterministic branching programs (for both the syntactic and the semantic models), we give the first lower bounds for iterated function composition, and in these models our lower bounds are near optimal.

3.1.1 History and Related Work

Function Composition and Direct Sum Conjectures

Raz and McKenzie proved the KRW conjecture in the context of monotone circuit depth [55]. In an attempt to prove the KRW conjecture in the non-monotone case, Karchmer, Raz and Wigderson proposed an intermediate conjecture, known as the universal relation composition conjecture. This intermediate conjecture was proven by Edmonds et al. [25] using novel information-theoretic techniques. More recently, some important steps have been taken towards replacing the universal relation by a function, using information complexity [30] and communication complexity techniques [24]. Dinur and Meir [24] give a "composition theorem" for f ∘ g where g is the parity function, and an alternative proof of the cubic formula size lower bound follows as a corollary of this result. The cubic formula size lower bound was originally proven by Hastad [57] and more recently by Tal [60].

Time-Space Tradeoffs

In the uniform setting, time-space tradeoffs for SAT were achieved in a series of papers [26–28, 48]. Fortnow-Lipton-Viglas-Van Melkebeek [28] show that any algorithm for SAT running in space n^{o(1)} requires time at least Ω(n^{φ−ε}), where φ is the golden ratio (√5 + 1)/2 and ε > 0. Subsequent works [23, 63] improved the time lower bound to greater than n^{1.759}.

The state of the art time/space tradeoffs for branching programs were proven in the remarkable papers by Ajtai [1] and Beame et al. [9]. In the first paper, Ajtai exhibited a polynomial-time computable Boolean function such that any sub-exponential size deterministic branching program requires superlinear length. This result was significantly improved and extended by Beame et al., who showed that any sub-exponential size randomized branching program requires length Ω(n log n / log log n).

Lower bounds for nondeterministic branching programs have been more difficult to obtain.

Length-restricted nondeterministic branching programs come in two flavors: syntactic and semantic. A length l syntactic model requires that every path in the branching program has length at most l, and similarly a read-c syntactic model requires that every path in the branching program reads every variable at most c times. In the less restricted semantic model, the read-c requirement is only for consistent accepting paths from the source to the 1-node; that is, accepting paths along which no two tests x_i = d_1 and x_i = d_2 with d_1 ≠ d_2 are made. Thus for a nondeterministic read-c semantic branching program, the overall length of the program can be unbounded.

Note that any syntactic read-once branching program is also a semantic read-once branching program, but the opposite direction does not hold. In fact, Jukna [37] proved that semantic read-once branching programs are exponentially more powerful than syntactic read-once branching programs, via the "Exact Perfect Matching" (EPM) problem. The input is a (Boolean) matrix A, and A is accepted if and only if every row and column of A has exactly one 1 and the rest of the entries are 0's, i.e., if it is a permutation matrix. Jukna gave a polynomial-size semantic read-once branching program for EPM, while it was known that syntactic read-once branching programs require exponential size [41, 45].

Lower bounds for syntactic read-c (nondeterministic) branching programs have been known for some time [13, 52]. However, for semantic nondeterministic branching programs, even for read-once, no lower bounds are known for polynomial time computable functions for the boolean, k = 2, case. Nevertheless, exponential lower bounds for semantic read-c (nondeterministic) k-way branching programs, where k ≥ 2^{3c+10}, were shown by Jukna [36]. More recently, [18] obtained exponential size lower bounds for semantic read-once nondeterministic branching programs for k = 3, leaving only the boolean case open. Liu [49] proved near optimal size lower bounds for deterministic read once branching programs for function composition.

The rest of the chapter is organized as follows. In Section 3.2 we give the formal definitions, present the natural upper bound and state our main result. In Section 3.3 we give the intuition and proof outline. Sections 3.4, 3.5 and 3.6 are devoted to individual parts of the proof.

3.2 Definitions and Statement of Results

Definition 30. Let f : [k]^n → {0, 1} be a boolean valued function whose input variables are x_1, . . . , x_n where x_i ∈ [k]. A k-way nondeterministic branching program for f is an acyclic directed graph G with a distinguished source node q_start and sink node (the accept node) q_accept. We refer to the nodes as states. Each non-sink state is labeled with some input variable x_i, and each edge directed out of a state is labelled with a value b ∈ [k] for x_i. For each input ~ξ ∈ [k]^n, the branching program accepts ~ξ if and only if there exists at least one path starting at q_start and leading to the accepting state q_accept, such that all labels along this path are consistent with ~ξ. The size of a branching program is the number of states in the graph. A nondeterministic branching program is semantic read-once if for every path from q_start to q_accept that is consistent with some input, each variable occurs at most once along the path.

Syntactic read-once branching programs are a more restricted model where no path can read a variable more than once; in the semantic read-once case, variables may be read more than once, but each accepting path may only query each variable once.

Definition 31. The (ternary) height h tree evaluation problem Tree_{~F} has an underlying 3-ary tree of height h with n = 3^{h−1} leaves. Each leaf is labelled by a corresponding variable in x_1, . . . , x_n. (Note that a tree with a single node has height 1.) Each internal node v is labeled with a function F : [k]^3 → [k], where ~F denotes the vector of these functions. The input ~ξ ∈ [k]^n gives a value in [k] to the leaf variables ~x. This induces a value for each internal node in the natural way, and the output Tree_{~F}(~ξ) is the labeling of the root. In the boolean version, the input ~ξ is accepted if and only if Tree_{~F}(~ξ) ∈ [k^{1−ε}], where ε ∈ (0, 1) is a parameter.
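Unwinding Definition 31, the induced value at the root can be computed by the obvious recursion; the sketch below (a hypothetical heap-indexed encoding of the tree, used only to fix the semantics) illustrates it.

```python
# A minimal sketch: children of node v are 3v+1, 3v+2, 3v+3; internal nodes carry a
# function [k]^3 -> [k] stored as a dict, leaves carry their input value.

def tree_eval(node, leaf_values, functions):
    if node not in functions:                          # a leaf
        return leaf_values[node]
    children = tuple(tree_eval(3 * node + c, leaf_values, functions) for c in (1, 2, 3))
    return functions[node][children]

# Height 2, k = 3: the root applies addition mod 3 to its three leaves.
root_fn = {(a, b, c): (a + b + c) % 3 for a in range(3) for b in range(3) for c in range(3)}
print(tree_eval(0, {1: 2, 2: 2, 3: 1}, {0: root_fn}))   # (2 + 2 + 1) mod 3 = 2
```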

The most natural way to solve the tree evaluation problem is to evaluate the vertices of the tree, via a strategy that mimics the optimal black-white pebbling of the underlying tree. In the next section, we review this upper bound, and show that it corresponds to a nondeterministic semantic read-once branching program of size Θ(k^{h+1}). Our main result gives a nearly matching lower bound (when k is sufficiently large compared to h).

Theorem 32. For any h, and k sufficiently large (k > 2^{42h}), there exist ε and ~F such that any k-ary nondeterministic semantic read-once branching program for Tree_{~F} requires size Ω(k/log k)^h.

We prove the lower bound for the decision version of the tree evaluation problem, with ε chosen to be 9h/log k. Secondly, we actually show (see 3.8.2 in the last section) that the lower bound holds for almost all ~F, whenever each F is independently chosen to be a random 4-invertible function:

Definition 33. A function F : [k]^3 → [k] is 4-invertible if whenever the output value and two of its inputs from a, b, c are known, then the third input can be determined up to a set of four values. That is, for each pair of values (a, b) ∈ [k]^2, the mapping F(a, b, ∗) : [k] → [k] is at most 4-to-1, and likewise for the pairs (b, c) and (a, c).
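Checking this condition for an explicitly given F is straightforward; the following sketch (a dict encoding of F, purely illustrative) simply verifies that each restriction obtained by fixing two coordinates is at most 4-to-1.

```python
# A minimal sketch of testing Definition 33 for F given as a dict from [k]^3 to [k].
from collections import Counter
from itertools import product

def is_4_invertible(F, k):
    for free in range(3):                              # position of the unknown input
        for pair in product(range(k), repeat=2):       # the two known inputs
            counts = Counter()
            for z in range(k):
                triple = list(pair)
                triple.insert(free, z)
                counts[F[tuple(triple)]] += 1
            if max(counts.values()) > 4:               # some output has > 4 preimages
                return False
    return True

k = 5
F = {t: sum(t) % k for t in product(range(k), repeat=3)}
print(is_4_invertible(F, k))                           # True: addition is 1-to-1 in each slot
```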

We expect that the lower bound should still hold even if every function in ~F is fixed to be a particular function with nice properties, although we are not able to prove this at present. In particular, we conjecture that the lower bound still holds when for every v, F_v(a, b, c) = a^3 + b^3 + c^3 over the field [k]. On the other hand, if we take an associative function such as F_v(a, b, c) = a^3 · b^3 · c^3, again over the field [k], then there is a very small branching program, since we can compute the root value by reading the elements one at a time and remembering the product so far. One thing that makes proving the lower bound difficult is not being able to properly isolate or take advantage of the differences between these functions over a finite field. For the rest of the chapter, we will refer to nondeterministic semantic read-once branching programs simply as branching programs.

3.2.1 Black/White Pebbling: A Natural Upper Bound

In order to get some intuition, we first review the matching upper bound. As mentioned earlier, the upper bound mimics the optimal black/white pebbling strategy for a tree [20]. A black pebble placement on a node v corresponds to remembering the value in [k] labelling that node, and a white pebble on v corresponds to nondeterministically guessing v's value (which must later be verified). The goal is to start with no pebbles on the tree, and end up with one black pebble on the root (and no other pebbles). The legal moves in a black/white pebbling game are:

1. A black pebble can be placed at any leaf.

2. If all children of node v are pebbled (black or white), place a black pebble at v and remove any black pebbles at the children. (When all children are pebbled, a black pebble on a child of v can be slid to v.)

3. Remove a black pebble at any time.

4. A white pebble can be placed at any node at any time.

5. A white pebble can be removed from v if v is a leaf or if all of v's children are pebbled. (When all children but one are pebbled, the white pebble on v can be slid to the unpebbled child.)

Lemma 34. Black pebbling the root of a d-ary tree of height h can be done with (d − 1)(h − 1) + 1 pebbles. With both black and white pebbles, only ⌈(1/2)(d − 1)h + 1⌉ pebbles are needed.


Figure 3.1: This figure describes a black/white pebbling for a d-ary tree T of height h with d = 5. We start by pebbling the height h−1 subtrees rooted at nodes 2, 3 and 4. Then we proceed to the second half of the children and guess the values that the subtrees at nodes 5 and 6 would evaluate to. Now we can pebble the root node 1 and remove the black pebbles. The white pebble or guess at node 5 can now be verified, and then the same is done subsequently for node 6.

Proof. We will assume that d is odd; the case of d even is similar. With only black pebbles, recursively pebble d − 1 of the d children of the root. Then use d − 1 pebbles to remember these values as you use (d − 1)(h − 2) + 1 more pebbles to pebble its d-th child, for a total of (d − 1) + (d − 1)(h − 2) + 1 = (d − 1)(h − 1) + 1 pebbles. Then pebble the root.

Now suppose white pebbles are also allowed (see Figure 3.1). Recursively pebble (1/2)(d − 1) + 1 of the d children of the root. Then use (1/2)(d − 1) pebbles to remember these values as you use ⌈(1/2)(d − 1)(h − 1) + 1⌉ more pebbles to pebble its next child, for a total of (1/2)(d − 1) + (1/2)(d − 1)(h − 1) + 1 = (1/2)(d − 1)h + 1 pebbles. Then use white pebbles to pebble the remaining (1/2)(d − 1) children of the root. Pebble the root and pick up the black pebbles from the children. Replacing the first of these whites requires (1/2)(d − 1)(h − 1) + 1 pebbles in addition to the (1/2)(d − 1) white ones, again for a total of (1/2)(d − 1)h + 1. Note as a base case, when h = 2 and there is a root with d children, d pebbles are needed, no matter what the color.
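As a quick sanity check of these two pebble counts (a small illustrative script of ours, not from the thesis), the following computes them and verifies that for the ternary tree (d = 3) the black/white count is h + 1, which matches the Θ(k^{h+1}) size of the pebbling-based branching program mentioned above, via the k^p-states-per-layer bound of Lemma 35 below.

```python
import math

def black_pebbles(d, h):
    # black-pebble cost of a d-ary tree of height h (first part of Lemma 34)
    return (d - 1) * (h - 1) + 1

def black_white_pebbles(d, h):
    # black/white-pebble cost of a d-ary tree of height h (second part of Lemma 34)
    return math.ceil((d - 1) * h / 2) + 1

for h in range(2, 7):
    assert black_pebbles(3, h) == 2 * h - 1
    assert black_white_pebbles(3, h) == h + 1   # each BP layer then has k**(h+1) states
```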

Lemma 35. A pebbling procedure with p black or white pebbles (and t time) translates to a layered nondeterministic branching program with tk^p states. If only black pebbles are used, the branching program is deterministic.

Proof. On input ~ξ the branching program moves through a sequence of states β1, β2, . . . , βt, where the state βt′ corresponds to the pebbling configuration at time t′. Each layer of the branching program will have k^p states, one for each possible assignment of values in [k] to each of the pebbles. If a black pebble is placed on a leaf during the pebbling procedure, then the branching program queries this leaf. If all of the children of node v are pebbled, then the branching program knows their values v1, v2 and v3 and hence can compute the value fv(v1, v2, v3) of the node. Remembering this new value corresponds to placing a black pebble at v. Removing a black pebble corresponds to the branching program forgetting this computed value. If a white pebble is placed at v, then the branching program nondeterministically guesses the required value for this node. This white pebble cannot be removed until this value has been verified to be fv(v1, v2, v3) using the values of its children, which were either computed (black pebble) or also guessed (white).
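To see the deterministic (black-pebbles-only) case in action, here is a small illustrative Python sketch (our own toy, not the thesis's construction; for brevity it assumes the same internal function at every node) that evaluates a ternary tree bottom-up while tracking how many node values are simultaneously remembered. The peak equals the black pebbling number (d − 1)(h − 1) + 1 = 2h − 1 for d = 3.

```python
def evaluate(h, leaves, f, state):
    """Evaluate a ternary tree of height h on the given leaf values,
    using the same internal function f at every node. `state` counts
    how many values are currently remembered (black pebbles)."""
    if h == 1:
        state["live"] += 1                       # pebble the leaf
        state["peak"] = max(state["peak"], state["live"])
        return leaves[0]
    third = len(leaves) // 3
    vals = [evaluate(h - 1, leaves[i * third:(i + 1) * third], f, state)
            for i in range(3)]
    v = f(*vals)                                 # pebble the parent ...
    state["live"] -= 2                           # ... drop the 3 children, keep 1 value
    state["peak"] = max(state["peak"], state["live"])
    return v

k, h = 7, 4
leaves = [i % k for i in range(3 ** (h - 1))]
state = {"live": 0, "peak": 0}
evaluate(h, leaves, lambda a, b, c: (a + b + c) % k, state)
assert state["peak"] == 2 * (h - 1) + 1          # = (d-1)(h-1)+1 for d = 3
```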

Observe that when we transform the black/white pebbling algorithm in Lemma 34 using the translation procedure presented in Lemma 35 we obtain a syntactic read-once branching program, since along any computation path the


branching program makes a query corresponding to a node either when it places a black pebble on the node or when it removes a white pebble upon verifying the guess; only one of these happens, and it happens exactly once at any node during the course of the pebbling scheme.

3.3 Proof Overview

A warm-up overview of our argument is as follows. We prove that for every branching program of small size that solves the Tree~F problem, there is a "label" L that one can extract which helps us describe the internal functions of the underlying tree ~F. Given all the internal functions except the one at some special node in the tree, the knowledge of this label helps us encode the function at that special node in an unduly efficient way, thus providing a shorter encoding than what is possible for ~F. The label L itself is obtained using a small branching program solving the Tree~F problem as follows.

Since each accepted input has a read-once computation path associated with it, there is some permutation of the leaves of the tree according to which they appear along this path. What we first find is a special leaf query along this accepting computation path such that the path P down the ternary tree² determined by this leaf is such that the subtrees appearing along P are guaranteed to have some special property. Without loss of generality, it helps to think of the subtrees appearing along P as appearing to the left and right while continuing downward toward the leaf along the ternary tree. The subtrees hanging off the left of the path are considered "red" and those hanging off the right are considered "white", since we find the special leaf determining this path down the tree such that it guarantees the following: a reasonable number of the leaves in the red subtrees are read before the state q querying our special leaf, and similarly a reasonable number of the leaves in the white subtrees are read after the state q.

Intuitively, if the number of such special states q nominated by accepting inputs is small, we can construct our label L that identifies a special internal node in the tree. The reason we can obtain such a label allowing for compression in an encoding describing some internal function Fv∗ is that, because the branching program is read-once, the only way to transmit information about the values of the red variables is via which state q we are passing through. Similarly, the only way to nondeterministically guess information about the values of the white variables is also via this same state q. Given there are only s ≪ k^h states at the interface, focusing on one particularly popular special state q amongst accepting inputs allows us to show that there is at least one node v∗ along the tree path P that has a red variable x and a white variable y that can each take on about r values.

We would like the internal functions ~F of the ternary tree to be invertible so that these r distinct values produce as many distinct values as they travel up the tree. Since Tree~F is a decision problem, the input ~ξ is accepted if and only if the value of the root is in the restricted set [k^{1−ε}]. Again, if the internal functions are invertible, the size of this set would be retained as it travels backwards down the tree. It follows that the function Fv∗ where these values meet has an r-by-r square of inputs whose output ranges have size only k^{1−ε}. Hence, when compressing ~F, this part of Fv∗ can be communicated with only r^2·log(k^{1−ε}) bits instead of the usual r^2·log k bits. Roughly speaking, this square is what constitutes the label L described above. What is key about a square is that though its area consists of r^2 values, the length of its two sides is only r ≪ r^2. Hence, L can be described by only O(r log k) bits. This is what gives us compression in communicating Fv∗.
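The bit counting behind this square trick can be made concrete with a tiny numeric illustration (our own back-of-the-envelope script, using the parameter choices ε = 9h/log k and r = 2^{6h}/ε fixed later, and assuming the invertible-function setting of this warm-up): the square reveals about εr² log k bits about Fv∗ while costing only about 2r log k bits to write down.

```python
h = 2
log_k = 200.0                 # k = 2**200 > 2**(42*h), as required later
eps = 9 * h / log_k           # epsilon = 9h / log k
r = 2 ** (6 * h) / eps        # r = 2^{6h} / epsilon

bits_revealed = r * r * eps * log_k      # r^2 entries, each narrowed from k to k^{1-eps} values
bits_for_square = 2 * r * log_k          # two sides of length r suffice to name the r^2 inputs

print(f"revealed ~ {bits_revealed:.3g} bits, square costs ~ {bits_for_square:.3g} bits")
assert bits_revealed > bits_for_square
```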

²We caution the discerning reader that the two immediately preceding uses of the word 'path' refer to different things.


Now that we know the high-level structure of our argument, we delve deeper to give a slightly more detailed overview, including the description of what actually constitutes a label. As we just saw, the crux of the argument is a compression argument, showing that from a small branching program for Tree~F, we can encode ~F with fewer bits than what is required information-theoretically. We first review the simpler argument where the functions in ~F are all invertible. (For every F, the values of two of the inputs and the output value completely determine the value of the third input.) The main argument (proven in Section 3.5) shows that given such a branching program, we can find a label

L = <P, vi∗, ~w, xred,i∗, xwhite,i∗, Xred, Xwhite, Sred, Swhite> such that the following properties are satisfied³ (an illustrative sketch of L as a data structure follows the list):

(1) P = vh, vh−1, . . . , v1 is a path in the ternary tree (defining the tree evaluation problem), where vh is the root of the ternary tree and v1 is a leaf vertex. vi∗ is a special vertex along this path, where xred,i∗ ∈ Xred and xwhite,i∗ ∈ Xwhite are leaf variables in the subtree rooted at vi∗.

(2) Xred, Xwhite ⊆ {x1, . . . , xn} are (small) disjoint sets of leaf variables.

(3) ~w is an assignment to all of the leaf variables other than those in Xred ∪ Xwhite.

(4) Sred ⊂ [k]^{|Xred|} consists of a set of r partial assignments to the variables of Xred such that the projection of these assignments onto the special variable xred,i∗ gives r distinct assignments, and similarly Swhite ⊂ [k]^{|Xwhite|} consists of a set of r partial assignments to the variables of Xwhite such that the projection of these assignments onto the special variable xwhite,i∗ gives r distinct assignments.

(5) The embedded rectangle of r^2 total truth assignments R = {(~w, ρ, σ) | ρ ∈ Sred, σ ∈ Swhite} consists entirely of accepting truth assignments.
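Purely as a reading aid, the label can be pictured as the following record (our own schematic with hypothetical field names; the proof's actual label in Lemma 3.6.1 uses slightly different components).

```python
from typing import NamedTuple, List, Tuple, Dict, Set

class Label(NamedTuple):
    """Schematic of the label L from the overview; field names are illustrative."""
    path: List[int]                  # P = v_h, ..., v_1, a root-to-leaf path
    special_vertex: int              # the index i* of the special vertex on the path
    w: Dict[int, int]                # fixed assignment to leaves outside X_red ∪ X_white
    x_red_star: int                  # index of the special red leaf variable
    x_white_star: int                # index of the special white leaf variable
    X_red: Set[int]                  # (small) set of red leaf-variable indices
    X_white: Set[int]                # (small) set of white leaf-variable indices
    S_red: List[Tuple[int, ...]]     # r partial assignments to X_red
    S_white: List[Tuple[int, ...]]   # r partial assignments to X_white
```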

Given such a label, we prove two things. First, we argue that we can encode the description of a function at some particular internal vertex in the tree very efficiently. Let Fi∗ be the function associated with the special vertex vi∗ along the path P. We show that knowing the label, plus knowing the value of all functions in ~F other than Fi∗, reveals a lot (roughly O(r^2 log k) bits) of information about Fi∗, the missing function. The idea is roughly as follows. The r^2 total assignments in the rectangle R described in (5) give rise to r^2 associated inputs (ai, bj, ci,j) for Fi∗, where i, j ∈ [r]. These inputs are obtained by starting with an assignment (~w, ρ, σ) ∈ R to the leaf variables x1, . . . , xn and, using the knowledge of ~F, evaluating the assignment bottom-up until we obtain an assignment to the three children of vi∗. On the other hand, each of the r^2 assignments in R also gives constraints on the allowable outputs of Fi∗ on the inputs (ai, bj, ci,j). The constraints are obtained because we know that each total assignment in R is an accepting input, and thus the value of the root vertex vh on any of these inputs must lie in [k^{1−ε}]. Since the functions in ~F are all invertible, this propagates down the path P, so that for each of the r^2 inputs (ai, bj, ci,j) to Fi∗, we have a small set C(i, j) ⊆ [k] of k^{1−ε} values such that Fi∗(ai, bj, ci,j) must lie in C(i, j). Thus, overall, the label plus the information about all functions in ~F other than Fi∗ reveals (log k − log k^{1−ε})·r^2 = O(r^2 log k) bits of information about Fi∗.

Secondly, we show that the description length of the above label is less than what it should be – namely, it is much less than the number of bits that it reveals about Fi∗, giving a contradiction. The savings is due to the fact that R is an embedded rectangle. Fixing ~w, R is a product of two sets, each of size r. Therefore, knowing ~w, these inputs can be described with an additional 2r·log k bits rather than r^2·log k bits. (And the other information in the label does not overwhelm the savings that we achieved.)

³We have chosen to use modified descriptions in this overview. We caution the reader that they do not exactly match their names in the proof.


Now we fill in a few more details about how we obtain a label L from a small branching program, starting from where we left off in the warm-up overview. Recall, we obtained a popular state q through which a large set of accepting inputs pass, and their computation paths have the property that a reasonable number of the leaves in the red subtrees are read before state q, and a reasonable number of the leaves in the white subtrees are read after state q. From this property, we can obtain an embedded rectangle, which is a large set of accepting inputs where the assignment to all variables outside of Xred and Xwhite is fixed to ~w, and the assignments over the rest of the variables form a large product set Sred × Swhite over Xred times Xwhite.

We then clean up the embedded rectangle so that it is square, and so that it is associated with one specific function Fi∗ that labels a particular vertex vi∗ on the path P. More specifically, we show that there is one vertex vi∗ along the tree path P that has a red variable xred,i∗ ∈ Xred and a white variable xwhite,i∗ ∈ Xwhite such that as we run over the r assignments in Sred, xred,i∗ takes on r distinct values, and similarly for xwhite,i∗. (See Figure 3.2.)

Some complications arise when trying to carry out the above proof outline, making the actual proof more intricate. First, the compression argument requires that each ~F has a lot of accepting instances, so we need to show that most random ~F have this property. The more serious complication is the fact that we cannot easily count over random invertible functions, so instead we use functions that are almost invertible. More specifically, ~F is a vector of 4-invertible functions, which means that for each F ∈ ~F, knowing two of the inputs to F and the output value, there are at most four consistent values for the third input. We use a novel argument that allows us to count over 4-invertible functions (Section 3.6). Our compression argument sketched above is then adapted to handle the case of 4-invertible functions with a small quantitative loss. Namely, when going down the path P to determine the constraints on the output of Fi∗ on an input (ai, bj, ci,j) ∈ R, the number of allowable values for Fi∗(ai, bj, ci,j) will be k^{1−ε} at the root vertex, and by 4-invertibility, we will gain a factor of four for each subsequent function along the path. Since the path height is very small relative to r, this will still give us adequate compression.

3.4 Most ~F have a lot of accepting instances

Let Syes = {~ξ | Tree~F(~ξ) ∈ [k^{1−ε}]}. That is, Syes is the set of accepting inputs to Tree~F. Let Bad(~F) be the event that the size of Syes is significantly smaller than expected – in particular, |Syes| ≤ (1/(6k^ε))·k^n. Let F be the uniform distribution over 4-invertible functions, and let ~F be the uniform distribution over vectors of 4-invertible functions (one for each non-leaf vertex in the tree). Lemma 3.4.1 proves that Pr~F[Bad(~F)] is exponentially small, where ~F is sampled from ~F.

Lemma 3.4.1. For k > 2^{42h} and ε = 9h/log k, Pr~F[Bad(~F)] ≤ 1/10.

See Section 3.8 in the last part for the proof. The above probability is in fact much smaller, but the above bound suffices for our purpose.

3.5 Finding an Embedded Rectangle

This section proves that, whenever Bad(~F) does not occur, the accepting instances of a Tree~F solvable by a small branching program contain a large embedded rectangle.


Parameters. The number of variables is n = 3^{h−1} and each variable is from [k]. In what follows we will fix r = 2^{6h}/ε and ε = 9h/log k. The lower bound will hold for s ≤ (k/(n^{26} log k))^h. For k sufficiently large (k > 2^{42h}), the lower bound is Ω(k/log k)^h.

Definition 3.5.1. For π ⊂ {1, . . . , n}, let xπ denote the set of variables {xi | i ∈ π}. An embedded rectangle [8, 36] is defined by a 5-tuple (πred, πwhite, A, B, ~w), where:

(i) πred, πwhite are disjoint subsets of {1, . . . , n};

(ii) A ⊆ [k]^{|πred|} is a set of assignments to xπred and B ⊆ [k]^{|πwhite|} is a set of assignments to xπwhite;

(iii) ~w ∈ [k]^{n−|πred|−|πwhite|} is a fixed assignment to the remaining variables.

The assignments defined by the rectangle are all assignments (~α, ~β, ~w) where xπred = ~α, xπwhite = ~β, and the rest of the variables are assigned ~w, where ~α ∈ A and ~β ∈ B.

3.5.1 Finding a rectangle over the leaves

In this section, we prove the following lemma, which shows the existence of a large embedded rectangle of accepting instances if the branching program solving Tree~F is small.

Lemma 3.5.2. Let B be a size s nondeterministic, semantic read-once BP over x1, . . . , xn solving Tree~F for some ~F such that ¬Bad(~F) holds. Let s, r be chosen as above. Then there exists an embedded rectangle (πred, πwhite, A, B, ~w) such that:

1. |πred| = |πwhite| = h,

2. |A| × |B| ≥ k^{2h−ε}/(s·2^{3h^2}),

3. B accepts all inputs in the embedded rectangle.

In order to prove the above Lemma, we will need the following definitions.

Definition 3.5.3. Let ~ξ be an accepting input, and let Comp~ξ be an accepting computation path for ~ξ. Since every variable is read exactly once, Comp~ξ defines a permutation Π of {1, . . . , n}. If q is a state that Comp~ξ passes through at time t ∈ [n], the pair (Π, q) partitions the variables x1, . . . , xn into two sets, Red(Π, q) = {xi | Π(i) ≤ t} and White(Π, q) = {xj | Π(j) > t}. Intuitively, since the branching program reads the variables in the order given by Π (on input ~ξ), Red(Π, q) are the variables that are read at or before reaching state q, and White(Π, q) are the variables that are read after reaching state q.

Definition 3.5.4. A labelled path P down the ternary tree is a sequence of vertices vh, . . . , v1 that forms a path from the root to a leaf of the ternary input tree. For each vertex vj of height j along the path, its three subtrees are labelled as follows: one of its subtrees is labelled red and is referred to as Redtree(vj), another is labelled white and is referred to as Whitetree(vj), and lastly, Thirdtree(vj) refers to the subtree with root vj−1 that continues along the path P. The root of Redtree(vj) will be called redchild(vj), the root of Whitetree(vj) will be called whitechild(vj), and the root of Thirdtree(vj) will be called thirdchild(vj).


Lemma 3.5.5. Let ~ξ be an accepting input with computation path Comp~ξ, where the ordering of variables read along Comp~ξ is given by permutation Π of {1, . . . , n}. Then there exists a state q and a labelled path P = vh, . . . , v1 in the ternary tree such that for all vj in the path, 2 ≤ j ≤ h, Redtree(vj) contains greater than 2^{j−2} variables in Red(Π, q) and Whitetree(vj) contains greater than 2^{j−2} variables in White(Π, q).

Proof. We will prove the above lemma by (downwards) induction on the path length. At step j, 2 ≤ j ≤ h, we will have constructed a labelled partial path vh, vh−1, . . . , vj, an interval [t0(j), t1(j)], and a partial coloring of the variables such that the following properties hold:

1. All variables xi such that Π(xi) ≤ t0(j) will be Red and all variables xi such that Π(xi) ≥ t1(j) will be White. (The remaining variables, which are read between time steps t0(j) and t1(j), are still uncolored.)

2. For each vj′, j ≤ j′ ≤ h, Redtree(vj′) contains greater than 2^{j′−2} red variables, and Whitetree(vj′) contains greater than 2^{j′−2} white variables.

3. The subtree of vj that continues the path, Thirdtree(vj), has at most 2^{j−2} red variables and at most 2^{j−2} white variables.

Initially j = h, the path is empty, t0(h) = 1 and t1(h) = n. Thus the size of the interval is n = 3^{h−1}, and since no variables have been assigned to be red or white, the above properties trivially hold. For the inductive step, assume that we have constructed the partial path vh, . . . , vj+1. By the inductive hypothesis, the tree rooted at vj+1 contains at most 2^{j−1} red variables and at most 2^{j−1} white variables. Thus at most one subtree of vj+1 can contain greater than 2^{j−2} red variables. If one subtree of vj+1 does contain greater than 2^{j−2} red variables, then let this be Redtree(vj+1). Otherwise, increase t0(j + 1) until one of vj+1's three subtrees contains (for the first time) more than 2^{j−2} red variables and let this subtree be Redtree(vj+1). Since each of vj+1's three subtrees has 3^{j−1} leaves and at most 2^{j−1} white variables, there are at least 3^{j−1} − 2^{j−1} ≥ 2^{j−2} variables remaining in each subtree that are either uncolored or colored red, and thus the process is well-defined.

Next we work with the remaining two subtrees of vj+1 in order to define Whitetree(vj+1). Again by the inductive hypothesis, the tree rooted at vj+1 contains at most 2^{j−1} white variables, and thus at most one subtree of the remaining two can contain greater than 2^{j−2} white variables. If one is found, then designate it as Whitetree(vj+1), and otherwise, decrease t1(j + 1) until one of vj+1's remaining two subtrees contains (for the first time) more than 2^{j−2} white variables and designate it as Whitetree(vj+1). Again, since each subtree has 3^{j−1} leaves and at most 2^{j−1} red variables, there are at least 3^{j−1} − 2^{j−1} ≥ 2^{j−2} variables remaining in each of the two subtrees that are uncolored or colored white, and thus the process is well-defined.

Let the remaining subtree of vj+1 be Thirdtree(vj+1) and let the next vertex vj in our path be thirdchild(vj+1). By construction, Thirdtree(vj+1) contains at most 2^{j−2} red variables and at most this same number of white variables.

For the base case j = 2, by induction we will have reached a vertex v2 with 3 child vertices, where at most one is colored red and at most one is colored white, and thus the size of the interval [t0(2), t1(2)] is between one and three. Increase t0 and then decrease t1 so that v2 has exactly one red vertex and two white vertices, and let q be the state that Comp~ξ passes through as it reads the last identified white child.

Proof. (of Lemma 3.5.2) Consider a nondeterministic semantic read-once branching program B for Tree~F. For each accepting input ~ξ, fix one accepting path Comp~ξ in the branching program. Each of the n variables must be read


in this path exactly once, and thus it defines a permutation Π~ξ of the n variables. Apply Lemma 3.5.5 for ~ξ (and corresponding permutation Π~ξ) to obtain an associated labelled path P~ξ and state q~ξ. Do this for all accepting inputs, and pick the pair P, q that occurs the most frequently. There are at most s possible values for q and at most 6^{h−1} possible labelled paths: n = 3^{h−1} ending leaves of the path, and then for each of the h vertices vh′ along this path we specify which of its subtrees are Red and White, for another 2^{h−1} choices. Let S be those accepting inputs that give rise to the popular pair P, q. Since there are at least |Syes| > (1/(6k^ε))·k^n accepting inputs, S is of size at least (1/(6^h·s·k^ε))·k^n.

Next we will select one common red variable in each of the h Red subtrees, and one common white variable in each of the h White subtrees. Denoting the vertices of P by vh, vh−1, . . . , v1, we will select the Red and White variables iteratively for j = h, h − 1, . . . , 1 as follows. Starting at Redtree(vj): for each ~ξ ∈ S, by Lemma 3.5.5 at least 2^{j−2} of its 3^{j−1} variables are red, and thus there is one variable that is red in at least a 2^{j−2}/3^{j−1} fraction of S. Choose this variable, and update S to contain only those inputs in S where this variable is red. (That is, ~ξ ∈ S will stay in S if and only if the variable is read by Comp~ξ before reaching state q.) Do the same thing for Whitetree(vj). At the end, we will have selected for each j one variable that is red in Redtree(vj), and one variable that is white in Whitetree(vj), and a set of inputs S such that all h of the selected red variables (one per subtree) are read before reaching q and all h of the selected white variables are read after reaching q. Let πred be the vector of h indices corresponding to these h red variables, where πred,j is the index of the common red variable in Redtree(vj), and let πwhite be the vector of h indices corresponding to these h white variables, where πwhite,j is the index of the common white variable in Whitetree(vj). The size of S after this process will be reduced by a factor of

Π_{j∈[2,...,h]} (2^{j−2}/3^{j−1})^2 ≥ 2^{−2h} · 1.5^{−h^2}.

Our final pruning of S is to fix a partial assignment, ~w, to the remaining n − 2h variables that have not been identified as red or white. There are k^{n−2h} choices here. Once again choose the most popular one. Overall, for h ≥ 2 this gives

|S| ≥ [1/(k^ε · 6^h · 2^{2h} · 1.5^{h^2} · s · k^{n−2h})] · k^n ≥ k^{2h−ε}/(s · 1.5^{h^2+8h}) ≥ k^{2h−ε}/(s · 2^{3h^2}).

Let Sred ⊆ [k]^{πred} be the projection of S onto the coordinates of πred, the red variables, and let Swhite ⊆ [k]^{πwhite} be the projection of S onto the coordinates of πwhite, the white variables. Let all the other variables be set according to the vector ~w. It is clear that this gives an embedded rectangle, (πred, πwhite, Sred, Swhite, ~w). We want to show that all assignments in the rectangle are accepted by B. To see this, consider an assignment ~α~β~w in the embedded rectangle, where ~α ∈ Sred is an assignment to xπred, ~β ∈ Swhite is an assignment to xπwhite, and ~w is an assignment to the remaining variables. By definition ~α is in the projection of S onto πred, and thus there must be an assignment ~α~β′~w ∈ S. Similarly, there must be an assignment ~α′~β~w ∈ S. Since these assignments are in S, the computation paths on each of them go through q, and all variables xπred are read before reaching q, and all variables xπwhite are read after q. We want to show that ~α~β~w is also an accepting input (in S). To see this, we follow the first half of the computation path of ~α~β′~w until we reach q, and then we follow the second half of the computation path of ~α′~β~w after q. In this new spliced computation path, the variables xπred are all read (and have value ~α) prior to reaching q, and the variables xπwhite are all read after reaching q (and have value ~β), and since all other variables have the same values on all paths, the new spliced computation path must be consistent and must be accepting. Therefore the input ~α~β~w is in S and is an accepting input.


3.5.2 Refining the Rectangle

In this section, we refine the embedded rectangle given above, so that it will be a square r-by-r rectangle.

Definition 3.5.6. Let B be a branching program for Tree~F for some ~F such that ¬Bad(~F) holds, and let (πred, πwhite, Sred, Swhite, ~w) be the embedded rectangle guaranteed by Lemma 3.5.2. We recall the notation/concepts from the proof of Lemma 3.5.2:

1. Let P = vh, . . . , v1 be the common labelled path in the ternary tree, where Redtree(vi), Whitetree(vi) denote the Red and White subtrees of vi.

2. Let q be the common state in the branching program;

3. Let πred, πwhite be the indices of the red/white variables (h red variables altogether, one per Red subtree, and h white variables altogether, one per White subtree);

4. For all (accepting) inputs in the rectangle, all of the variables xπred are read before q, and all variables xπwhite are read after q.

We will now define a special kind of embedded rectangle that isolates a particular vertex v along the path P (which corresponds to a particular function Fv).

Definition 3.5.7. Let P = vh, . . . , v1 be the labelled path in the ternary tree, and let r = 2^{6h}/ε. Let vi∗ be a special vertex in the path P, where πred,i∗ is the index of the red variable in Redtree(vi∗), and πwhite,i∗ is the index of the white variable in Whitetree(vi∗). An embedded rectangle (πred, πwhite, A, B, ~w) is special for vi∗ if:

1. |A| = |B| = r;

2. The projection of A onto xπred,i∗ has size r, and the projection of B onto xπwhite,i∗ has size r.

Lemma 3.5.8. Let B be a size s branching program for Tree~F for some ~F such that ¬Bad(~F) holds. Then (for our choice of parameters) there is an i∗ ∈ [h] and an embedded rectangle that is special for vi∗.

Proof. Let B be a size s branching program for Tree~F and let (πred, πwhite, Sred, Swhite, ~w) be the embedded rectangle guaranteed by Lemma 3.5.2. For each j ∈ [h], call vj red-good if |Proj(Sred, πred,j)| ≥ r. That is, vj is red-good if Sred projected to the red variable in Redtree(vj) has size at least r. Similarly, vj is white-good if |Proj(Swhite, πwhite,j)| ≥ r.

If there are lred vertices that are red-good, then it is not hard to see that |Sred| ≤ (r − 1)^{h−lred}·k^{lred}. To see this, every vj that is not red-good can contribute at most r − 1 values, and the red-good ones can contribute at most k values. If we similarly define lwhite to be the number of vertices that are white-good, then we have |Swhite| ≤ (r − 1)^{h−lwhite}·k^{lwhite}.

We want to show that there must exist an i∗ such that vi∗ is both red-good and white-good. If not, then lred + lwhite ≤ h, and therefore |Sred × Swhite| ≤ (r − 1)^h·k^h < r^h·k^h. But on the other hand, Lemma 3.5.2 dictates that |Sred × Swhite| ≥ k^{2h−ε}/(s·2^{3h^2}). This is a contradiction since by our choice of parameters (r = 2^{6h}/ε, ε = 9h/log k, s ≤ (k/(n^{26} log k))^h, n = 3^{h−1}) we have:


k^{2h−ε}/(s·2^{3h^2}) ≥ (k^{2h−ε}/2^{3h^2}) · (3^{26(h−1)}·log k / k)^h ≥ k^{h−ε} · 2^{10h^2} · (log k)^h

= k^h · 2^{10h^2} · (log k / 2^9)^h      (since ε = 9h/log k)

= (k^h · 2^{10h^2} / 2^{9h}) · (2^{h log 9h} / ε^h)

≥ k^h · 2^{6h^2} / ε^h = r^h · k^h      (since 4h + log(9h) − 9 > 0 for all h ≥ 2)

Let i∗ ∈ [h] denote the index such that vertex vi∗ along the path P is both red-good and white-good. Thus Redtree(vi∗) contains the red variable indexed by πred,i∗, and the projection of Sred to xπred,i∗ has size at least r. Prune Sred to contain r assignments to xπred, where we have exactly one assignment for each of the r distinct values for xπred,i∗. Similarly, Whitetree(vi∗) contains the white variable indexed by πwhite,i∗, and the projection of Swhite to xπwhite,i∗ has size at least r. Prune Swhite to contain r assignments to xπwhite, where we have exactly one assignment for each of the r distinct values for xπwhite,i∗. Because the pruned sets Sred and Swhite will be important for our encoding, the following definition describes these sets more explicitly.

Definition 3.5.9. The (pruned) assignments in Sred consist of r partial assignments to xπred. Each such assignment gives a distinct value for xπred,i∗, with the values for the rest of the variables in xπred being completely determined by these. Let ~αi, i ∈ [r], denote the partial assignments in Sred. That is, for each i ∈ [r], ~αi = (α^1_i, . . . , α^h_i) is a vector of h values, one given to redchild(vj) for each j ∈ [h]. Viewing the vectors ~αi, i ∈ [r], as an r-by-h matrix, the entries in column i∗ run over the r distinct values given to xπred,i∗. Similarly, Swhite consists of r partial assignments to xπwhite. Let ~βi, i ∈ [r], denote the partial assignments in Swhite. That is, for each i ∈ [r], ~βi = (β^1_i, . . . , β^h_i) is a vector of h values, one given to whitechild(vj) for each j ∈ [h]. Viewed as an r-by-h matrix, the entries in column i∗ run over the r distinct values given to xπwhite,i∗.

It is clear from our construction that (πred, πwhite, Sred, Swhite, ~w) is an embedded rectangle that is accepted by B and that is special for vi∗.

3.6 The Encoding

In this section, ~F is a vector of functions, one for each non-leaf vertex of the ternary tree, where each F in ~F is a 4-invertible function from [k]^3 to [k]. Let F denote the uniform distribution on 4-invertible functions. Let H(F) refer to the entropy of F. Assume that for each ~F in which every constituent function is 4-invertible, we have a size s branching program B~F for Tree~F.

Our goal is to communicate a random ~F using fewer bits than is information-theoretically possible (under the assumption of a small branching program for Tree~F). If Bad(~F) is true, then we simply communicate ~F using the full H(F) bits that describe a uniformly random 4-invertible function at all the internal nodes of the tree. This requires H(~F) = (number of internal nodes) × H(F) bits. If Bad(~F) is false, using Lemma 3.5.8, from B~F we will define a vector of information, L~F, which we call a label, that will allow us to encode ~F with fewer bits than is possible on average, giving a contradiction. The following lemma describes how one can come up with L~F.


Lemma 3.6.1. Let ~F be such that Bad(~F) is false, and assume that Tree~F has a small branching program B~F. Then there exists a vector L~F that can be specified with at most 4hr·log k = O(hr log k) bits such that, given ~F−∗ (the knowledge of all functions in ~F except for F∗ at one special node), L~F can be used to infer r′^2 inputs (ai, bj, ci,j) ∈ [k]^3, i, j ∈ [r′], in the domain of the function F∗, where r′ = r/4^{i∗} and i∗ is the height of the node of F∗; and corresponding to these inputs one can infer r′^2 sets of outputs C(i, j) ⊂ [k], i, j ∈ [r′], each specifying a small set of values such that F∗(ai, bj, ci,j) ∈ C(i, j). Moreover,

Pr_{F∼F}[∀i, j ∈ [r′], F(ai, bj, ci,j) ∈ C(i, j)] ≤ k^{−(7/(9·2^{4h}))εr^2}.

Proof. By Lemmas 3.5.2 and 3.5.8, there is a path P, a vertex vi∗ ∈ P and an embedded rectangle (πred, πwhite, Sred, Swhite, ~w) that is special for vi∗.

The vector L~F will consist of:

(0) a description of ~w;

(1) a description of the labelled path P ;

(2) the index i∗ of the special vertex along the path;

(3) a vector < ~α1, . . . , ~αr > of r assignments as described in Definition 3.5.9.

(4) the vector < ~β1, . . . , ~βr > of r assignments as described in Definition 3.5.9.

Figure 3.2 depicts a labelling that is induced by a small branching program. We first check that the length of L~F is O(hr log k). The length of (0) is n·log k = 3^{h−1}·log k. The length of (1) is h·log 6, since there are 6^h labelled paths (3^{h−1} different paths, and 2^h choices for the labels). The length of (2) is log h. The length of (3) is hr·log k, and similarly the length of (4) is hr·log k. Thus the total length is at most 4hr·log k.
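As an illustrative arithmetic check of this bookkeeping (our own script; it simply re-adds the component lengths listed above for one admissible choice of h and k):

```python
import math

h = 3
log_k = 42 * h + 10                 # k > 2^{42h}, so log k > 42h
eps = 9 * h / log_k
r = 2 ** (6 * h) / eps
n = 3 ** (h - 1)

len_w      = n * log_k              # part (0): the assignment w
len_path   = h * math.log2(6)       # part (1): the labelled path
len_index  = math.log2(h)           # part (2): the index i*
len_alphas = h * r * log_k          # part (3): the r vectors alpha
len_betas  = h * r * log_k          # part (4): the r vectors beta

total = len_w + len_path + len_index + len_alphas + len_betas
assert total <= 4 * h * r * log_k   # the bound claimed for |L_F|
```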

Given the vector L~F, the special function F∗ will be the function associated with the vertex vi∗. For each i, j ∈ [r], the corresponding input values (ai, bj, ci,j) for F∗ are obtained by a bottom-up evaluation of the subtree rooted at vi∗ as follows. First, using L~F parts (3) and (4), we extract values for all red and white children of vertices in the path below vi∗. Secondly, using L~F part (0), we extract from ~w values for all other leaf vertices of the subtree rooted at vi∗. Now, using the knowledge of all internal functions corresponding to nodes below vi∗ (given in ~F−∗), we can evaluate the subtree rooted at vi∗ in a bottom-up fashion in order to determine the values (ai, bj, ci,j) for redchild(vi∗), whitechild(vi∗) and thirdchild(vi∗).

Note that when we evaluate redchild(vi∗), whitechild(vi∗) and thirdchild(vi∗) for each pair i, j ∈ [r], since all of the functions in ~F are 4-invertible, we are guaranteed that there will be at least r′ = r/4^{i∗} distinct values taken by redchild(vi∗) and similarly r′ = r/4^{i∗} distinct values taken by whitechild(vi∗), resulting in at least r′^2 distinct inputs (ai, bj, ci,j) with i, j ∈ [r′] in the domain of F∗.

We will now describe how to obtain the sets C(i, j) ⊂ [k], i, j ∈ [r′], using L~F and the functions ~F−∗. Fix an input (ai, bj, ci,j). We want to determine the set C(i, j) of possible values for F∗(ai, bj, ci,j). Recall that for each i, j ∈ [r′], we know the value given to all inputs of the ternary tree. We want to work our way down the path P, starting at the root vertex vh, in order to determine C(i, j). If the functions in ~F were all invertible, then knowing that


Figure 3.2: This figure depicts a label L~F associated with a problem instance Tree~F, obtained as a consequence of having a small branching program B~F. A label as guaranteed by Lemma 3.6.1 consists of a labelled path P reaching a leaf node, a special vertex vi∗ along the path, and a vector of r values each (~α and ~β respectively) for the red and white subtrees at each node along the path. (We use blue for white here.)

(ai, bj, ci,j) is a yes input, this limits the number of possible values of the root vertex to the set C(i, j)^h = [k^{1−ε}]. Working down the path, since we know the values of the red child and white child of vh, this in turn gives us another set of at most k^{1−ε} values, C(i, j)^{h−1}, that vh−1 can have. We continue in this way down the path until we arrive at a set of at most k^{1−ε} values, C(i, j), that vi∗ can take on.

However, we are not working with invertible functions, but instead with 4-invertible functions. This can be handled by a simple modification of the above argument. Again we start at the root of the path, vh. As before, we know that the set of values associated with the root is C(i, j)^h = [k^{1−ε}]. At vertex vh′, we define the set C(i, j)^{h′} based on the previous set C(i, j)^{h′+1}. For a particular value z ∈ C(i, j)^{h′+1}, we know the values of redchild(vh′) and whitechild(vh′). This gives us values z, a, c. By the definition of Fvh′ being 4-invertible, there are at most 4 values of b such that z = Fvh′(a, b, c). Thus we know the (at most) four possible values of b that can lead to z. Running over all z's in C(i, j)^{h′+1} defines the set C(i, j)^{h′}, which has size at most four times the size of C(i, j)^{h′+1}. Thus, the size of C(i, j)^{i∗} is at most 4^{h−i∗}·k^{1−ε}. We set C(i, j) equal to C(i, j)^{i∗}.
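The factor-of-four growth per level can be expressed as a two-line helper (an illustrative sketch of ours, not code from the thesis); it reproduces the bound |C(i, j)| ≤ 4^{h−i∗}·k^{1−ε}.

```python
def allowed_values_bound(k, eps, h, i_star):
    size = k ** (1 - eps)          # allowed set at the root of the path has size k^{1-eps}
    for _ in range(h - i_star):    # each step down multiplies the count by at most 4
        size *= 4
    return size                    # equals 4**(h - i_star) * k**(1 - eps)
```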


Figure 3.3: A subset of the input domain of Fv∗ with product structure in two coordinates (A, B ⊂ [k], |A| = |B| = r′, giving the r′^2 inputs {(ai, bj, ci,j) | ai ∈ A, bj ∈ B}), over which the possible values taken by Fv∗ have low entropy.

Let F be the uniform distribution over all 4-invertible functions from [k]^3 to [k]. Let E denote the event that for every (i, j), F(ai, bj, ci,j) ∈ C(i, j). It is left to show that Pr_{F∼F}[E] ≤ k^{−(7/(9·2^{4h}))εr^2}. Let F′ be the uniform distribution over all functions from [k]^3 to [k/4]. Lemma 36 below shows that Pr_{F∼F}[E] ≤ Pr_{F′∼F′}[E]. Thus we have:

Pr_{F∼F}[E] ≤ Pr_{F′∼F′}[E] = (|C(i, j)|/(k/4))^{(r′)^2} ≤ (4 · 4^{h−i∗} · k^{−ε})^{(r/4^{i∗})^2} ≤ k^{−(7/(9·2^{4h}))εr^2}.

Proof. (of Theorem 32) We are now ready to complete the proof of our main theorem. Let ~F be the uniform distribution over vectors ~F of 4-invertible functions from [k]^3 to [k]. We prove the theorem by showing that if, for every ~F, Tree~F has a size s branching program where s ≤ (k/(n^{26} log k))^h, then the expected number of bits required for encoding an ~F sampled from the distribution ~F is less than the minimum number of bits required, which is 3^{h−1}·H(F), giving us the contradiction. Given ~F, the encoding is as follows.

(1) If ~F ∈ Bad(~F), encode each function using H(F) bits, thus using 3^{h−1}·H(F) bits over all the internal functions.

(2) If ~F /∈ Bad(~F ), encode as follows.

(2a) The first part is the description of L~F .

(2b) The second part is an optimal encoding of all of ~F except for F∗.

(2c) The third part is an optimal encoding of F∗. Recall that F∗ is an element from the (uniform) distribution (F | E), where E denotes the event that for every (i, j), F(ai, bj, ci,j) ∈ C(i, j).

Using this encoding, the decoding procedure is as follows. Whenever Bad(~F) holds, we use the information in (1) in order to recover ~F. Otherwise, if ¬Bad(~F) holds⁴, we proceed as follows. First we use the label L~F from (2a)

⁴The astute reader might have observed that in order to recognize whether Bad(~F) holds or not, one needs to convey information, albeit just 1 bit. We end up saving a lot more, so we ignore it.


in order to determine vi∗. Then we use the label L~F from (2a), along with information about the rest of the functions from (2b), to find the special (r′)^2 inputs (ai, bj, ci,j), i, j ∈ [r′], to the function F∗. We also use the label L~F from (2a) and information from (2b) to determine the sets C(i, j) ⊂ [k] such that F∗(ai, bj, ci,j) ∈ C(i, j) for all i, j ∈ [r′]. We can then determine, using the information from (2c), the values F∗(ai, bj, ci,j) for all i, j ∈ [r′] (and also the remaining inputs in [k]^3).

We want to compare the savings of this encoding over the optimal one that uses H(~F) bits. Let p = Pr_{F∼F}[E]. Then 1/p is equal to the number of 4-invertible functions divided by the number of 4-invertible functions satisfying E. Thus, when ¬Bad(~F) holds, the savings of our encoding in bits is log(1/p) − |L~F|, and therefore the overall savings in bits is

(1 − pBad)[log(1/p) − |L~F|] ≥ (1 − pBad)[(7/(9·2^{4h}))εr^2·log k − 4hr·log k] = [(7/(9·2^{4h}))εr^2 − 4hr](1 − pBad)·log k,

since by Lemma 3.6.1, |L~F| ≤ 4hr·log k and p ≤ k^{−(7/(9·2^{4h}))εr^2}.

In the expression [(7/(9·2^{4h}))εr^2 − 4hr], the quadratic dependence on r in the first term, versus only the linear dependence in the second, allows us to choose r = 2^{6h}/ε large enough so that we make savings. At r = 2^{6h}/ε,

[(7/(9·2^{4h}))εr^2 − 4hr] = r[(7/(9·2^{4h}))·2^{6h} − 4h] > r for all h ≥ 2.

Also, by Lemma 3.4.1 we know pBad ≤ 1/10, and since k ≥ 2^{42h} this implies (1 − pBad)·log k > 1. Thus our savings is greater than r bits, giving a contradiction.
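A quick numeric check of the last inequality (our own sanity check, obtained by plugging r = 2^{6h}/ε into the bracketed expression):

```python
# the bracket equals r * (7 * 2**(2*h) / 9 - 4*h); it exceeds r once the
# coefficient exceeds 1, which happens for every h >= 2
for h in range(2, 10):
    coefficient = 7 * 2 ** (2 * h) / 9 - 4 * h
    assert coefficient > 1
```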

Lemma 36. Let F be the uniform distribution over all 4-invertible functions from [k]^3 to [k] and let F′ be the uniform distribution over all functions from [k]^3 to [k/4]. Fix r^2 inputs τi, i ∈ [r^2], and let Ci be a corresponding subset of [k], such that ∪iCi ⊆ [k/4]. Let E be the event that for all i, F(τi) ∈ Ci. Then Pr_{F∼F}[E] ≤ Pr_{F′∼F′}[E].

Proof. Before we proceed with the proof, we wish to mention that when we use this lemma in the proof of Lemma 3.6.1 the sets Ci involved need not be such that ∪iCi ⊆ [k/4]. However, since |∪iCi| ≤ k/4, one can simply consider an alternative range of size k/4 that contains ∪iCi for the functions in F′, instead of [k/4], to arrive at the same upper bound estimate on Pr_{F∼F}[E]. So we assume in the hypothesis, just for ease of exposition, that ∪iCi ⊆ [k/4]. Proceeding with the proof, let Ei denote the event that F(τi) ∈ Ci, and let E<i denote the event that for all j < i, F(τj) ∈ Cj. Then Pr_{F∼F}[E] = ∏i Pr_{F∼F}[Ei | E<i]. We will show that for any i, Pr_{F∼F}[Ei | E<i] ≤ Pr_{F′∼F′}[Ei]. Let σ specify the values of F for all tuples except for τi. Then Pr_{F∼F}[Ei | E<i] ≤ max_σ Pr_{F∼F}[Ei | σ]. That is, the true probability is at most the probability where we fix all values except for the value of F on τi to the worst possible scenario.

We want to show that this probability only increases when the distribution switches from F to F′. But then note that under the distribution F′, the values σ do not change the probability. Thus we want to show: Pr_{F∼F}[Ei | σ] ≤ Pr_{F′∼F′}[Ei | σ] ≤ Pr_{F′∼F′}[Ei].

To prove the first inequality, note that σ specifies all but one of the [k]^3 inputs to F. We visualize this as a k-by-k-by-k cube, where all entries (x, y, z) are filled in with a value in [k] except for the one entry corresponding to τi. We want to lower bound how many values we can choose for this last entry and still have a 4-invertible function. When choosing this last value, in order for F to be 4-invertible, we cannot choose one of the at most k/4 values that already appears four times along the "x" dimension, or one of the at most k/4 values that already appears four times in the "y" dimension, or one of the at most k/4 values that already appears four times in the "z" dimension. This rules out at most 3k/4 values, leaving at least k/4 possible values. Thus there is a set of at least k/4 values that can legally be filled in for F(τi) (even under the worst possible σ), and because F is uniform on such functions, these completions all have the same probability. The event


Ei is when F(τi) is chosen to be in Ci. This probability is at most that for the distribution F′ on all functions from [k]^3 to [k/4].

3.7 Conclusion

It is open to prove lower bounds for function composition for the case of Boolean nondeterministic semantic read-once branching programs. In fact, it is open to prove lower bounds for the Boolean case for any explicit function. Another longstanding open problem is to break the Neciporuk barrier of n^2/log^2 n for deterministic branching programs, and n^{3/2}/log n for nondeterministic branching programs. When g is the parity function, this bound is optimal. Lower bounds for f ∘ g for g equal to the element distinctness function (or even for the majority function) would be a significant breakthrough.

3.8 Proofs

Proof of Lemma 3.4.1: For k > 2^{42h} and ε = 9h/log k, Pr~F[Bad(~F)] ≤ 1/10.

Proof. We will choose a random ~F somewhat indirectly as follows. First, we sample a random vector ~F ∈ ~F. Secondly, we choose a random permutation Π of the values [k], and let Π(~F) be the same as ~F except that the root values have been permuted by Π. (This requires only changing the outputs of the function at the root.) Note that this distribution on ~F is identical to the uniform distribution over ~F. It follows that Pr~F[Bad(~F)] = Pr〈~F,Π〉[Bad(Π(~F))]. We will consider the worst case value of ~F in order to bound the above probability. Observe that

Pr〈~F,Π〉[Bad(Π(~F))] ≤ Max~F PrΠ[Bad(Π(~F)) | ~F].

Fix such a worst case ~F. For this ~F, for each value v ∈ [k] let qv denote the fraction of leaf values ~ξ that give value v at the root. Note that ∑v qv = 1 and Avgv qv = 1/k.

Because the permutation Π is randomly chosen, Π^{−1}([k^{1−ε}]) is a random subset of [k] of size k^{1−ε}. Therefore, via linearity of expectation,

Exp(|Syes|/k^n) = Exp[∑_{v∈Π^{−1}([k^{1−ε}])} qv] = k^{1−ε}/k = k^{−ε}.

We want to bound the probability that the size of Syes is significantly smaller than its expected value of k^{−ε}·k^n. But first, the lemma below proves that 0 ≤ qv ≤ 4^{h−1}/k.

Lemma 3.8.1. For all v ∈ [k], qv ≤ 4^{h−1}/k.

Proof. Fix ~F. Fix all of the leaf values as in ~ξ, except for the leftmost leaf. Working down from the root, for any value v at the root one can see that there are at most 4^{h−1} values in [k] for this leftmost leaf that can lead to value v at the root. This is because each internal function is 4-invertible, and for any fixed value of an internal node, given the values of two of its children (subtree evaluations) there are at most 4 possible values the other child can take.


We select a uniformly random set of size k^{1−ε} to be mapped to [k^{1−ε}] as follows. Flip a biased coin for each point v in [k], selecting it with probability k^{−ε}. Given the vector of qv describing the fraction of inputs that map to v, let Qv be the vector of random variables associated with the corresponding coin flips, each of them taking value qv with probability k^{−ε} and 0 with the remaining probability 1 − k^{−ε}. The expected number of points selected is k^{1−ε}. The experiment repeats until the number of points selected is within some standard deviations, say c·k^{(1−ε)/2}, of the mean k^{1−ε}. Let us first analyze the number of inputs selected corresponding to the points selected in the process, without the size requirement on the number of points.

We are interested in the fraction of inputs that get to be Yes inputs as a result of being selected during the coin flipping process. Let QYes = ∑v Qv. So

E[QYes] = ∑v E[Qv] = ∑v qv·k^{−ε} = k^{−ε}. (3.1)

In this experiment the Qv are independent (but not necessarily identically distributed) non-negative random variables. Consequently, QYes obeys the following concentration bound [14] around its mean:

Prob[(E[QYes] − ∑v Qv) ≥ t] ≤ exp(−t^2/(2·∑v E[Qv^2])) (3.2)

Since by the regularity property from Lemma 3.8.1 we have qv ≤ 4^{h−1}/k for all v ∈ [k],

∑v E[Qv^2] = ∑v qv^2·k^{−ε} = k^{−ε}·∑v qv^2 ≤ k^{−ε}·∑v (4^{h−1}/k)^2 = k^{−ε}·k·(4^{h−1}/k)^2 = 4^{2h−2}/k^{1+ε}

⟹ Prob[(E[QYes] − QYes) ≥ t] ≤ exp(−t^2/(2·∑v E[Qv^2])) ≤ exp(−t^2/(2·4^{2h−2}/k^{1+ε})) = exp(−t^2·k^{1+ε}/(2·4^{2h−2}))

Consequently,

Prob[QYes ≤ E[QYes] − t] ≤ exp(−t^2·k^{1+ε}/(2·4^{2h−2})) (3.3)

Set t = 1/(2k^ε) for the event Bad′ = [QYes ≤ E[QYes] − t] = [QYes ≤ 1/(2k^ε)]:

pBad′ = Prob[QYes ≤ 1/(2k^ε)] ≤ exp(−k^{1−ε}/(8·4^{2h−2})) (3.4)

Now consider the following transformed process, in which the experiment repeats until the number of points selected is within some fixed deviation g of the mean. Let the set of points be A. Depending on the count of the number of points selected in A, if the count falls below k^{1−ε} a few more points are uniformly randomly selected from [k] \ A to obtain a set of size k^{1−ε}, and likewise if the number is larger than k^{1−ε} the required number of points are uniformly randomly discarded from the set. Clearly, this process doesn't discriminate against any point in [k] and so generates a uniformly random subset of size exactly k^{1−ε} from [k]. Let us call this set A′′; it shall be our final set of size k^{1−ε}. Let pBad be the probability that the fraction of inputs associated with the set of points in A′′ is less than 1/(6k^ε). For the


intermediate set A, let U be the event [k^{1−ε} − g ≤ |A| ≤ k^{1−ε} + g]. Then,

Prob[Bad′ | U] = Prob(Bad′ ∩ U)/Prob(U) ≤ Prob(Bad′)/Prob(U) (3.5)

Since |A| is binomially distributed with (n, p) = (k, k^{−ε}), seen as a sum of independent non-negative random variables, for a deviation g ≈ 2k^{(1−ε)/2} we have the following concentration guaranteed by (3.2):

Prob(U) = Prob[k^{1−ε} − 2k^{(1−ε)/2} ≤ |A| ≤ k^{1−ε} + 2k^{(1−ε)/2}] ≥ 0.8 (3.6)

By (3.4) it follows that Prob(Bad′) ≤ exp(−k^{1−ε}/(8·4^{2h−2})), and together with (3.6) and (3.5) this implies

Prob[Bad′ | U] ≤ (5/4)·exp(−k^{1−ε}/(8·4^{2h−2})) (3.7)

That is, the chance that S^A_Yes is small is exponentially small. Now consider the transformation of A to A′′. Note that whenever new points are added to A or some points in A are discarded so as to obtain A′′, i.e., a uniformly random choice of a set of exact size k^{1−ε}, the change from S^A_Yes to S^{A′′}_Yes is at most g·max_v qv. But by the regularity property given by Lemma 3.8.1, qv ≤ 4^h/k. So

| |S^{A′′}_Yes| − |S^A_Yes| | ≤ g·(4^h/k) ≈ 2k^{(1−ε)/2}·(4^h/k) = 2·4^h/k^{(1−ε)/2 + ε} = (2·4^h/k^{(1−ε)/2})·(1/k^ε) ≤ 1/(3k^ε) for k > 2^{42h} at ε = 9h/log k.

The fraction of inputs associated with the resulting set A′′ will then always be at least 1/(2k^ε) − 1/(3k^ε) = 1/(6k^ε) whenever Q^A_Yes > 1/(2k^ε). This implies pBad = Prob[Q^{A′′}_Yes ≤ 1/(6k^ε)] ≤ Prob[Q^A_Yes ≤ 1/(2k^ε)] = Prob[Bad′ | U], and hence pBad ≤ (5/4)·exp(−k^{1−ε}/(8·4^{2h−2})).

For k > 2^{42h} and ε = 9h/log k it can be seen that pBad ≤ (5/4)·exp(−k^{1−ε}/(8·4^{2h−2})) ≤ (5/4)·(1/e)^{2^{42h−9h}/2^{4h−1}} ≤ 1/2^{28h} ≤ 1/10, for all h ≥ 1.

3.8.1 Neciporuk via Function Composition

Consider the composition of two boolean functions f : {0, 1}^a → {0, 1} and g : {0, 1}^b → {0, 1}. Let f be a hard function in the sense that any non-deterministic branching program computing f requires size at least 2^{a/2}. Such functions are guaranteed to exist by a simple counting argument. Fix g to be any function such that it does not take a constant value when all but any one of its b input bits are set.

Lemma 3.8.2. Any non-deterministic branching program solving f ∘ g has size at least b·2^{a/2}.

Proof. Let there be a non-deterministic branching program solving f ∘ g of size s. For each of the a copies of g in the composition f ∘ g, pick the least queried input bit from amongst each group of b input bits that correspond to a single copy of g, then set all remaining b − 1 variables in this input group to any value and reconnect the outgoing edges amongst the remaining states appropriately. The resulting collapsed branching program has size at most s/b. But recall that g has the property that fixing b − 1 of its input bits doesn't make the function a constant. Thus the resulting collapsed branching program has to have size at least that required for computing f, that is 2^{a/2}. Therefore the original non-deterministic branching program must have size at least s ≥ b·2^{a/2}.

Let g = ⊕ be the parity function on b bits. The input to f ∘ ⊕ is the description of f, plus a vector of ab bits (the input to f ∘ ⊕). The input length is 2^a + ab. Setting a = log n and b = n/log n, the input length is 2n. By the above lemma, the size of a branching program required to solve the composition f ∘ ⊕ is at least b·2^{a/2} = (n/log n)·2^{(log n)/2} = n^{3/2}/log n. This lower bound is also known to be the best achievable by Neciporuk, as shown by Beame and McKenzie in [11].


An essentially similar argument appearing in Section 4.1 in the last chapter shows that any deterministic branching program solving f ∘ ⊕ requires size at least b·2^a/a. Set a = log n and b = n/log n to obtain an Ω(n^2/log^2 n) lower bound.
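These parameter settings can be checked mechanically (an illustrative calculation of ours, not from the thesis), confirming the n^{3/2}/log n nondeterministic and n^2/log^2 n deterministic figures:

```python
import math

n = 2 ** 20                       # input length parameter (chosen so a = log n is an integer)
a = int(math.log2(n))             # a = log n
b = n // a                        # b = n / log n

nondet_bound = b * 2 ** (a / 2)   # b * 2^{a/2}  ~  n^{3/2} / log n
det_bound = b * 2 ** a / a        # b * 2^a / a  ~  n^2 / log^2 n

assert abs(nondet_bound - n ** 1.5 / math.log2(n)) / nondet_bound < 0.01
assert abs(det_bound - n ** 2 / math.log2(n) ** 2) / det_bound < 0.01
```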

3.8.2 The lower bound holds for most ~F

We now argue that for most vectors of 4-invertible functions ~F, Tree~F does not have a small branching program. We show that the probability that a uniformly randomly chosen ~F has a small branching program is at most pBad + 1/2^r ≤ 1/2^{27h}. First, let #L = 2^{|L~F|} be the total number of labels. Recall that |L~F| is the number of bits needed to encode a label, and that the number of bits saved in our alternate encoding from the proof of Theorem 32 is (1 − pBad)[log(1/p) − |L~F|] = (1 − pBad)·log(1/(p·#L)).

Note that for a uniformly randomly chosen ~F, the probability that it has a small branching program is at most the chance that Bad(~F) holds plus the chance that Bad(~F) doesn't hold and there exists a label L that is consistent with ~F (in other words, a label obtained via Lemma 3.6.1 as a guaranteed consequence of ~F having a small branching program).

Pr~F[∃ a small BP solving Tree~F]
≤ Pr~F[Bad(~F) ∪ [¬Bad(~F) ∩ ∃ a label L consistent with ~F]]
≤ pBad + Pr~F[¬Bad(~F) ∩ [∃ a label L that is consistent with ~F]]   (by the union bound)
≤ pBad + Pr~F[∃ a label L that is consistent with ~F]   (since P(A ∩ B) ≤ min{P(A), P(B)})
≤ pBad + #L · maxL Pr~F[label L is consistent with ~F]   (by the union bound)
≤ pBad + p·#L

We have shown in the proof of Theorem 32 that the number of bits saved in our alternate encoding is at least r. So, (1 − pBad)·log(1/(p·#L)) ≥ r ⟹ 1/(p·#L) ≥ 2^{r/(1−pBad)} ≥ 2^r ⟹ p·#L ≤ 1/2^r. Consequently it follows that:

Pr~F[∃ a small BP solving Tree~F] ≤ pBad + 1/2^r

Now note that the proof of Lemma 3.4.1 (see Section 3.8) actually shows that pBad ≤ 2^{−28h}. As a result, Pr~F[∃ a small BP solving Tree~F] ≤ 1/2^{28h} + 1/2^r ≤ 1/2^{27h} (the last inequality follows since r = 2^{6h}/ε = 2^{6h}·log k/(9h) ≥ 2^{6h+2}). Thus we can conclude that most vectors of 4-invertible functions in fact do not have small branching programs.


Chapter 4

General Branching Programs


4.1 Nechiporuk’s method and its limitations

In this chapter we focus entirely on general branching programs and attempt to improve on what is achievable by Nechiporuk's method. Recall from Chapter 1 that Nechiporuk's method [51] gives a lower bound on the branching program size BP(f) of an arbitrary function f. Fix a partition of the variable set X into m disjoint sets Y_1, Y_2, ..., Y_m. For each Y_i let c_i(f) denote the number of possible sub-functions on Y_i obtained by fixing the variables outside Y_i to all possible values. One can use this information to get a lower bound on BP(f).

Theorem 37 (Nechiporuk's Method). There exists a constant ε > 0 such that for every boolean function f that depends on all its inputs and for every partition of its variable set X into m sets,

BP(f) ≥ ε · Σ_{i=1}^{m} log c_i(f) / log log c_i(f)

Similarly, for k-way branching programs, let kBP(f) denote the corresponding complexity measure. Nechiporuk's method gives

kBP(f) ≥ (ε/k) · Σ_{i=1}^{m} log_k c_i(f) / log_k log c_i(f)    (4.1)

However, it is known that for any function on n bits, Nechiporuk's method cannot yield a lower bound on branching program size larger than O(n^2/log^2 n). Similarly, for k-way branching programs the best that Nechiporuk's method can achieve is O(n^2/(k log_k^2 n)). Since we often contrast what we do with what is achievable by Nechiporuk's method, below we present an argument from [39] describing this inherent limitation of Nechiporuk's method, to keep the discussion self-contained.

Consider an arbitrary partition Y_1, Y_2, ..., Y_m of the set of variables X. Let |Y_i| = t_i, and consider c_i(f), the number of sub-functions of f on Y_i. The two limiting factors on c_i(f) are t_i, the size of Y_i, and the number of ways in which the variables of X outside Y_i can be set. So c_i(f) ≤ min{k^{n−t_i}, k^{k^{t_i}}}, or equivalently log_k c_i(f) ≤ min{n − t_i, k^{t_i}} (this being the case of k-way BPs; the argument is similar for boolean branching programs). Note that the two quantities are comparable in value when t_i ≈ log_k n. Without loss of generality, let the first r sets Y_i, 1 ≤ i ≤ r, be those with t_i ≥ log_k n. Since Y_1, Y_2, ..., Y_m constitutes a partition of X, we have r ≤ n/log_k n. The contribution to the sum in (4.1) of each of these terms is at most (n − t_i)/log_k(n − t_i) ≤ n/log_k n, and so in total the large pieces contribute at most (n/log_k n) · (n/log_k n) = O(n^2/log_k^2 n). Now consider the pieces where t_i ≤ log_k n. The contribution of each such piece is at most k^{t_i}/log_k(k^{t_i}) = k^{t_i}/t_i = O(n/log_k n). Since k^x/x is convex, the total contribution of the sum over these smaller blocks is maximized when each of the t_i, however many there are, takes its largest possible value, namely log_k n; in that case there are at most m − r = O(n/log_k n) such blocks. So the total contribution from the smaller blocks is also at most O(n^2/log_k^2 n). Hence the bound on kBP(f) given by Nechiporuk's method is O(n^2/(k log_k^2 n)).
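The same limitation can be seen numerically. The short sketch below is illustrative only: it ignores the constant ε and, as in the argument above, treats the denominator in (4.1) as log_k of the numerator's cap. It searches for the block size t that maximizes the per-variable contribution and confirms that the resulting cap is about n^2/(k log_k^2 n).

    import math

    def nechiporuk_cap(n, k):
        # For a block of t variables, log_k c_i(f) <= cap(t) = min(n - t, k^t);
        # bound each block's term by cap(t)/log_k(cap(t)) and charge it per variable.
        logk = lambda x: math.log(x, k)
        best_t, best_rate = None, 0.0
        for t in range(1, n):
            if t * math.log(k) < math.log(n):          # k^t < n, so the k^t cap may bind
                cap = min(n - t, k ** t)
            else:
                cap = n - t
            rate = (cap / max(logk(cap), 1.0)) / t      # contribution per variable in the block
            if rate > best_rate:
                best_t, best_rate = t, rate
        # any partition of the n variables contributes at most n * best_rate; divide by k as in (4.1)
        return best_t, n * best_rate / k

    n, k = 4096, 8
    t_star, cap_total = nechiporuk_cap(n, k)
    print("maximizing block size t* =", t_star, ", log_k n =", round(math.log(n, k), 2))
    print("cap on the Nechiporuk bound ~", round(cap_total))
    print("n^2 / (k log_k^2 n)        ~", round(n * n / (k * math.log(n, k) ** 2)))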

Table 4.1 summarizes these observations along with the lower bounds we achieve for a function taking values in [k]. Every appearance of n in the table stands for the length of an input to the problem instance, which is TEP^4_d.


Model | Lower bound via Nechiporuk's method | Lower bounds we obtain
Deterministic k-way BPs | ((d^2−1)/(4(d−1)^2)) k^{2d−1} = Θ(n^2/(k log_k^2 n)) ≈ Θ(n^{2−1/d}/log_k^2 n) | (d/(d^2−1)) k^{2d−1/d} = Ω(n^{2−1/d^2}/log_k^2 n)
Deterministic binary BPs | ((d^2−1)/(2d−2)) k^{2d} = Θ(n^2/log^2 n) | (1/(2d)) k^{2d} = Ω(n^2/log^2 n)

Table 4.1: Lower bounds achievable via Nechiporuk's method versus the lower bounds we obtain.

4.2 Our Results

We give a better lower bound than is possible via Nechiporuk's method for k-way branching programs solving TEP^4_d. Using essentially the same method, we give a lower bound for binary branching programs that matches what is achievable for a boolean function by Nechiporuk's method to within a log factor; for a function taking values in [k] we achieve a matching lower bound. The interesting aspect of this method is that it seems plausible to improve one of the parts of the argument, and any marginal improvement here would be consequential. We then proceed to give some surprising branching program upper bounds that are different from the naive upper bounds and are based on communication complexity protocols. Our aim here is to improve our understanding of a possible approach to proving the lower bound, but the upper bounds and the connections to communication complexity therein might themselves be of independent interest.

4.3 Lower bounds via communication complexity improving on Nechiporuk for k-way BPs

In this section we give a lower bound for k-way BPs solving TEP that beats the lower bound achievable by Nechiporuk's method. We begin with the following theorem, due to Stephen Cook¹, which translates the challenge of proving lower bounds on the size of branching programs solving the tree evaluation problem on trees of degree d and height h into that of proving lower bounds on the number of leaf-querying states in branching programs solving the tree evaluation problem on trees of degree d and height h − 1. We reproduce the proof and observe that this theorem in fact also holds, in a similar sense, if one considers binary branching programs and leaf-bit-querying states in place of k-way branching programs and leaf-querying states.

Theorem 38 (Leaf Queries). Given any k-way (resp. binary) branching program solving TEP^h_d(k) of size s, one can obtain a k-way (resp. binary) branching program solving TEP^{h−1}_d(k) which makes at most s/k^d leaf (resp. leaf-bit) queries.

Proof. Consider a branching program B of size s solving TEP^h_d(k). Observe the states in B querying the internal function nodes at the level immediately above the leaves. Since there are k^d points at which to query each such internal function, there exists a tuple of values (r_1, r_2, ..., r_d) ∈ [k]^d at which these internal functions are queried the least often; that is, there are at most s/k^d states querying these function values at (r_1, r_2, ..., r_d). Now consider an arbitrary instance of TEP^{h−1}_d. In the branching program B solving TEP^h_d(k), we shall interpret the function value at (r_1, r_2, ..., r_d) for each of these internal functions as the corresponding leaf value in the instance of TEP^{h−1}_d. Also, set the leaf values in TEP^h_d so that they point to (r_1, r_2, ..., r_d) for all the internal functions at this level. Now connect all incoming edges into leaf-querying states in B directly to the state that the edge labelled with the corresponding value r_i goes to.

¹Please refer to the manuscript titled “New Results for Tree Evaluation” at http://www.cs.toronto.edu/~sacook/


Remove the states querying these internal functions at any point other than (r_1, r_2, ..., r_d), and simply connect the incoming edges of such a state to any state previously connected to it by some outgoing edge; this is sound because B must be correct for every setting of those (now irrelevant) function entries. By collapsing the branching program in this manner we obtain a new branching program that solves TEP^{h−1}_d and makes at most s/k^d leaf queries.

Observe that if B were a binary branching program, solving a problem instance of TEP^h_d presented in binary, there would be k^d log k binary variables describing each internal function. Nevertheless, once again, we can consider the log k variables corresponding to each location in the domain [k]^d as a single unit and make a similar averaging argument to identify the log k-bit unit (r_1, r_2, ..., r_d) ∈ [k]^d that receives the least number of queries. The proof then proceeds by pruning and redirecting edges through the leaf-bit-querying states of B in a similar fashion as above, so that the leaves point to (r_1, r_2, ..., r_d).
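A minimal sketch of the averaging step used in this proof follows; the list query_points is hypothetical stand-in data for the states of B that query the internal functions just above the leaves, each recorded by the point of [k]^d it queries.

    from collections import Counter
    from itertools import product
    import random

    k, d = 3, 2
    random.seed(1)
    # hypothetical data: for each relevant state, the point of [k]^d at which it queries its function
    query_points = [tuple(random.randrange(k) for _ in range(d)) for _ in range(50)]

    counts = Counter(query_points)
    # by averaging, the least-queried point (r_1, ..., r_d) is hit at most (#states)/k^d times,
    # which is exactly the quantity Theorem 38 feeds into the collapsed program
    r, cnt = min(((p, counts[p]) for p in product(range(k), repeat=d)), key=lambda x: x[1])
    assert cnt <= len(query_points) / k ** d
    print("least-queried point", r, "queried", cnt, "times;  states/k^d =", len(query_points) / k ** d)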

The following section is devoted to proving lower bounds on the number of leaf queries in branching programs solving TEP, so as to leverage the k^d factor provided by Theorem 38.

4.3.1 Lower Bound on the Number of Leaf Reading States

Let us define the following communication game, which we shall use on the way to proving our leaf lower bound.

4.3.2 A Communication Game

We define a game that is a number-in-hand game, in that each of the players has an input that the others do not know, and a number-on-forehead game, in that all the players know a number on the referee's forehead while he in turn knows their inputs.

Definition 39. The ⟨k, d⟩ function game is defined as follows. The input consists of a d-input function f : [k]^d → [k] and d^2 inputs x_{⟨i,j⟩} ∈ [k] for i ∈ [d] and j ∈ [d]. The required output is f(Σ_{j∈[d]} x_{⟨1,j⟩}, ..., Σ_{j∈[d]} x_{⟨d,j⟩}). There are d^2 players and a referee. Player ⟨i,j⟩ ∈ [d]^2 knows ⟨f, x_{⟨i,j⟩}⟩. Simultaneously, they each send some s_{⟨i,j⟩} bits to the referee; there is zero communication between the players. For free, let us assume the referee knows the indices x_{⟨i,j⟩} but knows nothing about f except what the players send him. He needs to learn the output. The cost of the communication game is the maximum, taken over all possible inputs, of the total number of bits s = Σ_{i∈[d], j∈[d]} s_{⟨i,j⟩} sent to the referee.

The following lemma translates lower bounds for the communication game into lower bounds on the size of branching programs solving TEP^3_d. In order to do this, consider a TEP instance of height 3 where the function at the root is f and the level 2 functions are all addition mod k.

Lemma 40. Any branching program computing f(Σ_{j∈[d]} x_{⟨1,j⟩}, ..., Σ_{j∈[d]} x_{⟨d,j⟩}), where f is part of the input, that contains s′ leaf-reading states can be translated into an algorithm for the ⟨k, d⟩ function game with d^2 players and a referee in which s = s′ log s′ bits are communicated².

Proof. Assume all d^2 players know the branching program. Denote the leaf-reading states of the branching program u_1, ..., u_{s′}. These leaf-reading states are partitioned into d^2 groups, where the states in group (i, j) ∈ [d]^2 are all of the states that read x_{i,j}. Player (i, j), who gets as input the value of f plus a value for x_{i,j}, performs the following computation: for each state u in group (i, j), player (i, j), using his information, traces the unique directed path starting at u. He continues as long as the current state is either a leaf-reading state in group (i, j) or a function-reading state (since for both of these types of states the player can determine which edge to follow). The path ends the first time player (i, j) encounters a leaf-reading state in some group other than (i, j). Player (i, j) sends the name of this state to the referee, and he does the same for all the other states like u in his group, in some pre-determined order. He uses log s′ bits to indicate this state (or the fact that some output state was reached instead).

²Actually s = s′ log(s′ + 1).

The referee, knowing ⟨x_{⟨1,1⟩}, x_{⟨1,2⟩}, ..., x_{⟨d,d⟩}⟩ and all the players' messages, can easily trace out the computation path through the branching program from the start state until an output state is reached, thereby determining f(Σ_{j∈[d]} x_{⟨1,j⟩}, ..., Σ_{j∈[d]} x_{⟨d,j⟩}). (If the start state of the branching program is an f-reading state, one can follow some convention as to which player communicates an additional log s′ bits to handle this scenario. In fact, the referee does not even need to know the inputs ⟨x_{⟨1,1⟩}, x_{⟨1,2⟩}, ..., x_{⟨d,d⟩}⟩ to infer the output using the branching program.)

In total, since there are s′ leaf-reading states, a total of s′ log s′ bits are sent to the referee.

Lemma 41. The ⟨k, d⟩ function game requires s ≥ k^{d−1/d} log k bits.

Proof. Suppose we have an algorithm for the ⟨k, d⟩ function game that communicates s bits. From this we design an algorithm in which the referee learns from the players f(z_1, ..., z_d) for every z_1, ..., z_d ∈ [k]. Knowing that this requires the communication of k^d log k bits gives us a bound on s. For each i ∈ [d], the log k bits of z_i are partitioned into d parts. For j ∈ [d], player ⟨i,j⟩ will use his value x_{⟨i,j⟩} to specify the bits in the j-th part via the sum z_i = Σ_{j∈[d]} x_{⟨i,j⟩}. Define X_j = { x_j = a · (2^{log(k)/d})^{j−1} | a ∈ {0,1}^{log(k)/d} } to be the set of values with zeros in the bits outside the j-th block. Note that each value of z_i can be made from combinations of these x_{⟨i,j⟩} ∈ X_j, namely {0,1}^{log k} = { z_i = Σ_{j∈[d]} x_{⟨i,j⟩} | x_{⟨i,1⟩} ∈ X_1, ..., x_{⟨i,d⟩} ∈ X_d }. Knowing f, player ⟨i,j⟩, for each x_{⟨i,j⟩} ∈ X_j, sends the referee the s_{⟨i,j⟩} bits that he would send in the ⟨k, d⟩ function game protocol when knowing ⟨f, x_{⟨i,j⟩}⟩. Note that, because the players do not communicate with each other, we know what one player will do without specifying the other players' inputs. The total number of bits sent is Σ_{i∈[d], j∈[d]} |X_j| · s_{⟨i,j⟩} = |{0,1}^{log(k)/d}| · s = k^{1/d} · s. For each z_1, ..., z_d ∈ [k], the referee can select one message from each player and determine f(z_1, ..., z_d). Hence the players must be sending k^{1/d} · s ≥ k^d log k bits, giving s ≥ k^{d−1/d} log k as required.
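The combinatorial fact used here, that every z ∈ [k] decomposes uniquely as a sum of one value from each block set X_j, can be checked directly. A small illustrative sketch (0-indexed blocks; k and d chosen so that log k is divisible by d):

    from itertools import product

    k, d = 16, 2
    t = k.bit_length() - 1            # t = log2(k) = 4
    assert t % d == 0
    block = t // d

    # X_j: values of [k] whose bits are zero outside the j-th block of log(k)/d bits
    X = [[a << (block * j) for a in range(2 ** block)] for j in range(d)]

    sums = [sum(choice) for choice in product(*X)]
    assert sorted(sums) == list(range(k))     # each z in [k] arises from exactly one choice per block
    print("every z in [k] is a sum of one element from each X_j:", len(sums), "combinations, k =", k)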

Theorem 42. Any deterministic k-way branching program solving TEP^4_d has size Ω(k^{2d−1/d}).

Proof. The proof follows from Theorem 38, Lemma 40 and Lemma 41. The details go as follows. By Lemma 40 we have s′ log s′ ≥ s. By Lemma 41 we have s ≥ k^{d−1/d} log k. Consequently, s′ log s′ ≥ k^{d−1/d} log k implies s′ ≥ (d/(d^2−1)) k^{d−1/d} = Ω(k^{d−1/d}). By Theorem 38 it follows that any deterministic k-way branching program solving TEP^4_d has size at least (d/(d^2−1)) k^{2d−1/d} = Ω(k^{2d−1/d}).
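The constant d/(d^2−1) extracted in this step can be double-checked numerically: since x log x is increasing, the smallest s′ with s′ log s′ ≥ k^{d−1/d} log k is, for d ≥ 2, at least (d/(d^2−1)) k^{d−1/d}. An illustrative check (not part of the proof):

    import math

    def min_s_prime(k, d):
        # smallest s' with s' * log2(s') >= k^(d - 1/d) * log2(k), by binary search
        target = k ** (d - 1 / d) * math.log2(k)
        lo, hi = 2.0, target
        for _ in range(200):
            mid = (lo + hi) / 2
            if mid * math.log2(mid) < target:
                lo = mid
            else:
                hi = mid
        return hi

    d = 2
    for k in (2 ** 10, 2 ** 16):
        s_star = min_s_prime(k, d)
        claimed = d / (d * d - 1) * k ** (d - 1 / d)
        print(k, round(s_star), ">=", round(claimed), s_star >= claimed)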

Note that this lower bound beats the best achievable by Nechiporuk's method, which is O(n^2/(k log^2 n)) = O(k^{2d−1}). Also note that for d = 2, i.e. TEP^4_2, this gives a lower bound of Ω(k^{3.5}).³ The belief is that the true lower bound is Ω(k^4).

³Any improvement on the lower bound for this communication game would break the √k barrier for the simultaneous communication complexity of computing the Generalized Addressing Function from BGKL [4].


4.4 Technique doesn’t give bounds that grow with h

Consider a d-ary tree of height h with arbitrary functions f, g, ... : [k]^d → [k] at each internal node. The pebbling lower bounds suggest that something like (d − 1)h + d pebbles are needed, and hence the conjecture is that ((d − 1)h + d) log k space, i.e. k^{(d−1)h+d} states, is needed. Our lower bound technique naturally translates into the following game: there is a player for each leaf who knows his leaf value and all of the functions f, g, ..., and who simultaneously communicates to the referee, who must learn the output. There are ≈ d^h functions f, g, ..., requiring at most a total of d^h × k^d log k bits of information for the players to send. This can only potentially give a d^h × k^d lower bound on the number of states needed, i.e. an h log d + d log k bound on the space. For d << k this is far too small in comparison to ((d − 1)h + d) log k.
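To see the size of the gap quantitatively, one can compare the two space bounds in log scale for sample parameters (a toy computation, not part of the argument):

    import math

    d, h = 2, 8
    k = 2 ** 10
    info_bound = h * math.log2(d) + d * math.log2(k)      # log2(d^h * k^d): what the game can give
    conjectured = ((d - 1) * h + d) * math.log2(k)        # log2(k^((d-1)h + d)): conjectured space
    print("game technique: ~", round(info_bound), "bits of space;  conjectured: ~", round(conjectured), "bits")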

4.5 Lower Bound for Binary Branching Programs

We now turn our attention to binary branching programs solving TEP. Here the description of the TEP instance is presented in binary, and each state in the branching program reads an individual bit of it. We show the following:

Theorem 43. Any deterministic binary branching program solving TEP^4_d has size Ω(k^{2d}).

First we show a lower bound on the number of states querying input bits that make up the description of the leaf values in an instance of TEP^3_d, and then invoke Theorem 38 to prove the lower bound for TEP^4_d. Consider a TEP^3_d problem with the function f at the top given as part of the input and where the level 2 nodes each compute some fixed onto function g. Let B be a branching program solving such instances of TEP^3_d. Now collapse the program B by hardwiring the given function f into it. Since g is onto, the resulting branching program can be used to infer the function f. As a result, if one simply counts the number of branching programs one can construct over s_leaf leaf-querying states (which are pre-labelled), this count must be at least the number of possible functions f, giving us s_leaf^{2 s_leaf} ≥ k^{k^d}, and hence s_leaf = Ω(k^d). It then follows from Theorem 38 that the size of a binary BP solving TEP^4_d is Ω(k^{2d}).
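For concreteness, the counting inequality s_leaf^{2 s_leaf} ≥ k^{k^d} can be solved numerically for small parameters; the resulting value tracks k^d/(2d), which is Ω(k^d) for fixed d. An illustrative check (not part of the proof):

    import math

    def min_sleaf(k, d):
        # smallest s with 2 * s * log2(s) >= k^d * log2(k), i.e. s^(2s) >= k^(k^d)
        target = k ** d * math.log2(k)
        s = 2
        while 2 * s * math.log2(s) < target:
            s += 1
        return s

    for k, d in ((16, 2), (64, 2)):
        s = min_sleaf(k, d)
        print("k =", k, " s_leaf >=", s, "  k^d/(2d) =", k ** d // (2 * d))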

Note that since the size of the input is n = O(k^d log k), this lower bound matches the best achievable by Nechiporuk's method when the function involved takes values in [k], which is O(n^2/log^2 n) = O(k^{2d}). Also note that it follows from Theorem 43 that any deterministic binary branching program solving TEP^4_2 has size Ω(k^4).

If, instead of having the topmost function f take values in [k], we restrict it to be a boolean function, so that the function composition finally evaluates to a boolean function, then we lose a log k factor in the leaf lower bound: we obtain s′ = Ω(k^d/log k), since the information content of f is smaller by precisely that log k factor when f is boolean. This gives a size lower bound of Ω(k^{2d}/log k) = Ω(n^2/log^3 n) on deterministic binary branching programs solving this boolean function evaluation. We shall recover this lost log factor soon, by composing at different parameters.

4.5.1 A Conjecture

Note that, on the way to proving Theorem 43, any significant improvement one can make to the exponent in the leaf-bit lower bound for TEP^3_d for any d > 2 would lead to an improvement on the record set by Nechiporuk's method. We make the following seemingly very plausible conjecture. If true, it would beat the best known branching program lower bound for a polytime computable function.


Figure 4.1: This figure illustrates a pebbling configuration of TEP^3_4 at which the corresponding branching program has k^{2(d−1)} = k^6 leaf-querying states.

Conjecture 44. Any binary branching program solving TEP^3_d would need to make at least k^{d+c} leaf-bit queries for some c > 0.

One of the reasons this seems plausible is that the naive upper bounds originating from pebbling algorithms for this problem require as many as k^{2(d−1)} leaf-query states, whereas the lower bound only seeks to show that there are at least k^{d+c} of them for some c > 0.

Note that the improvement desired in Conjecture 44 for the leaf-query-state lower bound cannot be achieved via a counting argument like the one above, nor via the communication game we used previously: even if the level 2 functions are given as part of the input, the players can always communicate by simply writing down all the functions involved on the board for the referee to infer everything. In doing so, the information content in the counting argument, or the communication cost of the game, only adds up over the functions, whereas in order to prove Conjecture 44 we need some kind of multiplicative effect to show up. Before we elaborate on this further, let us clean things up a bit and recover the log factor we seem to lose.

4.5.2 Composition at different parameters

Consider the composition of two boolean functions f : {0,1}^a → {0,1} and g : {0,1}^b → {0,1}. Let f be a hard function in the sense that any branching program computing f requires size at least 2^a/a; such functions are guaranteed to exist by a simple counting argument. Fix g to be any function that does not become constant when all but any one of its b input bits are set. We claim that any branching program solving f ∘ g has to have size at least b · 2^a/a.

Proof. Let there be a branching program solving f ∘ g of size s. For each of the a copies of g in the composition f ∘ g, pick the least queried input bit from amongst the group of b input bits that correspond to that copy of g, then set the remaining b − 1 bits in this group to any value and reconnect the outgoing edges amongst the remaining states appropriately. The resulting collapsed branching program has size at most s/b. But recall that g has the property that fixing b − 1 of its input bits does not make the function constant. As a result, the collapsed branching program has to have size at least that required for computing f, that is, 2^a/a. So the original BP must have size at least s ≥ b · (2^a/a).

In the above observation, set a = log n and b = n/log n, so that the length of the input is ab = n bits. Also, an explicit description of f uses at most 2^a = n bits. For f and g chosen as described above, the size of a branching program required to solve the composition f ∘ g is at least b · (2^a/a) = (n/log n) · (n/log n) = n^2/log^2 n. This is also the best possible by Nechiporuk's method. In a way, what we are doing here is exactly like Nechiporuk: the input bits are partitioned into b sets Y_1, Y_2, ..., Y_b, where each Y_i in the partition has a input bits, with exactly one bit coming from each of the a copies of g. Nevertheless, this is interesting because of the composition structure: the naive branching program upper bound for solving f ∘ g is to take a branching program solving f in which each state querying an input of f is replaced by a branching program solving the function g on the relevant inputs. Such a BP has size BP(f) × BP(g). If, at the above parameters a = log n and b = n/log n, one could show that the naive upper bound is in fact the best one can do when g is some well chosen function that requires a super-linear branching program, then we could hope to better Nechiporuk's bound. In [24] the authors show that an analogous statement is indeed true in the case of the weaker computational model of formulas when g is the parity function. Parity itself is not good enough to better what Nechiporuk's method gives for branching programs, since it can be solved by linear-size BPs. Element-Distinctness, on the other hand, has a quadratic lower bound and might be a good candidate for the function g.

Figure 4.2: TEP problem with bitwise addition at level 2.
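The naive composition upper bound just described is easy to make concrete. The sketch below (illustrative code, not part of the thesis) represents a deterministic BP as a map from a state to (queried variable, successor on 0, successor on 1), composes two programs by replacing each f-state with a copy of the program for g on the corresponding block, and brute-force checks the result when both f and g are parity (parity is used here only to exercise the construction); the composed program has exactly BP(f) × BP(g) states.

    from itertools import product

    def eval_bp(bp, start, x):
        # bp: state -> (queried variable, successor on 0, successor on 1);
        # successors are states or the outputs True/False
        cur = start
        while cur is not True and cur is not False:
            var, n0, n1 = bp[cur]
            cur = n1 if x[var] else n0
        return cur

    def parity_bp(n):
        # layered BP for parity of n bits; state (i, p): next bit to read is i, parity so far is p
        bp = {}
        for i in range(n):
            for p in (0, 1):
                def nxt(v, i=i, p=p):
                    q = p ^ v
                    return bool(q) if i == n - 1 else (i + 1, q)
                bp[(i, p)] = (i, nxt(0), nxt(1))
        return bp, (0, 0)

    def compose(bp_f, start_f, bp_g, start_g, b):
        # naive BP for f∘g on blocks of b bits; state (u, v) means "inside the copy of bp_g
        # attached to f-state u, currently at g-state v"
        def lift(u, gsucc):
            if gsucc is True or gsucc is False:            # this copy of g just produced a bit
                fsucc = bp_f[u][2] if gsucc else bp_f[u][1]
                return fsucc if (fsucc is True or fsucc is False) else (fsucc, start_g)
            return (u, gsucc)
        comp = {}
        for u, (i, _, _) in bp_f.items():
            for v, (j, m0, m1) in bp_g.items():
                comp[(u, v)] = (i * b + j, lift(u, m0), lift(u, m1))
        return comp, (start_f, start_g)

    a, b = 2, 3
    bp_f, sf = parity_bp(a)
    bp_g, sg = parity_bp(b)
    comp, sc = compose(bp_f, sf, bp_g, sg, b)

    for x in product((0, 1), repeat=a * b):
        want = bool(sum(x) % 2)                            # parity∘parity is just parity of all bits
        assert eval_bp(comp, sc, list(x)) == want
    print("composed BP has", len(comp), "=", len(bp_f), "x", len(bp_g), "states")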

4.6 Surprising Upper bounds

While we wish to prove leaf-query lower bounds so as to prove size lower bounds via Theorem 38, it might be informative to understand how branching programs can avoid having a lot of leaf queries. In order to do this, we take the boolean version of the tree evaluation problem, BT^h_2, and provide k-way branching program upper bounds that have a smaller number of leaf queries than one might expect.

4.6.1 Upper Bounds

Consider a height 3 binary tree (refer to Figure 4.2). Let the root function be an arbitrary function f from [k] × [k] to {0,1}. Let the leaf values be X, Y, U and V. Let the level 2 functions both be bit-wise addition, denoted with a slight abuse of notation by X + Y. We will exhibit a k-way branching program that solves the tree evaluation problem of computing f(X + Y, U + V) using at most k^{1.82} leaf-reading states and at most k^{3.82} f-reading states. We transform the surprising communication protocols given by Babai, Gal, Kimmel and Lokam [4] and Ambainis and Lokam [3] for solving the generalized addressing function problem in the simultaneous communication complexity [6] setting into surprising branching program upper bounds with a fewer than expected number of leaf queries.

Theorem 45. There is a k-way branching program solving the tree evaluation problem f(X + Y, U + V) with O(k^{1.82}) states querying the leaves X, Y, U, V and O(k^{3.82}) states reading the function f.


Proof. Let t = log k.

The branching program first reads leaf U and then reads V and remembers W = U+V. Let h = |X+Y| denote the Hamming weight of X+Y. This can be computed easily by counting, for each i ∈ [t], whether X_i+Y_i is one; so we alternate reading X and Y t times, inspecting the bits at individual indices one at a time. This level of the branching program, which now knows W and the Hamming weight h = |X+Y|, has k log k states. For each of these states the branching program will have a separate block continuing the computation, with only k^{0.82} leaf-reading states and k^{2.82} function-reading states each.

The first observation is that for any function A : {0,1}^t → {0,1}, there is a multilinear polynomial f_h of degree at most t/2 that agrees with A on all inputs of Hamming weight h, i.e. f_h(z) = A(z) for every z with |z| = h. We will assume that h ≤ t/2, because the other case can be handled in the same manner after switching the roles of the 0 and 1 bits.

Let T ⊂ [t] be a set of indices of size h. Let Z(T) be the vector of length t with ones in the h places indicated by T. We will use f_W(T) as shorthand for f(Z(T), W). A simple test of whether or not Z(T) = X+Y is whether Π_{i∈T}(X_i+Y_i) = 1: the product is 1 iff X+Y is one in all the places indicated by T, and we know that h = |T| is the number of ones in X+Y. It follows that

f(X+Y, U+V) = Σ_{|T|=h} f_W(T) · Π_{i∈T}(X_i+Y_i)

Multiplying each of these products out gives a term for each partition T_1 ∪ T_2 = T of T into disjoint parts:

= Σ_{|T|=h} f_W(T) · Σ_{T_1∪T_2=T} X_{T_1} Y_{T_2}

Here X_{T_1} = Π_{i∈T_1} X_i denotes the product of the bits of X that are indexed by T_1, and Y_{T_2} the same for Y. Now, as in Babai, Gal, Kimmel and Lokam [4], we separate these terms based on how big T_1 is and factor out the common term:

= Σ_{|T_1| ≤ h/2} ( Σ_{T_2 : |T_1∪T_2| = h} f_W(T_1∪T_2) · Y_{T_2} ) · X_{T_1} + Σ_{|T_2| ≤ h/2} ( Σ_{T_1 : |T_1∪T_2| = h} f_W(T_1∪T_2) · X_{T_1} ) · Y_{T_2}.

The two sums are similar, so consider the first. The number of T_1 terms is the number of sets of size at most h/2 ≤ t/4, which is at most Σ_{i=0}^{t/4} (t choose i) ≤ k^{0.82}.

Define the single bit B_{T_1}(f_W, Y) to be Σ_{T_2 : |T_1∪T_2|=h} f_W(T_1∪T_2) · Y_{T_2} and the single bit C_{T_2}(f_W, X) to be Σ_{T_1 : |T_1∪T_2|=h} f_W(T_1∪T_2) · X_{T_1}, giving

f_h(X + Y) = Σ_{|T_1| ≤ h/2} B_{T_1}(f_W, Y) · X_{T_1} + Σ_{|T_2| ≤ h/2} C_{T_2}(f_W, X) · Y_{T_2}

The following is a branching program computing f_h(X + Y). The branching program considers each term, one at a time, and keeps the running sum. It has a layer for each T_1 where |T_1| ≤ h/2 ≤ t/4, and then one for each T_2 where |T_2| ≤ h/2 ≤ t/4. A layer starts with two states knowing the sum so far of the computation of f_h(X + Y). The branching program then branches k ways on the value of Y. For each of these branches, depending on Y, over the T_2 such that


Y_{T_2} = 1 it has a sequence of f-reading states computing the bit B_{T_1}(f_W, Y). These states then collapse down to four states, knowing the sum so far and knowing B_{T_1}(f_W, Y). The branching program then branches k ways on the value of X. It then completes the layer by adding B_{T_1}(f_W, Y) · X_{T_1} into the sum. Since the number of terms involved is Σ_{i=0}^{t/4} (t choose i) ≤ k^{0.82}, our branching program needs only six times as many leaf-reading states as there are terms (two reading Y and four reading X).

The number of bits B_{T_1}(f_W, Y) = Σ_{T_2:|T_1∪T_2|=h} f_W(T_1∪T_2) · Y_{T_2} that must be computed per Hamming weight h is k^{0.82}, because there are that many terms. Within the computation of each term, the branching program branches k ways on the value of Y, and then, knowing Y, the bit B_{T_1}(f_W, Y) is obtained as a sum over the coordinates f_W(T_1∪T_2) for the appropriate values of z = Z(T_1∪T_2) ∈ [k]. This allows B_{T_1}(f_W, Y) to be computed with at most k reads of f for each value of Y within each term, for a total of k^{2.82} per Hamming weight h.

The branching program for f(X + Y, U + V) first reads U and then V, giving k states each knowing the value of U + V; it then computes the Hamming weight h = |X + Y| by alternately reading X and Y, looking at each log k times, once per index, and remembering the count of ones in X + Y. It then follows up with the above branching program for f_h(X + Y). Thus, overall, the number of leaf-reading states is O(k^{1.82}) and the number of f-reading states is O(k^{3.82}).
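The algebraic identity behind this construction can be verified mechanically. The sketch below (illustrative only, not part of the proof) checks, for a random f over a small alphabet and all inputs, that f(X+Y, U+V) equals the sum of the B_{T_1}(f_W, Y)·X_{T_1} and C_{T_2}(f_W, X)·Y_{T_2} terms over GF(2); to keep the two sums disjoint in the code, the boundary case |T_1| = |T_2| = h/2 is assigned to the first sum only.

    from itertools import combinations, product
    import random

    t = 4                              # t = log2 k
    k = 2 ** t
    random.seed(0)
    f = {(z, w): random.randint(0, 1) for z in range(k) for w in range(k)}   # random root function

    def bits(z):
        return [(z >> i) & 1 for i in range(t)]

    def sub_prod(bs, S):
        out = 1
        for i in S:
            out &= bs[i]
        return out                     # the empty product is 1

    def direct(X, Y, U, V):
        return f[(X ^ Y, U ^ V)]       # bitwise addition = XOR

    def decomposed(X, Y, U, V):
        W = U ^ V
        xb, yb = bits(X), bits(Y)
        h = sum(xb[i] ^ yb[i] for i in range(t))                 # Hamming weight of X + Y
        fW = lambda T: f[(sum(1 << i for i in T), W)]
        total = 0
        # first sum: |T1| <= h//2 (includes the boundary case when h is even)
        for r1 in range(h // 2 + 1):
            for T1 in combinations(range(t), r1):
                B = 0                                            # B_{T1}(f_W, Y)
                rest = [i for i in range(t) if i not in T1]
                for T2 in combinations(rest, h - r1):
                    B ^= fW(T1 + T2) & sub_prod(yb, T2)
                total ^= B & sub_prod(xb, T1)
        # second sum: |T2| <= h//2 with |T1| = h - |T2| > h//2, so the split is disjoint
        for r2 in range(h // 2 + 1):
            if h - r2 <= h // 2:
                continue
            for T2 in combinations(range(t), r2):
                C = 0                                            # C_{T2}(f_W, X)
                rest = [i for i in range(t) if i not in T2]
                for T1 in combinations(rest, h - r2):
                    C ^= fW(T1 + T2) & sub_prod(xb, T1)
                total ^= C & sub_prod(yb, T2)
        return total

    assert all(direct(X, Y, U, V) == decomposed(X, Y, U, V)
               for X, Y, U, V in product(range(k), repeat=4))
    print("identity verified for all", k ** 4, "inputs with k =", k)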

For the reader contrasting this with the obvious algorithm, which uses O(k^2) leaf queries and O(k^2) function queries: note that this height 3 upper bound transfers to produce non-trivial upper bounds for a corresponding height 4 tree evaluation problem, with at most O(k^{2.82}) leaf queries and O(k^{3.82}) function queries, as opposed to O(k^3) and O(k^4) respectively.

Corollary 46. There exists an instance of TEP of height 3 for which k-way BPs need at most k^{1.82} leaf-querying states but boolean branching programs need at least k^2 leaf-bit-querying states.

Proof. Follows from Theorem 43 and Theorem 45.

4.6.2 Generic Upper bounds

Observe that the above upper bound crucially exploits the fact that each bit z_i in the output of the bitwise addition Z = X + Y depends on only two bits, one from X and one from Y, resulting in a representation of low degree as a polynomial in the variables of X and Y to begin with. In general, the level 2 functions could be such that each z_i depends on many more bits of X and Y. In that case the resulting polynomial can have double the total degree, that is 2t, to start with. We can construct a polynomial representation using dedicated characteristic polynomials for each input over 2t bits, that is, the bits of both X and Y. Then using separate polynomials for each Hamming weight helps us halve the total degree to at most t. Once again, we can group each individual monomial into one of two kinds based on whether the X variables or the Y variables are fewer. However, this time around, even the side (one of X or Y) with fewer variables can have degree up to t/2:

f_i(X + Y) = Σ_{|T_1| ≤ d_i/2} ( Σ_{|T_1∪T_2| ≤ d_i} f_W(T_1 ∪ T_2) · Y_{T_2} ) · X_{T_1} + Σ_{|T_2| ≤ d_i/2} ( Σ_{|T_1∪T_2| ≤ d_i} f_W(T_1 ∪ T_2) · X_{T_1} ) · Y_{T_2}.

Define the single bit B_{T_1}(f_W, Y) to be Σ_{|T_1∪T_2| ≤ d_i} f_W(T_1 ∪ T_2) · Y_{T_2} and the single bit C_{T_2}(f_W, X) to be Σ_{|T_1∪T_2| ≤ d_i} f_W(T_1 ∪ T_2) · X_{T_1}, giving

f_i(X + Y) = Σ_{|T_1| ≤ d_i/2} B_{T_1}(f_W, Y) · X_{T_1} + Σ_{|T_2| ≤ d_i/2} C_{T_2}(f_W, X) · Y_{T_2},

where d_i < t.

As a result, the number of terms involved is Σ_{i=0}^{t/2} (t choose i) ≈ k/2. This changes the numbers to k^2 leaf-querying states and k^4 function-f-reading states. Compare this with the naive read-once upper bound based on pebbling for TEP^3_2 shown in Figure 4.3, where the branching program has k^2 leaf-querying states and k^3 f-reading states. The upper bound described here reads inputs multiple times and moves back and forth between reading the function f and the leaf values. Since the challenge of showing an Ω(k^2) leaf lower bound for TEP^3_2 is on the way to showing a branching program size lower bound of Ω(k^4) for TEP^4_2, it is important to note that the number of function-f-reading states in the above upper bound is k^4. We expect that an understanding of such multi-read upper bounds, and of the relative numbers of leaf and function queries needed in a branching program, can help us come up with a better approach for showing the desired lower bounds.

Figure 4.3: This figure illustrates a black pebbling of T^3 using 3 pebbles.
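The exponent 0.82 and the k/2 count above both come from partial sums of binomial coefficients: Σ_{i≤t/4} (t choose i) ≤ 2^{H(1/4)t} ≈ k^{0.81} ≤ k^{0.82}, while Σ_{i≤t/2} (t choose i) ≈ 2^{t−1} = k/2. A quick numeric check (illustrative only):

    import math

    t = 1000                                   # t = log2 k
    quarter = sum(math.comb(t, i) for i in range(t // 4 + 1))
    half = sum(math.comb(t, i) for i in range(t // 2 + 1))
    print("log_k of the t/4 sum:", round(math.log2(quarter) / t, 3))   # approaches H(1/4) ~ 0.811
    print("log_k of the t/2 sum:", round(math.log2(half) / t, 3))      # approaches 1, i.e. about k/2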

4.7 Acknowledgements

We thank Paul Beame, Siu Man Chan and David Liu for helpful discussions and anonymous readers for many helpfulsuggestions that have helped present the contents of all the chapters in the thesis in a better fashion.


Bibliography

[1] M. Ajtai. A non-linear time lower bound for boolean branching programs. In Proceedings 40th FOCS, pages60–70, 1999.

[2] Miklos Ajtai, Laszlo Babai, Peter Hajnal, Janos Komlos, and Pavel Pudlak. Two lower bounds for branchingprograms. In Proceedings of the eighteenth annual ACM symposium on Theory of computing, pages 30–38.ACM, 1986.

[3] Andris Ambainis and Satyanarayana V Lokam. Improved upper bounds on the simultaneous messagescomplexity of the generalized addressing function. In LATIN, pages 207–216. Springer, 2000.

[4] Laszlo Babai, Anna Gal, Peter G Kimmel, and Satyanarayana V Lokam. Communication complexity ofsimultaneous messages. SIAM Journal on Computing, 33(1):137–166, 2003.

[5] Laszlo Babai, Peter Hajnal, Endre Szemeredi, and Gyorgy Turan. A lower bound for read-once-only branchingprograms. Journal of Computer and System Sciences, 35(2):153–162, 1987.

[6] Laszlo Babai, Peter G Kimmel, and Satyanarayana V Lokam. Simultaneous messages vs. communication. InAnnual Symposium on Theoretical Aspects of Computer Science, pages 361–372. Springer, 1995.

[7] David A Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC1. In Proceedings of the eighteenth annual ACM symposium on Theory of computing, pages 1–5. ACM, 1986.

[8] P. Beame, T.S. Jayram, and M. Saks. Time-space tradeoffs for branching programs. J. Comput. Syst. Sci,63(4):542–572, 2001.

[9] P. Beame, M. Saks, X. Sun, and E. Vee. Time-space trade-off lower bounds for randomized computation ofdecision problems. Journal of the ACM, 50(2):154–195, 2003.

[10] Paul Beame, Nathan Grosshans, Pierre McKenzie, and Luc Segoufin. Nondeterminism and an abstract formulation of Neciporuk's lower bound method. ACM Transactions on Computation Theory, 9(1), 2016.

[11] Paul Beame and Pierre McKenzie. A note on Neciporuk's method for nondeterministic branching programs. Manuscript, August 2011.

[12] Michael Ben-Or and Richard Cleve. Computing algebraic formulas using a constant number of registers. SIAM

Journal on Computing, 21(1):54–58, 1992.


[13] Allan Borodin, A Razborov, and Roman Smolensky. On lower bounds for read-k-times branching programs.Computational Complexity, 3(1):1–18, 1993.

[14] Stephane Boucheron, Gabor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory

of independence. Oxford university press, 2013.

[15] Siu On Chan, James R. Lee, Prasad Raghavendra, and David Steurer. Approximate constraint satisfactionrequires large LP relaxations. J. ACM, 63(4):34:1–34:22, 2016.

[16] Arkadev Chattopadhyay, Michal Koucky, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems viapseudorandom properties. CoRR, abs/1704.06807, 2017.

[17] Alan Cobham. The recognition problem for the set of perfect squares. In Switching and Automata Theory,

1966., IEEE Conference Record of Seventh Annual Symposium on, pages 78–87. IEEE, 1966.

[18] Stephen Cook, Jeff Edmonds, Venkatesh Medabalimi, and Toniann Pitassi. Lower Bounds for NondeterministicSemantic Read-Once Branching Programs. In 43rd International Colloquium on Automata, Languages, and

Programming (ICALP 2016), volume 55 of Leibniz International Proceedings in Informatics (LIPIcs),Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.

[19] Stephen Cook, Pierre McKenzie, Dustin Wehr, Mark Braverman, and Rahul Santhanam. Pebbles and branchingprograms for tree evaluation. ACM Transactions on Computation Theory (TOCT), 3(2):4, 2012.

[20] Stephen Cook, Pierre McKenzie, Dustin Wehr, Mark Braverman, and Rahul Santhanam. Pebbles and branchingprograms for tree evaluation. ACM Transactions on Computation Theory (TOCT), 3(2):4, 2012.

[21] Stephen Cook and Ravi Sethi. Storage requirements for deterministic polynomialtime recognizable languages.Journal of Computer and System Sciences, 13(1):25–37, 1976.

[22] Susanna F. de Rezende, Jakob Nordstrom, and Marc Vinyals. How limited interaction hinders realcommunication (and what it means for proof and circuit complexity). In IEEE 57th Annual Symposium on

Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New

Jersey, USA, pages 295–304, 2016.

[23] Scott Diehl and Dieter Van Melkebeek. Time-space lower bounds for the polynomial-time hierarchy onrandomized machines. SIAM Journal on Computing, 36(3):563–594, 2006.

[24] Irit Dinur and Or Meir. Toward the krw composition conjecture: Cubic formula lower bounds viacommunication complexity. In LIPIcs-Leibniz International Proceedings in Informatics, volume 50. SchlossDagstuhl-Leibniz-Zentrum fuer Informatik, 2016.

[25] Jeff Edmonds, Russell Impagliazzo, Steven Rudich, and Jiri Sgall. Communication complexity towards lowerbounds on circuit depth. Computational Complexity, 10(3):210–246, 2001.

[26] L. Fortnow. Nondeterministic polynomial time versus nondeterministic logarithmic space: Time space tradeoffsfor satifiability. In Proceedings 12th Conference on Computational Complexity, pages 52–60, 1997.


[27] L. Fortnow and D. Van Melkebeek. Time-space tradeoffs for nondeterministic computation. In Proceedings

15th Conference on Computational Complexity, pages 2–13, 2000.

[28] Lance Fortnow, Richard Lipton, Dieter Van Melkebeek, and Anastasios Viglas. Time-space lower bounds forsatisfiability. Journal of the ACM (JACM), 52(6):835–865, 2005.

[29] Anna Gal. A simple function that requires exponential size read-once branching programs. Information

Processing Letters, 62(1):13–16, 1997.

[30] Dmitry Gavinsky, Or Meir, Omri Weinstein, and Avi Wigderson. Toward better formula lower bounds: aninformation complexity approach to the krw composition conjecture. In Proceedings of the 46th Annual ACM

Symposium on Theory of Computing, pages 213–222. ACM, 2014.

[31] Mika Goos. Lower bounds for clique vs. independent set. In IEEE 56th Annual Symposium on Foundations of

Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 1066–1076, 2015.

[32] Mika Goos, Shachar Lovett, Raghu Meka, Thomas Watson, and David Zuckerman. Rectangles are nonnegativejuntas. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015,

Portland, OR, USA, June 14-17, 2015, pages 257–266, 2015.

[33] Mika Goos, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. InVenkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS

2015, Berkeley, CA, USA, 17-20 October, 2015, pages 1077–1088. IEEE Computer Society, 2015.

[34] Mika Goos, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for BPP. In IEEE 57th

Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, pages 132–143.IEEE Computer Society, 2017.

[35] Neil Immerman. Nondeterministic space is closed under complementation. SIAM Journal on computing,17(5):935–938, 1988.

[36] S. Jukna. A nondeterministic space-time tradeoff for linear codes. Information Processing Letters,109(5):286–289, 2009.

[37] Stasys Jukna. A note on read-k times branching programs. Informatique theorique et applications,29(1):75–83, 1995.

[38] Stasys Jukna. Expanders and time-restricted branching programs. Theoretical Computer Science,409(3):471–476, 2008.

[39] Stasys Jukna. Boolean function complexity: advances and frontiers, volume 27. Springer Science & BusinessMedia, 2012.

[40] Stasys Jukna and A Razborov. Neither reading few bits twice nor reading illegally helps much. Discrete

Applied Mathematics, 85(3):223–238, 1998.

[41] Stasys P Jukna. The effect of null-chains on the complexity of contact schemes. In Fundamentals of

Computation Theory, pages 246–256. Springer, 1989.


[42] Mauricio Karchmer, Ran Raz, and Avi Wigderson. Super-logarithmic depth lower bounds via the direct sum incommunication complexity. computational complexity, 5(3):191–204, 1995.

[43] Ilan Komargodski, Ran Raz, and Avishay Tal. Improved average-case lower bounds for De Morgan formula size. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 588–597. IEEE, 2013.

[44] Pravesh K. Kothari, Raghu Meka, and Prasad Raghavendra. Approximating rectangles by juntas andweakly-exponential lower bounds for LP relaxations of csps. In Proceedings of the 49th Annual ACM SIGACT

Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 590–603,2017.

[45] Matthias Krause, Christoph Meinel, and Stephan Waack. Separating the eraser Turing machine classes Le, NLe, co-NLe and Pe. In Mathematical Foundations of Computer Science 1988, pages 405–413. Springer, 1988.

[46] Chang-Yeong Lee. Representation of switching circuits by binary-decision programs. Bell Labs Technical

Journal, 38(4):985–999, 1959.

[47] James R. Lee, Prasad Raghavendra, and David Steurer. Lower bounds on the size of semidefinite programmingrelaxations. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC

2015, Portland, OR, USA, June 14-17, 2015, pages 567–576, 2015.

[48] R. Lipton and A. Viglas. Time-space tradeoffs for SAT. In Proceedings 40th FOCS, pages 459–464, 1999.

[49] David Liu. Pebbling arguments for tree evaluation. CoRR, abs/1311.0293, 2013.

[50] William Joseph Masek. A fast algorithm for the string editing problem and decision graph complexity. PhDthesis, Massachusetts Institute of Technology, 1976.

[51] Edward I Nechiporuk. On a Boolean function. Doklady Akademii Nauk SSSR, 169(4):765–766, 1966.

[52] EA Okolnishnikova. On lower bounds for branching programs. Siberian Advances in Mathematics,3(1):152–166, 1993.

[53] Stephen Ponzio. A lower bound for integer multiplication with read-once branching programs. SIAM Journal

on Computing, 28(3):798–815, 1998.

[54] Pavel Pudlak and Stanislav Zak. Space complexity of computations. Preprint Univ. of Prague, 1983.

[55] Ran Raz and Pierre McKenzie. Separation of the monotone NC hierarchy. Combinatorica, 19(3):403–435,1999.

[56] Janos Simon and Mario Szegedy. A new lower bound theorem for read-only-once branching programs and itsapplications. In Advances in Computational Complexity Theory, pages 183–194, 1990.

[57] Johan Hastad. The shrinkage exponent of De Morgan formulas is 2. SIAM Journal on Computing, 27(1):48–64, 1998.


[58] Robert Szelepcsenyi. The method of forced enumeration for nondeterministic automata. Acta Informatica,26(3):279–284, 1988.

[59] MA Taitslin. An example of a problem from PTIME and not in NLogSpace. Proceedings of Tver State University, 6(12):5–22, 2005.

[60] Avishay Tal. Shrinkage of de morgan formulae by spectral techniques. In Foundations of Computer Science

(FOCS), 2014 IEEE 55th Annual Symposium on, pages 551–560. IEEE, 2014.

[61] Ingo Wegener. Optimal decision trees and one-time-only branching programs for symmetric boolean functions.Information and Control, 62(2-3):129–143, 1984.

[62] Ingo Wegener. On the complexity of branching programs and decision trees for clique functions. Journal of the

ACM (JACM), 35(2):461–471, 1988.

[63] Ryan Williams. Better time-space lower bounds for SAT and related problems. In Computational Complexity,

2005. Proceedings. Twentieth Annual IEEE Conference on, pages 40–49. IEEE, 2005.

[64] Xiaodi Wu, Penghui Yao, and Henry Yuen. Raz-mckenzie simulation with the inner product gadget. Electronic

Colloquium on Computational Complexity (ECCC), 2017.
