Revisiting Lower Bounds for Multi-r-ic Depth Four Circuits
Suryajith Chillara
IIT Bombay, India
Christian Engels
IIT Bombay, India
Abstract
Multi-r-ic arithmetic circuits are arithmetic circuits where the individual degree of every variable, in polynomials computed at every node, is restricted to be at most r. This is a natural generalization of the multilinear restriction.
Raz and Yehudayoff (CC, 2008) introduced a "full rank polynomial" and proved strong lower bounds against the multilinear circuits (of small depth) and formulas computing it, using the method of partial derivatives (cf. Nisan-Wigderson (CC, 1997), Raz (ToC 2006)). It is natural to ask whether these techniques and polynomial constructions can be extended beyond the multilinear setting to multi-r-ic circuit models, and thus yield lower bounds against multi-r-ic circuits and, in particular, depth four multi-r-ic circuits.
In this paper, we prove a superpolynomial yet quasipolynomial lower bound on the size of depth four multi-r-ic circuits computing an explicit "full rank polynomial" by just using the method of partial derivatives.
Recently, Kayal, Saha and Tavenas (ToC 2018) proved an exponential lower bound on the size of depth four multi-r-ic circuits computing explicit polynomials using a stronger complexity measure. Our proof, however, borrows no elements from theirs and is for a completely different polynomial. The proof strategy is inspired by Saptharishi’s proof of an exponential lower bound for depth three multi-r-ic circuits (cf. Chapter 14, Ramprasad’s survey, 2019).
The point of this paper is to retrospectively extend our understanding of both the power and the limits of the method of partial derivatives: it suffices to achieve superpolynomial lower bounds, while its limits indicate the need for stronger complexity measures to prove exponential bounds.
2012 ACM Subject Classification Theory of Computation → Algebraic Complexity Theory
Keywords and phrases Algebraic Complexity, Lower Bounds
Digital Object Identifier 10.4230/LIPIcs...
1 Introduction
One of the major focal points in the area of algebraic complexity theory is to show that certain polynomials are hard to compute syntactically. Here, the hardness of computation is quantified by the number of arithmetic operations that are needed to compute the target polynomial. Instead of the standard Turing machine model, we consider arithmetic circuits and formulas as models of computation.
Arithmetic circuits are directed acyclic graphs such that the leaf nodes are labeled by variables or constants from the underlying field, and every non-leaf node is labeled either by + or ×. Every node computes a polynomial by applying the operation given by its label to its inputs. Computation flows from the leaves to the output node. We refer the readers to the standard resources [27, 26] for more information on arithmetic formulas and arithmetic circuits.
© Suryajith Chillara and Christian Engels; licensed under Creative Commons License CC-BY
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
Valiant conjectured that the Permanent does not have polynomial sized arithmetic circuits [30]. Working towards that conjecture, we aim to prove superpolynomial circuit size lower bounds. However, the best known circuit size lower bound is $\Omega(n \log n)$, for a power symmetric polynomial, due to Baur and Strassen [28, 3], and the best known formula size lower bound is $\Omega(n^2)$, due to Kalorkoti [15]. Due to the slow progress towards proving general circuit/formula lower bounds, it is natural to study restricted classes of arithmetic circuits and formulas.
Since most of the polynomials of interest, such as the Determinant and the Permanent, are multilinear polynomials, it is natural to consider the restriction where every intermediate computation is in fact multilinear. Due to the phenomenal work in the last two decades [20, 22, 21, 24, 25, 13, 23, 2, 7, 5, 6], the complexity of multilinear formulas and circuits is better understood than that of general formulas and circuits.
Backed by this progress, it is natural to try to extend these results to a circuit model where the individual degree of every variable in the polynomial computed at every node in the circuit is at most r. We refer to these circuits as multi-r-ic circuits. When r = 1, the circuit model is multilinear.
Recently, Kumar, Oliveira and Saptharishi [18] showed that there is a chasm¹ for multi-r-ic circuits too. They proved that any polynomial sized (say $n^c$) multi-r-ic circuit of arbitrary depth computing a polynomial on n variables can be depth reduced to a syntactically multi-r-ic depth four circuit of size exp(O(√n log n)). This motivates the study of multi-r-ic depth four circuits and of strong lower bounds against them. Towards this, Kayal, Saha and Tavenas [17] proved an exponential size lower bound against multi-r-ic depth four circuits computing a variant of the iterated matrix multiplication polynomial. They achieved this bound using the method of Shifted Skew Partial Derivatives, a variant of the method of Shifted Partial Derivatives [11] and the method of Skew Partial Derivatives [16].
Motivation for this work and our results:
For non-homogeneous multilinear circuit lower bounds (of small depth), Raz and Yehudayoff [24] introduced a "full rank polynomial" and proved lower bounds against the multilinear circuits of small depth and multilinear formulas computing it through the method of partial derivatives. It is natural to ask if such techniques can be extended and thus used against multi-r-ic circuits and, in particular, depth four multi-r-ic circuits.
We tread this line of thought and show that we can indeed prove superpolynomial size lower bounds against depth four multi-r-ic circuits computing a "full rank polynomial" by just using the method of partial derivatives.
The premise of our work is set in a hypothetical universe that is blissfully unaware of the result of Kayal, Saha and Tavenas [17], in which we work towards proving lower bounds for multi-r-ic circuits. For this reason, our proof borrows no elements from [17] and is only based on tools and techniques developed prior to it.
We first construct a "full rank polynomial" (based on the construction of Saptharishi [26, Chapter 14]) and then show that any depth four multi-r-ic circuit computing it must have superpolynomial size, using the dimension of partial derivatives as a complexity measure. This can be formally stated as follows.
1 Agrawal and Vinay [1], Koiran, and Tavenas [29] showed that any general circuit can be depth reduced to a depth four circuit of non-trivial size.
Theorem 1. Let n, r be positive integers such that $r = o(\sqrt{\log n})$. There exists an explicit polynomial on n variables and of degree O(n) such that any syntactically multi-r-ic depth four circuit computing it must be of size $n^{\Omega(\log n / r^2)}$.
Comparison to related work:
It is important to note that our lower bound is quasipolynomial in the number of variables whereas the bound of Kayal, Saha and Tavenas [17] is exponential. Our result deviates from that of [17] in the following ways:
1. Our proof technique is entirely different from that of [17] and is of independent interest even in the presence of their quantitatively better bound.
2. We prove a superpolynomial size lower bound against depth four multi-r-ic circuits for a new multi-r-ic full rank polynomial.
3. The polynomials considered by Kayal et al. have an implicit coding theoretic structure on a large set of their monomials, making them amenable to the method of Shifted Skew Partial Derivatives. This structure is not evident to us in the full rank polynomial that we consider, and we do not know if we can prove exponential bounds for the full rank polynomial using Shifted Skew Partial Derivatives.
4. The polynomials considered by Kayal et al. are not "full rank polynomials", and the method of partial derivatives alone does not yield much for those polynomials in this multi-r-ic setting when r is not too small.
Proof overview:
Analogous to the work of Fournier et al. [9] and [19], we first consider multi-r-ic depth four circuits of low bottom support² and prove lower bounds against them. Let $T_1, T_2, \dots, T_s$ be the terms corresponding to the product gates feeding into the output sum gate. The output polynomial is obtained by adding the terms $T_1, T_2, \dots, T_s$. Recall that each of these $T_i$'s is a product of low support polynomials $Q_{i,j}$, that is, every monomial in these $Q_{i,j}$'s is supported on a small set of variables (say µ many).
Let $\rho : X \mapsto Y \sqcup Z$ be a partition of the variables X into two disjoint sets Y and Z such that each variable in X is mapped to Y or to Z with equal probability. Under such a partition, we can construct a matrix $M(f|_\rho)$ such that the rows are indexed by monomials in Y-variables and the columns are indexed by monomials in Z-variables. Each entry of the matrix is the coefficient, in $f|_\rho$, of the monomial obtained as the product of its row and column indices (cf. Section 2).
As in [22, 24], we show that under a random partition ρ, for all i, $M(T_i|_\rho)$ is far from full rank with high probability.
Let us first consider the following two extremal cases:
1. Each of the $Q_{i,j}$'s depends on a small set of variables: Recall that the rank of a matrix is sub-multiplicative (cf. Observation 2, Item 1) and thus $\mathrm{rank}(M(T_i|_\rho)) \le \prod_j \mathrm{rank}(M(Q_{i,j}|_\rho))$. We first observe that a factor $Q_{i,j}$ does not contribute non-trivially to the rank if all its variables are mapped only to Y-variables or only to Z-variables. In such a case $\mathrm{rank}(M(Q_{i,j}|_\rho)) = 1$. It is easy to show that this event happens with a decent probability. We call such a factor ineffective. In particular, when a factor becomes ineffective, none of the variable appearances in that factor contribute to the rank of $M(T_i|_\rho)$. We show that under a random partition, a good fraction of the variable appearances³ are rendered ineffective with high probability. Conditioned on this event, we then measure the contribution of the remaining variable appearances to the matrix rank by estimating the number of possibly non-zero rows in the matrix $M(T_i|_\rho)$. We show that this is smaller than the maximum possible rank. The proof of this case borrows ideas from Saptharishi’s proof [26, Chapter 14] of an exponential lower bound against depth three multi-r-ic circuits.
2 That is, all the product gates at the bottom are supported on a small set of variables.
2. Each of the $Q_{i,j}$'s depends on a large set of variables: Since each of the $Q_{i,j}$'s depends on a large set of variables (say τ many) and the total number of variable appearances is bounded by nr, we can infer that there cannot be more than $nr/\tau$ many factors. Since the $Q_{i,j}$'s are polynomials with bottom support at most µ, $\deg(Q_{i,j})$ is at most µr and thus the degree of $T_i$ is at most $\mu n r^2/\tau$. For a carefully chosen parameter τ, this can be small. From this, it is now clear that the possibly non-zero rows of the matrix $M(T_i|_\rho)$ will be indexed by monomials of degree at most $\mu n r^2/\tau$ in the Y variables.
An arbitrary term $T_i$ under consideration could have both types of factors, that is, factors with low variable support and factors with large variable support. To deal with this, we break our term $T_i$ into two terms $T_{i,1}$ and $T_{i,2}$ such that $T_i = T_{i,1} \cdot T_{i,2}$. Here, $T_{i,1}$ has factors of small support and $T_{i,2}$ has factors of large support. We then bound the ranks of $M(T_{i,1}|_\rho)$ and $M(T_{i,2}|_\rho)$ separately using the aforementioned arguments, and obtain an upper bound on the rank of $M(T_i|_\rho)$ by multiplying the two bounds (Item 1 of Observation 2).
Once we obtain a bound on the rank of $M(T_i|_\rho)$, using sub-additivity of rank (Item 2 of Observation 2) and a union bound, we then show that the polynomial computed by a "small sized" depth four multi-r-ic circuit of low bottom support has less than full rank for a random partition, with non-zero probability. On the other hand, we show that an explicit candidate polynomial $F_n$ has full rank under every partition of the variables. Hence, a depth four multi-r-ic circuit of low bottom support and of small size cannot compute $F_n$. This yields a lower bound against multi-r-ic depth four circuits of low bottom support.
Using standard techniques (cf. [26, Chapter 20]), we can then lift our lower bound against depth four multi-r-ic circuits of low bottom support to general depth four multi-r-ic circuits.
2 Preliminaries
Depth four circuits:
A depth four circuit (denoted by ΣΠΣΠ) over a field $\mathbb{F}$ and variables $x_1, x_2, \dots, x_n$ computes polynomials which can be expressed as sums of products of polynomials, that is, in the form
\[ \sum_{i=1}^{s} \prod_{j=1}^{d_i} Q_{i,j}(x_1, \dots, x_n) \]
for some $d_i$'s. A depth four circuit of bottom support t (denoted by ΣΠΣΠt) is a depth four circuit where all the monomials in every polynomial $Q_{i,j}$ are supported on at most t variables.
Multi-r-ic arithmetic circuits:
We refer the readers to the standard resources [27, 26] for the definitions of arithmetic formulas and arithmetic circuits.
3 Recall that any variable can appear at most r times among all the factors.
Definition 2 (multi-r-ic circuits). Let $\mathbf{r} = (r_1, r_2, \dots, r_n)$.
1. A polynomial $f(x_1, x_2, \dots, x_n)$ is said to be multi-$\mathbf{r}$-ic if for all $i \in [n]$, the individual degree of the variable $x_i$ in f is at most $r_i$. If $r_1 = r_2 = \dots = r_n = r$, we simply refer to it as a multi-r-ic polynomial.
2. An arithmetic circuit Φ is said to be a multi-r-ic circuit if the polynomial $f_u$ computed at a node u is a multi-r-ic polynomial for all $u \in \Phi$.
3. An arithmetic circuit Φ is said to be a syntactically multi-r-ic circuit if for all product gates $u \in \Phi$ with $u = u_1 \times u_2 \times \dots \times u_t$, the total degree with respect to every variable is bounded by r, i.e., $\sum_{j \in [t]} \deg_{x_i}(f_{u_j}) \le r$ for all $i \in [n]$.
Complexity measure:
In this article, we use the dimension of the partial derivative space as the complexity measure to prove lower bounds (cf. [20, 22]). We follow the coefficient matrix perspective of Raz.
Let $\rho : X \mapsto Y \sqcup Z$ be a partitioning function of the X variables. Any monomial m in the polynomial f over the variables X can, upon the application of the partitioning function, be expressed as a product of a monomial in Y variables with a monomial in Z variables.
\[ f(X) = \sum_{i \in |M|} c_i \cdot m_i(X) \;\mapsto\; f|_\rho(Y, Z) = \sum_{i \in |M|} c_i \cdot m_i(Y) \cdot m_i(Z) \]
Let $M_\rho(f|_\rho)$ be a matrix whose rows are indexed by all the monomials in Y variables and whose columns are indexed by all the monomials in Z variables. The entry indexed by the monomials $(m(Y), m(Z))$ in $M_\rho(f|_\rho)$ is the coefficient of the monomial $m(Y) \cdot m(Z)$ in $f|_\rho$. The complexity of f under ρ is the rank of the matrix $M_\rho(f|_\rho)$. In short, we shall denote it by $\Gamma_\rho(f|_\rho)$.
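To make the measure concrete, the following minimal Python sketch builds the coefficient matrix of a polynomial under a given partition and computes its rank over the rationals. The dictionary representation (exponent tuples as keys) and the names `coefficient_matrix`, `matrix_rank` and `gamma` are illustrative assumptions and do not come from the paper. Only rows and columns that actually occur are materialized, since all-zero rows and columns do not affect the rank.

```python
from fractions import Fraction


def coefficient_matrix(poly, y_vars):
    """poly: dict {exponent tuple: coefficient}; y_vars: indices of variables sent to Y."""
    rows, cols, cells = {}, {}, {}
    for expo, coeff in poly.items():
        # Split the monomial into its Y-part (row index) and Z-part (column index).
        y_mono = tuple(e if i in y_vars else 0 for i, e in enumerate(expo))
        z_mono = tuple(0 if i in y_vars else e for i, e in enumerate(expo))
        r = rows.setdefault(y_mono, len(rows))
        c = cols.setdefault(z_mono, len(cols))
        cells[(r, c)] = Fraction(coeff)
    matrix = [[Fraction(0)] * len(cols) for _ in range(len(rows))]
    for (r, c), value in cells.items():
        matrix[r][c] = value
    return matrix


def matrix_rank(matrix):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    m = [row[:] for row in matrix]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def gamma(poly, y_vars):
    """The measure Gamma_rho(f|_rho): rank of the coefficient matrix under the partition."""
    return matrix_rank(coefficient_matrix(poly, y_vars))


# Example: f = 1 + x1*x2 + x1^2*x2^2, a multi-2-ic polynomial in two variables.
f = {(k, k): 1 for k in range(3)}
print(gamma(f, {0}))     # x1 -> Y, x2 -> Z
print(gamma(f, {0, 1}))  # both variables -> Y
```

With $x_1$ and $x_2$ on different sides the matrix is a 3×3 identity, so the measure is 3; with both variables on the same side it collapses to 1.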
Observation 1. For any multi-r-ic polynomial f(X) and a partitioning function $\rho : X \mapsto Y \sqcup Z$, $\Gamma_\rho(f|_\rho)$ is at most $(r + 1)^{\min\{|Y|,|Z|\}}$.
Observation 2. Let $f_1$ and $f_2$ be two polynomials defined over the set of variables X. Let $\rho : X \mapsto Y \sqcup Z$ be a partitioning function.
1. Sub-multiplicativity: If $f(X) = f_1(X) \cdot f_2(X)$ then $\Gamma_\rho(f|_\rho) \le \Gamma_\rho(f_1|_\rho) \cdot \Gamma_\rho(f_2|_\rho)$.
2. Sub-additivity: If $f(X) = f_1(X) + f_2(X)$ then $\Gamma_\rho(f|_\rho) \le \Gamma_\rho(f_1|_\rho) + \Gamma_\rho(f_2|_\rho)$.
Observation 3. Let $f_1$ and $f_2$ be two polynomials defined over the disjoint sets of variables $X_1$ and $X_2$. Let $\rho_1 : X_1 \mapsto Y_1 \sqcup Z_1$ and $\rho_2 : X_2 \mapsto Y_2 \sqcup Z_2$ be two partitioning functions such that $Y_1, Y_2, Z_1, Z_2$ are all distinct. Let $f(X_1 \cup X_2) = f_1(X_1) \cdot f_2(X_2)$ and $\rho = \rho_1 \oplus \rho_2$. Then $M_\rho(f|_\rho) = M_{\rho_1}(f_1|_{\rho_1}) \otimes M_{\rho_2}(f_2|_{\rho_2})$ and thus $\Gamma_\rho(f|_\rho) = \Gamma_{\rho_1}(f_1|_{\rho_1}) \cdot \Gamma_{\rho_2}(f_2|_{\rho_2})$.
Useful probabilistic estimates:
The following version of the Chernoff bound [4, 12] is from the book of Dubhashi and Panconesi [8, Theorem 1.1].
Theorem 3 (Chernoff bound). Let $W_1, \dots, W_n$ be independent $\{0, 1\}$-valued random variables and let $W = \sum_{i \in [n]} W_i$. Then we have the following.
1. For any $\varepsilon > 0$,
\[ \Pr[W > (1 + \varepsilon)\,\mathbb{E}[W]] \le \exp\left(-\frac{\varepsilon^2}{3}\,\mathbb{E}[W]\right) \quad\text{and}\quad \Pr[W < (1 - \varepsilon)\,\mathbb{E}[W]] \le \exp\left(-\frac{\varepsilon^2}{2}\,\mathbb{E}[W]\right). \]
2. For any $t > 2e\,\mathbb{E}[W]$,
\[ \Pr[W > t] \le 2^{-t}. \]
We also need the following "read-r" version of concentration bounds.
Theorem 4 ([10, 14]). Let $X_1, \dots, X_n$ be independent random variables. Let $E_1, \dots, E_k$ be boolean random variables that are functions of the $X_i$'s such that each $X_i$ influences at most r of the $E_j$'s. If $\Pr[E_i = 1] \ge p$, then for any $\varepsilon > 0$, we have
\[ \Pr[E_1 + \dots + E_k \le (p - \varepsilon)k] \le e^{-2\varepsilon^2 k / r}. \]
3 Low-support multi-r-ic depth 4 circuits
In this section we will show that any polynomial that is computed by a syntactically multi-r-ic depth four circuit of low bottom support must have strictly less than full rank for at least one partition of the variables.
Theorem 5. Let n, r > 1, µ be positive integers. Let T be a syntactic multi-r-ic product of polynomials $Q_1 Q_2 \cdots Q_t$ over the variables $X = \{x_1, x_2, \dots, x_n\}$. Let ρ be a random partition of the variables X into $Y \sqcup Z$ with equal probability. Let $m = \min\{|Y|, |Z|\}$. Then $\Gamma_\rho(T|_\rho)$ is at most $(1 + r)^{m - \frac{n}{2^{40\mu r^2+2}}}$ with a probability of at least $1 - 2\exp\left(-\frac{n}{2^{80\mu r^2+1}}\right)$.
Proof. Since ρ is a random partition of the variables X into $Y \sqcup Z$ with equal probability, $\mathbb{E}_\rho[|Y|] = |X|/2 = n/2$. Let $n_Y$ denote $|Y|$. Without loss of generality, let us assume that Y is the smaller part⁴. Using the Chernoff bound (Theorem 3), we get that $n_Y \ge n/4$ with a probability of at least $1 - \exp(-\frac{n}{16})$. From here on, we shall condition the rest of the analysis on $n_Y \ge n/4$ and account for this error probability in the union bound over all the error probabilities at the end.
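The Chernoff estimate used here can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it computes the exact probability that a Binomial(n, 1/2) variable falls below n/4 and compares it with the bound exp(−n/16) obtained from Theorem 3 with ε = 1/2 and E[W] = n/2. It looks at the one-sided event for |Y| before any relabeling of Y and Z, and the function names are hypothetical.

```python
import math


def exact_lower_tail(n, threshold):
    """Exact Pr[W < threshold] for W ~ Binomial(n, 1/2)."""
    favourable = sum(math.comb(n, k) for k in range(n + 1) if k < threshold)
    return favourable / 2 ** n


def chernoff_lower_bound(n):
    """exp(-eps^2 * E[W] / 2) with eps = 1/2 and E[W] = n/2, i.e. exp(-n/16)."""
    return math.exp(-n / 16)


for n in (16, 64, 256):
    print(n, exact_lower_tail(n, n / 4), chernoff_lower_bound(n))
```

The exact tail is far smaller than the bound for these values of n, which is expected since Chernoff bounds are not tight.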
Let us further express T as a product of $T_1$ and $T_2$, where
\[ T_1 = \{Q_i \mid |\mathrm{Supp}(Q_i)| < \tau\} \quad\text{and}\quad T_2 = \{Q_i \mid |\mathrm{Supp}(Q_i)| \ge \tau\} \]
(each identified with the product of its factors) for a parameter τ fixed⁵ to $\frac{10\mu n r^2}{n_Y}$. Let $\mathbf{a} = (a_1, a_2, \dots, a_n)$ be an n-length vector that represents the maximum degree of all the variables in $T_1$ and $\mathbf{b} = (b_1, b_2, \dots, b_n)$ be an n-length vector that represents the maximum degree of all the variables in $T_2$. Since the circuit C is multi-r-ic, we get that $\mathbf{a} + \mathbf{b} \le \mathbf{r}$, that is, for all $i \in [n]$, $a_i + b_i \le r$.
From sub-multiplicativity of rank, we get that $\Gamma_\rho(T|_\rho) \le \Gamma_\rho(T_1|_\rho) \cdot \Gamma_\rho(T_2|_\rho)$. We shall now bound $\Gamma_\rho(T_1|_\rho)$ and $\Gamma_\rho(T_2|_\rho)$ separately and thus obtain a suitable upper bound on $\Gamma_\rho(T|_\rho)$.
Computing $\Gamma_\rho(T_1|_\rho)$:
From the definition, it is clear that the support of every polynomial in $T_1$ is less than τ. With some reordering of indices, we can consider $T_1$ to be $Q_1 Q_2 \cdots Q_{t_1}$. For all $i \in [t_1]$, let the polynomial $Q_i$ be defined over a variable set $X_i$. Let $w_1, w_2, \dots, w_\ell$ be an enumeration of all the appearances of the X variables in $T_1$, with repetition.
4 If not, interchange the labels of the sets Y and Z.
5 Note that technically, τ is a random variable as $n_Y$ is a random variable.
We call a factor $Q_i$ ($i \in [t_1]$) ineffective under ρ if all the variables in the support of $Q_i$ are mapped either all to Y variables or all to Z variables. We call a variable appearance $w_j$ ineffective under ρ if the factor in which it occurs is ineffective. Let $E_j$ be a $\{0, 1\}$-valued random variable such that $E_j = 1$ if $w_j$ is ineffective and 0 otherwise.
\[ \Pr_\rho[E_j = 1] = \Pr_\rho[Q_j \text{ is ineffective under } \rho] = \frac{1}{2^{\tau-1}}. \]
Notice, however, that $E_i$ and $E_j$ are not independent.
Let us now count the total number of ineffective appearances of Y variables. Note that the $E_j$'s are at most r-wise dependent. This is due to the fact that any variable $x_i$ appears in at most r many $Q_j$'s and thus, under ρ, it can influence at most r many $E_j$'s. Let S be a subset of $[\ell]$ such that $j \in S$ iff the X variable corresponding to $w_j$ maps to a Y variable under ρ. Let $r_S = \sum_{i : \rho(x_i) \in Y} a_i$. Using Theorem 4 with the value of ε set to $2^{-\tau}$, we get that
\[ \Pr_\rho\left[\sum_{j \in S} E_j \le \frac{r_S}{2^\tau}\right] \le \exp\left(-\frac{r_S}{r \cdot 2^{2\tau-1}}\right). \]
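For the reader's convenience, the parameter substitution behind this application of Theorem 4 can be spelled out as follows. This is a worked instantiation under the text's own choices (taking $k = r_S$ as the number of summands, as the displayed bound does), not an additional claim.

```latex
% Instantiating Theorem 4 with the parameters used in the text.
\begin{align*}
p &= \frac{1}{2^{\tau-1}}, \qquad \varepsilon = \frac{1}{2^{\tau}}, \qquad k = r_S, \\
(p - \varepsilon)\,k &= \left(\frac{2}{2^{\tau}} - \frac{1}{2^{\tau}}\right) r_S = \frac{r_S}{2^{\tau}}, \\
\exp\!\left(-\frac{2\varepsilon^{2} k}{r}\right)
  &= \exp\!\left(-\frac{2\, r_S}{r \cdot 2^{2\tau}}\right)
   = \exp\!\left(-\frac{r_S}{r \cdot 2^{2\tau-1}}\right).
\end{align*}
```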
With a probability of at least $1 - \exp\left(-\frac{r_S}{r \cdot 2^{2\tau-1}}\right)$, at most $r_S\left(1 - \frac{1}{2^\tau}\right)$ many variable appearances are not ineffective. Let $\mathbf{d}$ be a vector of length n that represents the effective variable appearances. From the aforementioned arguments, we get that with high probability $\sum_{i : \rho(x_i) \in Y} d_i \le r_S\left(1 - \frac{1}{2^\tau}\right)$. The number of possibly non-zero rows in $M_{Y,Z}(T_1)$ is at most $\prod_{i : \rho(x_i) \in Y} (1 + d_i)$. Recall that $n_Y = |Y|$.
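\begin{align*}
\Gamma_\rho(T_1|_\rho) &\le \prod_{i : \rho(x_i) \in Y} (1 + d_i) \\
&\le \left(\frac{n_Y + \sum_{i : \rho(x_i) \in Y} d_i}{n_Y}\right)^{n_Y} && \text{(Using the AM-GM inequality)} \\
&\le \left(1 + \frac{r_S\left(1 - \frac{1}{2^\tau}\right)}{n_Y}\right)^{n_Y} \\
&\le (1 + r)^{\frac{r_S}{r}\left(1 - \frac{1}{2^\tau}\right)}. && \text{(Using } (1 + ux)^n \le (1 + x)^{un}\text{)}
\end{align*}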
Rephrasing this, we can say that $\Gamma_\rho(T_1|_\rho)$ is at most $(1 + r)^{\frac{r_S}{r}(1 - \nu)}$, where ν is at least $1/2^\tau$ with a probability of at least $1 - \exp\left(-\frac{r_S}{r \cdot 2^{2\tau-1}}\right)$.
Computing $\Gamma_\rho(T_2|_\rho)$:
Under a suitable renaming of polynomials, let $T_2$ be the product $R_1 R_2 \cdots R_{t_2}$. Let $d_L$ be equal to $\sum_{i \in [n]} b_i$ and $r_L$ be equal to $\sum_{i : \rho(x_i) \in Y} b_i$. Note that $d_L \le nr$.
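Trivially, the number of possibly non-zero rows in $M_{Y,Z}(T_2|_\rho)$ is at most $\prod_{i : \rho(x_i) \in Y} (1 + b_i)$.
\[ \Gamma_\rho(T_2|_\rho) \le \prod_{i : \rho(x_i) \in Y} (1 + b_i) \le \left(\frac{n_Y + \sum_{i : \rho(x_i) \in Y} b_i}{n_Y}\right)^{n_Y} \le (1 + r)^{\frac{r_L}{r}}. \]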
We shall now show that this bound can be improved if there is a significant chunk of Y variables that appear in $T_2$. From the definition, we know that for all $i \in [t_2]$, the polynomial $R_i$ has variable support of at least τ and every monomial depends only on µ variables⁶. This directly leads us to the following observation.
Observation 4. For all $i \in [t_2]$, the degree of the polynomial $R_i$ is at most µr. Thus, the degree of $T_2$ is at most $\mu r t_2$.
Note that $d_L$ counts the aggregate of the individual degrees of all the variables in $T_2$ and the support size of each $R_i$ is at least τ. From this, we can infer that there cannot be too many factors.
Observation 5. The total number of factors in $T_2$ is at most $d_L/\tau \le nr/\tau$. Thus, the degree of $T_2$ is at most $\frac{\mu n r^2}{\tau}$.
Thus, the potentially non-zero rows in $M_{Y,Z}(T_2|_\rho)$ are indexed by monomials in Y of degree at most $\mu r t_2 \le \mu n r^2/\tau$. Since $\tau = \frac{10\mu n r^2}{n_Y}$, we have that $n_Y \ge \frac{\mu n r^2}{\tau} = \frac{n_Y}{10}$.
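\begin{align*}
\Gamma_\rho(T_2|_\rho) &\le \binom{n_Y + \frac{\mu n r^2}{\tau}}{n_Y} \\
&= \binom{n_Y + \frac{\mu n r^2}{\tau}}{\frac{\mu n r^2}{\tau}} \\
&\le \left(\frac{n_Y + \frac{\mu n r^2}{\tau}}{\frac{\mu n r^2}{\tau}}\right)^{\frac{\mu n r^2}{\tau}} \cdot e^{\frac{\mu n r^2}{\tau}} && \text{(Using } \tbinom{n}{k} \le (en/k)^k \text{ for } k \le n/2\text{)} \\
&= \left(1 + \frac{n_Y \tau}{\mu n r^2}\right)^{\frac{\mu n r^2}{\tau}} \cdot e^{\frac{\mu n r^2}{\tau}} \\
&\le (1 + r)^{\frac{n_Y}{r} + \frac{\mu n r^2}{\tau \ln(1+r)}}. && \text{(Using } (1 + ux)^n \le (1 + x)^{un}\text{)}
\end{align*}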
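We will now bound $\Gamma_\rho(T_2|_\rho)$ by $(1 + r)^{\frac{r_L}{r}(1 - \eta)}$, where $\eta = \max\left\{0,\; 1 - \frac{r}{r_L}\left(\frac{n_Y}{r} + \frac{\mu n r^2}{\tau \ln(1+r)}\right)\right\}$.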
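Computing $\Gamma_\rho(T|_\rho)$:
From the sub-multiplicativity of the measure, we know that $\Gamma_\rho(T|_\rho)$ is at most $\Gamma_\rho(T_1|_\rho) \cdot \Gamma_\rho(T_2|_\rho)$.
\begin{align*}
\Gamma_\rho(T|_\rho) &\le (1 + r)^{\frac{r_S}{r}(1 - \nu)} \cdot (1 + r)^{\frac{r_L}{r}(1 - \eta)} \\
&= (1 + r)^{\left(\frac{r_L + r_S}{r} - \frac{r_L \eta}{r} - \frac{r_S \nu}{r}\right)} \\
&\le (1 + r)^{\left(n_Y + \frac{r_L + r_S}{r} - \left(n_Y + \frac{r_L \eta}{r} + \frac{r_S \nu}{r}\right)\right)} \\
&= (1 + r)^{n_Y - \Delta}
\end{align*}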
where $\Delta = \left(n_Y + \frac{r_L \eta}{r} + \frac{r_S \nu}{r}\right) - \frac{r_L + r_S}{r}$. Note that the maximum value $\Gamma_\rho(T|_\rho)$ can attain is $(1 + r)^{n_Y}$ and that Δ is a measure of rank deficiency under ρ.
Recall that $r_L + r_S \le n_Y r$ and hence $\Delta \ge 0$. Let $r_L \le \delta n_Y r$ and $r_S = (1 - \delta) n_Y r$ for some $\delta = \delta(\tau) \in [0, 1]$. Note that the value of δ gets fixed given a summand T and a threshold parameter τ.
6 Recall that each of the polynomials $R_1, R_2, \dots, R_{t_2}$ is such that every monomial has a support of at most µ.
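Case when δ ≤ 0.75: Since δ is small, the contribution from $T_2$ towards Δ might be small or nothing at all. In this case, we shall argue that Δ can be suitably lower bounded, with high probability, by just using the contributions from $T_1$.
\begin{align*}
\Delta &= \left(n_Y + \frac{r_L \eta}{r} + \frac{r_S \nu}{r}\right) - \frac{r_L + r_S}{r} \\
&\ge n_Y - \frac{r_L}{r} - \frac{r_S}{r}(1 - \nu) && \text{(Since } \eta \ge 0\text{)} \\
&\ge n_Y - \delta n_Y - (1 - \delta) n_Y (1 - \nu) && \text{(Since } r_L \le \delta n_Y r \text{ and } r_S = (1 - \delta) n_Y r\text{)} \\
&= n_Y (1 - \delta) \nu.
\end{align*}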
With a probability of at least $1 - \exp\left(-\frac{r_S}{r \cdot 2^{2\tau-1}}\right) \ge 1 - \exp\left(-\frac{n_Y}{2^{2\tau+1}}\right)$, ν is at least $\frac{1}{2^\tau}$. Conditioned on this event, we get that $\Delta \ge \frac{n_Y}{2^{\tau+2}}$.
Case when δ > 0.75: Since δ is not too small, the contribution from $T_2$ towards Δ starts getting significantly larger. In this case, we shall argue that Δ can be suitably lower bounded by just using the contributions from $T_2$. By ignoring the contributions from $T_1$ to Δ, we can get rid of probabilistic arguments in this case.
Firstly, we will argue that if $\frac{r_L + r_S}{r}$ is much smaller than $n_Y$, then Δ is non-trivially lower bounded. Let us suppose that $\frac{r_L + r_S}{r} \le (1 - \varepsilon) n_Y$ for a constant ε fixed to a value of 0.1. In such a case,
\[ \Delta = \left(n_Y + \frac{r_L \eta}{r} + \frac{r_S \nu}{r}\right) - \frac{r_L + r_S}{r} \ge \left(n_Y - \frac{r_L + r_S}{r}\right) \ge \varepsilon n_Y = \frac{n_Y}{10}. \]
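Hereafter, we can assume that $\frac{r_L + r_S}{r} > (1 - \varepsilon) n_Y$ and thus $r_L \ge (\delta - \varepsilon) n_Y r$.
\begin{align*}
\Delta &= \left(n_Y + \frac{r_L \eta}{r} + \frac{r_S \nu}{r}\right) - \frac{r_L + r_S}{r} \\
&\ge \left(n_Y - \frac{r_S + r_L}{r}\right) + \frac{r_L}{r} - \frac{n_Y}{r} - \frac{\mu n r^2}{\tau \ln(1 + r)} \\
&\ge (\delta - \varepsilon) n_Y - \frac{n_Y}{r} - \frac{\mu n r^2}{\tau \ln(1 + r)} && \text{(Since } n_Y \ge \tfrac{r_S + r_L}{r}\text{)} \\
&\ge \left(\delta - \varepsilon - \frac{1}{r}\right) n_Y - \frac{n_Y}{10 \ln(1 + r)} && \text{(Since } \tau = 10\mu n r^2 / n_Y\text{)} \\
&\ge \left(\delta - \varepsilon - \frac{1}{r} - \frac{1}{10 \ln(1 + r)}\right) n_Y \\
&\ge (\delta - 0.7)\, n_Y && \text{(Since } \forall\, r > 1,\ \tfrac{1}{r} + \tfrac{1}{10 \ln(1 + r)} < 0.6\text{)} \\
&\ge 0.05\, n_Y.
\end{align*}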
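The numerical fact used in the second-to-last step, namely that $\frac{1}{r} + \frac{1}{10\ln(1+r)} < 0.6$ for every integer r > 1, is easy to confirm. The following small Python check is an illustrative sketch (the function name `slack_term` is hypothetical); it evaluates the expression on a finite range only, and the expression is decreasing in r, so the worst case is r = 2.

```python
import math


def slack_term(r):
    """The quantity 1/r + 1/(10 ln(1+r)) that is bounded by 0.6 in the text."""
    return 1 / r + 1 / (10 * math.log(1 + r))


# The expression decreases with r, so its maximum over r >= 2 is attained at r = 2.
worst = max(slack_term(r) for r in range(2, 10_001))
print(worst)
```

Combined with ε = 0.1 and δ > 0.75, this gives δ − ε − 0.6 > 0.05, matching the last line of the derivation.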
Putting it all together:
From the above case analysis, for large values of τ we can infer that $\Delta \ge \min\left\{0.1\, n_Y,\ 0.05\, n_Y,\ \frac{n_Y}{2^{\tau+2}}\right\} \ge \frac{n_Y}{2^{\tau+2}}$ with a probability of at least $1 - \exp\left(-\frac{n_Y}{2^{2\tau+1}}\right)$. Recall that $n_Y \ge \frac{n}{4}$ with a probability of $1 - \exp\left(-\frac{n}{16}\right)$. Conditioned on this, τ is at most $40\mu r^2$. Thus, $\Delta \ge \frac{n}{2^{40\mu r^2+2}}$ with a probability of at least
\[ 1 - \left(\exp\left(-\frac{n}{16}\right) + \exp\left(-\frac{n}{2^{80\mu r^2+1}}\right)\right) \ge 1 - 2\exp\left(-\frac{n}{2^{80\mu r^2+1}}\right). \]
This gives us the theorem statement that $\Gamma_\rho(T|_\rho)$ is at most $(1 + r)^{\min\{|Y|,|Z|\} - \frac{m}{2^{40\mu r^2+2}}}$ with a probability of at least $1 - 2\exp\left(-\frac{n}{2^{80\mu r^2+1}}\right)$. ∎
Theorem 6. Let n, r, µ be positive integers. Let C be a syntactically multi-r-ic depth four ΣΠΣΠµ circuit of size $s < \exp\left(\frac{n}{2^{80\mu r^2+2}}\right)$ that computes a multi-r-ic polynomial f over the variables $X = \{x_1, x_2, \dots, x_n\}$. Then there exists a partition ρ of the variables X into $Y \sqcup Z$ such that $\Gamma_\rho(f|_\rho)$ is strictly less than $(1 + r)^{\min\{|Y|,|Z|\}}$.
Proof. Recall that f is a sum of at most s many products of polynomials $T^{(1)} + \dots + T^{(s)}$. Using sub-additivity of the measure, we get that $\Gamma_\rho(f|_\rho) \le s \cdot \max_{i \in [s]} \Gamma_\rho(T^{(i)}|_\rho)$. Let ρ be a random partition as in Theorem 5 and let $m = \min\{|Y|, |Z|\}$. From Theorem 5, we get that for a term $T^{(i)}$, $\Gamma_\rho(T^{(i)}|_\rho)$ is at most $(1 + r)^{m - \frac{n}{2^{40\mu r^2+2}}}$ with a probability of at least $1 - 2\exp\left(-\frac{n}{2^{80\mu r^2+1}}\right)$. Taking a union bound over all the $T^{(i)}$'s, we get that for all $i \in [s]$, $\Gamma_\rho(T^{(i)}|_\rho)$ is at most $(1 + r)^{m - \frac{n}{2^{40\mu r^2+2}}}$ with a probability of at least $p = 1 - 2s\exp\left(-\frac{n}{2^{80\mu r^2+1}}\right)$. Since $s < \exp\left(\frac{n}{2^{80\mu r^2+2}}\right)$, p is strictly greater than zero.
Thus with a non-zero probability, we get that $\Gamma_\rho(f|_\rho)$ is strictly less than $(1 + r)^m$.
\begin{align*}
\Gamma_\rho(f|_\rho) &\le s \cdot \max_{i \in [s]} \Gamma_\rho(T^{(i)}|_\rho) \\
&\le s \cdot (1 + r)^{m - \frac{n}{2^{40\mu r^2+2}}} \\
&< (1 + r)^{m - \frac{n}{2^{40\mu r^2+2}} + \frac{n}{2^{80\mu r^2+2} \log(1+r)}} \\
&< (1 + r)^m.
\end{align*}
∎
4 Multi-r-ic depth 4 circuits
Before we describe the proof of the lower bound, we shall introduce our polynomial family. To begin with, we consider a polynomial which is a slight generalization of a full rank polynomial of Raz and Yehudayoff [24]⁷. Using this polynomial, we construct the hard polynomial by composing the full rank polynomial with linear forms, akin to [26].
The full rank polynomial $F_n$ is defined over $\mathbb{F}(w_{a,b,c}, w_{a,b} : a, b, c \le 2n)[x_1, x_2, \dots, x_n]$. In short, we denote it as $\mathbb{F}(W)[X]$. Notice that $w_{a,b,c}$ and $w_{a,b}$ are auxiliary variables here. The full rank polynomial $F_n = f_{1,2n}$ is defined recursively as follows.
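\[
f_{i,j} =
\begin{cases}
\sum_{k=0}^{r} x_i^k x_{i+1}^k & \text{if } j = i + 1, \\[4pt]
\left(\sum_{k=0}^{r} x_i^k x_j^k\right) \cdot f_{i+1,j-1} \cdot w_{i,j} + \sum_{\ell=i+1}^{j-1} f_{i,\ell} \cdot f_{\ell+1,j} \cdot w_{i,\ell,j} & \text{if } j - i \text{ is odd}, \\[4pt]
0 & \text{otherwise.}
\end{cases}
\]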
7 This was first defined and was then used by Saptharishi to prove a multi-r-ic depth three lower bound [26].
We can now define our hard polynomial $P_{n,p}$ based on the definition of $F_n$. Let $p = n^{-c}$ for some constant $c \in (0, 1)$ which we shall later fix. Let $X = \{x_1, \dots, x_{2n}\}$ and $\tilde{X} = \{x_{1,1}, x_{1,2}, \dots, x_{1,t}, \dots, x_{2n,1}, x_{2n,2}, \dots, x_{2n,t}\}$ be distinct variable sets, where $t = \frac{2n \log(2n)}{p}$. Let $\mathrm{Lin}_p : X \mapsto \tilde{X}$ be a linear map such that $x_i \mapsto \sum_{j=1}^{t} x_{i,j}$ for all $i \in [2n]$. Let the polynomial $P_{n,p}(\tilde{X})$ be defined as $F_n \circ \mathrm{Lin}_p(X)$. That is,
\[ P_{n,p} = F_n\left(\sum_{j=1}^{t} x_{1,j},\ \sum_{j=1}^{t} x_{2,j},\ \dots,\ \sum_{j=1}^{t} x_{2n,j}\right) \quad\text{where } t = \frac{2n \log(2n)}{p}. \]
We will now recall the following useful trick due to Kumar and Saptharishi [26].
Lemma 7 (Lemma 20.5 [26]). Let σ be a random restriction on the variable set of $P_{n,p}$ that sets each variable to zero with a probability of $(1 - p)$, where $p = n^{-c}$ for some constant $c \in (0, 1)$. Then $F_n$ is a projection of $(P_{n,p})|_\sigma$ with a probability of at least $(1 - 2^{-n})$.
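The intuition behind this lemma can be sketched in code. The reading assumed here, which is an interpretation rather than a quotation of the proof in [26], is that $F_n$ can be recovered from the restricted polynomial whenever every block $x_{i,1}, \dots, x_{i,t}$ retains at least one live variable: keep one survivor per block as $x_i$ and set the others to zero. The function names, the use of the natural logarithm, and rounding t up to an integer are all assumptions of this sketch.

```python
import math
import random


def block_size(n, c):
    """t = 2n log(2n) / p with p = n^(-c); natural log and rounding up are assumptions."""
    p = n ** (-c)
    return math.ceil(2 * n * math.log(2 * n) / p)


def random_restriction(n, t, p, rng):
    """Keep each x_{i,j} alive with probability p; return surviving indices j per block i."""
    return {i: [j for j in range(1, t + 1) if rng.random() < p]
            for i in range(1, 2 * n + 1)}


def projects_to_F(survivors):
    """True when every block has a survivor, so one live variable per block can play x_i."""
    return all(len(js) > 0 for js in survivors.values())


n, c = 4, 0.5
t = block_size(n, c)
sigma = random_restriction(n, t, n ** (-c), random.Random(0))
print(t, projects_to_F(sigma))
```

A fixed block is wiped out with probability $(1 - p)^t \le e^{-pt}$, which is why t is chosen proportional to $\log(2n)/p$.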
We will now show that the matrix corresponding to the polynomial $F_n$ is of full rank under every partition that we consider, and hence that it is a hard polynomial for the circuit model under consideration.
Lemma 8. The polynomial $F_n = f_{1,2n}$ defined above has the property that for every partition $X = \{x_1, \dots, x_{2n}\} = Y \sqcup Z$, we have
\[ \Gamma_\rho(F_n|_\rho) \ge (r + 1)^{\min(|Y|,|Z|)}. \]
Further, this polynomial can be computed by a polynomial sized arithmetic circuit and hence is in VP.
Proof. The proof is similar to that of Lemma 13.17 [26]. We use induction on n, where 2n is the number of variables. For n = 1 we have the polynomial $F_1 = \sum_{k=0}^{r} x_1^k x_2^k$. It is clear that any partition of $x_1$ and $x_2$ yields a rank of $(r + 1)^{\min(|Y|,|Z|)}$. That is, $\Gamma_\rho(F_1) = 1$ if both the variables are on the same side, and is equal to r + 1 if they are on different sides.
Let n > 1 and let $\rho : X \mapsto Y \sqcup Z$ be any partition. Under the partition ρ, the variables $x_1, x_{2n}$ could either be mapped to the same side or to different sides, amongst Y and Z. We handle these cases separately.
Let $Y' = Y \setminus \{\rho(x_1), \rho(x_{2n})\}$ and $Z' = Z \setminus \{\rho(x_1), \rho(x_{2n})\}$.
Case 1: $x_1, x_{2n}$ are in different parts.
For ease of argument we set $w_{1,i,2n} = 0$ for all i and let $w_{1,2n} = 1$. Then we see that our polynomial is given by $\left(\sum_{k=0}^{r} x_1^k x_{2n}^k\right) f_{2,2n-1}$. By invoking the induction hypothesis, we get that $f_{2,2n-1}$ has a rank of at least $(r + 1)^{\min(|Y'|,|Z'|)}$. Since $\sum_{k=0}^{r} x_1^k x_{2n}^k$ and $f_{2,2n-1}$ are defined over disjoint sets of variables, Observation 3 leads us to the following.
\begin{align*}
\Gamma_\rho\left(\left(\sum_{k=0}^{r} x_1^k x_{2n}^k\right) f_{2,2n-1}\right) &= \Gamma_\rho\left(\sum_{k=0}^{r} x_1^k x_{2n}^k\right) \cdot \Gamma_\rho(f_{2,2n-1}) \\
&\ge (r + 1) \cdot (r + 1)^{\min(|Y'|,|Z'|)} \\
&= (r + 1)^{\min(|Y|,|Z|)}.
\end{align*}
Case 2: $x_1, x_{2n}$ are in the same part. We know that there exists some k such that [1, k] and [k + 1, 2n] have equal size. We set $w_{1,2n}$ to zero. We can use our induction hypothesis on $f_{1,k}$ and $f_{k+1,2n}$. Let us define $Y_{1,k} = \{\rho(x_1), \rho(x_2), \dots, \rho(x_k)\} \cap Y$ and $Z_{1,k} = \{\rho(x_1), \rho(x_2), \dots, \rho(x_k)\} \cap Z$. Similarly, we can define $Y_{k+1,2n} = \{\rho(x_{k+1}), \rho(x_{k+2}), \dots, \rho(x_{2n})\} \cap Y$ and $Z_{k+1,2n} = \{\rho(x_{k+1}), \rho(x_{k+2}), \dots, \rho(x_{2n})\} \cap Z$. By invoking the induction hypothesis, we get that $\Gamma_\rho(f_{1,k}) \ge (r + 1)^{\min\{|Y_{1,k}|,|Z_{1,k}|\}}$ and $\Gamma_\rho(f_{k+1,2n}) \ge (r + 1)^{\min\{|Y_{k+1,2n}|,|Z_{k+1,2n}|\}}$. From the definition of the polynomial, $f_{1,k}$ and $f_{k+1,2n}$ are defined over disjoint sets of variables and thus, using Observation 3, we get that
\[ \Gamma_\rho(f_{1,2n}) \ge (r + 1)^{\min\{|Y_{1,k}|,|Z_{1,k}|\}} \cdot (r + 1)^{\min\{|Y_{k+1,2n}|,|Z_{k+1,2n}|\}} \ge (r + 1)^{\min(|Y|,|Z|)}. \]
The last inequality follows from the fact that $\min\{a_1, b_1\} + \min\{a_2, b_2\} \ge \min\{a_1 + a_2, b_1 + b_2\}$.
It is easy to construct a polynomial sized circuit for $F_n$ bottom-up by following the recursive definition and infer that it is in fact in VP. ∎
For the moment, let us assume that the polynomial $F_{n/2}$ is being computed by a multi-$r$-ic depth four $\Sigma\Pi\Sigma\Pi^{\{\mu\}}$ circuit of size $s < \exp\left(\frac{n}{2^{80\mu r^2+2}}\right)$. Theorem 8 tells us that for all partitions $\rho : X \mapsto Y \sqcup Z$ that we consider, $\Gamma_\rho(F_{n/2}|_\rho)$ is at least $(1 + r)^{\min\{|Y|,|Z|\}}$. On the other hand, we infer from Theorem 6 that there is at least one partition $\rho$ such that $\Gamma_\rho(F_{n/2}|_\rho)$ is strictly less than $(1 + r)^{\min\{|Y|,|Z|\}}$, thus arriving at a contradiction to our assumption. This can formally be summarized as follows.
▶ Theorem 9. Let $n, r, \mu$ be positive integers. Any syntactically multi-$r$-ic depth four $\Sigma\Pi\Sigma\Pi^{\{\mu\}}$ circuit that computes $F_n(x_1, x_2, \ldots, x_{2n})$ must be of size $\exp\left(\Omega\left(\frac{n}{2^{80\mu r^2}}\right)\right)$.
To prove the main theorem, we also need the following lemma.
▶ Lemma 10 (Analogous to Lemma 20.4 [26]). Let $P$ be a multi-$r$-ic polynomial that is computed by a syntactically multi-$r$-ic depth 4 circuit $C$ of size $s \le n^{\delta\mu}$ for some $\delta > 0$. Let $\sigma$ be a random restriction that sets each variable to zero independently with probability $(1 - n^{-2\delta})$. Then with probability at least $(1 - 1/s)$ the polynomial $\sigma(P)$ is computed by a multi-$r$-ic depth four circuit $C'$ of bottom support at most $\mu$ and size $s$.
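The probability bound in this lemma comes from a union bound. The following is a sketch of that calculation, under two assumptions that are not spelled out in the statement above: that the circuit has at most $s$ bottom-layer monomials, and that it suffices for $\sigma$ to kill every such monomial whose support exceeds $\mu$.

```latex
% A monomial m with |supp(m)| > \mu survives only if all of its variables are kept,
% and each variable is kept independently with probability n^{-2\delta}.
\Pr_\sigma[\sigma(m) \neq 0] = \left(n^{-2\delta}\right)^{|\mathrm{supp}(m)|} < n^{-2\delta\mu}.

% Union bound over at most s \le n^{\delta\mu} such monomials:
\Pr_\sigma\bigl[\exists\, m : |\mathrm{supp}(m)| > \mu \text{ and } \sigma(m) \neq 0\bigr]
  \le s \cdot n^{-2\delta\mu}
  \le n^{\delta\mu} \cdot n^{-2\delta\mu}
  = n^{-\delta\mu}
  \le \frac{1}{s}.
```

With the complementary probability, at least $1 - 1/s$, every surviving bottom monomial has support at most $\mu$, and restricting variables to zero does not increase the size of the circuit.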
Equipped with Lemma 10 and Theorem 9, we are now ready to prove our main theorem.
▶ Theorem 11. Let $n, r$ be positive integers and $\delta, c$ be some constants in $(0, 1)$. Let the parameter $p$ be equal to $n^{-\delta}$. If $C$ is any syntactically multi-$r$-ic depth four circuit computing the polynomial $P_{n,p}(X)$ as described above, then it must be of size $n^{\Omega\left(\frac{\log n}{r^2}\right)}$.
Proof. Let $\mu$ be a parameter that is set to $\gamma \frac{\log n}{r^2}$ for a small constant $\gamma$. Let $\sigma : X \mapsto \{0, *\}$ be a random restriction such that a variable is set to zero with probability $(1 - n^{-2\delta})$, and is left untouched otherwise. Let $C$ be a syntactically multi-$r$-ic depth four circuit of size $s \le n^{\delta\mu}$ that computes $P_{n,p}$. Lemma 10 tells us that $C' = \sigma(C)$ is a multi-$r$-ic depth four circuit of size $s$ and bottom support at most $\mu$. Hence, $\sigma(P_{n,p})$ has a multi-$r$-ic $\Sigma\Pi\Sigma\Pi^{\{\mu\}}$ circuit of size at most $s$. Since $F_n$ is a p-projection of $\sigma(P_{n,p})$, $F_n$ also has a multi-$r$-ic $\Sigma\Pi\Sigma\Pi^{\{\mu\}}$ circuit of size at most $s$.
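On the other hand, from Theorem 9 we know that any multi-$r$-ic $\Sigma\Pi\Sigma\Pi^{\{\mu\}}$ circuit that computes $F_n$ must be of size $\exp\left(\Omega\left(\frac{n}{2^{80\mu r^2}}\right)\right)$. Thus, it must be that $\exp\left(\Omega\left(\frac{n}{2^{80\mu r^2}}\right)\right) \le s \le n^{\delta\mu}$. We can choose $\gamma$ to be a small enough constant such that the lower bound exceeds $n^{\delta\mu}$, which contradicts the assumption on $s$. Thus, $s$ must be at least $n^{\Omega\left(\frac{\log n}{r^2}\right)}$. ◀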
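The choice of $\gamma$ can be made explicit. The following is a sketch of the calculation, assuming that the bound of Theorem 9 reads $\exp(\Omega(n / 2^{80\mu r^2}))$ and that logarithms are taken to base 2; the symbol $c_0$ for the constant hidden in the $\Omega$ is introduced here only for illustration.

```latex
% Substituting the choice of \mu into the denominator of the Theorem 9 bound:
\mu = \gamma \frac{\log n}{r^2}
  \;\Longrightarrow\;
  2^{80\mu r^2} = 2^{80\gamma \log n} = n^{80\gamma}.

% The chain of inequalities from the proof then becomes
\exp\!\left(c_0\, n^{1 - 80\gamma}\right) \;\le\; s \;\le\; n^{\delta\mu}
  = 2^{\delta\gamma \log^2 n / r^2}.

% For any fixed \gamma < 1/80 the left-hand side is \exp(n^{\Omega(1)}), while the
% right-hand side is only quasipolynomial in n, so the chain fails for large n. Hence
s > n^{\delta\mu} = n^{\delta\gamma \log n / r^2} = n^{\Omega(\log n / r^2)}.
```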
References
1 Manindra Agrawal and V. Vinay. Arithmetic circuits: A chasm at depth four. In proceedings of Foundations of Computer Science (FOCS), pages 67–75, 2008. doi:10.1109/FOCS.2008.32.
2 Noga Alon, Mrinal Kumar, and Ben Lee Volk. Unbalancing sets and an almost quadratic lower bound for syntactically multilinear arithmetic circuits. In CCC, volume 102 of LIPIcs, pages 11:1–11:16. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
3 Walter Baur and Volker Strassen. The complexity of partial derivatives. Theoretical Computer Science, 22:317–330, 1983.
4 Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493–507, 1952. URL: http://www.jstor.org/stable/2236576.
5 Suryajith Chillara, Christian Engels, Nutan Limaye, and Srikanth Srinivasan. A near-optimal depth-hierarchy theorem for small-depth multilinear circuits. In FOCS, pages 934–945. IEEE Computer Society, 2018.
6 Suryajith Chillara, Nutan Limaye, and Srikanth Srinivasan. A quadratic size-hierarchy theorem for small-depth multilinear formulas. In ICALP, volume 107 of LIPIcs, pages 36:1–36:13. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
7 Suryajith Chillara, Nutan Limaye, and Srikanth Srinivasan. Small-depth multilinear formula lower bounds for iterated matrix multiplication with applications. SIAM J. Comput., 48(1):70–92, 2019.
8 Devdatt P. Dubhashi and Alessandro Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009. URL: http://www.cambridge.org/gb/knowledge/isbn/item2327542/.
9 Hervé Fournier, Nutan Limaye, Guillaume Malod, and Srikanth Srinivasan. Lower bounds for depth 4 formulas computing iterated matrix multiplication. In proceedings of Symposium on Theory of Computing (STOC), pages 128–135, 2014. URL: http://doi.acm.org/10.1145/2591796.2591824.
10 Dmitry Gavinsky, Shachar Lovett, Michael E. Saks, and Srikanth Srinivasan. A tail bound for read-k families of functions. Random Struct. Algorithms, 47(1):99–108, 2015. URL: https://doi.org/10.1002/rsa.20532, doi:10.1002/rsa.20532.
11 Ankit Gupta, Pritish Kamath, Neeraj Kayal, and Ramprasad Saptharishi. Approaching the chasm at depth four. Journal of the ACM (JACM), 61(6):33, 2014.
12 Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963. URL: http://www.jstor.org/stable/2282952.
13 Pavel Hrubeš and Amir Yehudayoff. Homogeneous formulas and symmetric polynomials. Computational Complexity, 20(3):559–578, 2011.
14 Svante Janson. Poisson approximation for large deviations. Random Structures & Algorithms, 1(2):221–229, 1990.
15 K. A. Kalorkoti. A lower bound for the formula size of rational functions. SIAM Journal on Computing, 14(3):678–687, 1985.
16 Neeraj Kayal, Vineet Nair, and Chandan Saha. Separation between read-once oblivious algebraic branching programs (ROABPs) and multilinear depth three circuits. In proceedings of Symposium on Theoretical Aspects of Computer Science (STACS), pages 46:1–46:15, 2016. URL: https://doi.org/10.4230/LIPIcs.STACS.2016.46, doi:10.4230/LIPIcs.STACS.2016.46.
17 Neeraj Kayal, Chandan Saha, and Sébastien Tavenas. On the size of homogeneous and of depth-four formulas with low individual degree. Theory of Computing, 14(16):1–46, 2018. URL: http://www.theoryofcomputing.org/articles/v014a016, doi:10.4086/toc.2018.v014a016.
18 Mrinal Kumar, Rafael Mendes de Oliveira, and Ramprasad Saptharishi. Towards optimal depth reductions for syntactically multilinear circuits. Electronic Colloquium on Computational Complexity (ECCC), 26:19, 2019.
19 Mrinal Kumar and Shubhangi Saraf. On the power of homogeneous depth 4 arithmetic circuits. In proceedings of Foundations of Computer Science (FOCS), 2014. doi:10.1109/FOCS.2014.46.
20 Noam Nisan and Avi Wigderson. Lower bounds on arithmetic circuits via partial derivatives. Computational Complexity, 6(3):217–234, 1997. doi:10.1007/BF01294256.
21 Ran Raz. Multilinear-NC$^2$ $\neq$ multilinear-NC$^1$. In proceedings of Foundations of Computer Science (FOCS), pages 344–351, 2004. URL: https://doi.org/10.1109/FOCS.2004.42, doi:10.1109/FOCS.2004.42.
22 Ran Raz. Separation of multilinear circuit and formula size. Theory of Computing, 2(1):121–135, 2006. doi:10.4086/toc.2006.v002a006.
23 Ran Raz, Amir Shpilka, and Amir Yehudayoff. A lower bound for the size of syntactically multilinear arithmetic circuits. SIAM Journal on Computing, 38(4):1624–1647, 2008. doi:10.1137/070707932.
24 Ran Raz and Amir Yehudayoff. Balancing syntactically multilinear arithmetic circuits. Computational Complexity, 17(4):515–535, 2008. doi:10.1007/s00037-008-0254-0.
25 Ran Raz and Amir Yehudayoff. Lower bounds and separations for constant depth multilinear circuits. Computational Complexity, 18(2):171–207, 2009. doi:10.1007/s00037-009-0270-8.
26 Ramprasad Saptharishi. A survey of lower bounds in arithmetic circuit complexity version 8.0.4. Github survey, 2019. URL: https://github.com/dasarpmar/lowerbounds-survey/releases/.
27 Amir Shpilka and Amir Yehudayoff. Arithmetic circuits: A survey of recent results and open questions. Foundations and Trends in Theoretical Computer Science, 5:207–388, March 2010. URL: http://dx.doi.org/10.1561/0400000039.
28 Volker Strassen. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. Numerische Mathematik, 20(3):238–251, 1973.
29 Sébastien Tavenas. Improved bounds for reduction to depth 4 and depth 3. Information and Computation, 240:2–11, 2015. doi:10.1016/j.ic.2014.09.004.
30 L. G. Valiant. Completeness classes in algebra. In Proceedings of the Eleventh Annual ACM Symposium on Theory of Computing, STOC ’79, pages 249–261, New York, NY, USA, 1979. ACM. URL: http://doi.acm.org/10.1145/800135.804419, doi:10.1145/800135.804419.