Propositional Proof Complexity - Razborov


    Propositional Proof Complexity

Instructor: Alexander Razborov, University of Chicago ([email protected])

Course Homepage: http://people.cs.uchicago.edu/~razborov/teaching/winter09.html

    Winter, 2009 and Winter, 2014

    Contents

1 Global Picture
2 Size-Width Tradeoff for Resolution
  2.1 Resolution Proof System
  2.2 Width: Another Complexity Measure
  2.3 Restrictions
  2.4 Edge Expansion Gives a Width Lower Bound
3 Bounded Arithmetic: Theory of Feasible Proofs
  3.1 Axioms for Bounded Arithmetic
  3.2 The Main Theorem of Bounded Arithmetic
    3.2.1 Bootstrapping by Introducing New Symbols
    3.2.2 Finite Sequences
    3.2.3 Gentzen Calculus
  3.3 RSUV isomorphism
  3.4 The bounded arithmetic hierarchy
  3.5 Does this hierarchy collapse?
  3.6 More witnessing theorems
  3.7 Transition to propositional logic
4 Interpolation and Automatizability
  4.1 Feasible Interpolation
  4.2 Automatizability
  4.3 Resolution is Probably not Automatizable
    4.3.1 Constructing the contradiction
    4.3.2 The Upper Bound on s_R
    4.3.3 The Lower Bound on s_R
    4.3.4 Combinatorial Voodoo
    4.3.5 The Upper Bound when k(C) is too large
5 Applications to Learning
  5.1 Decision Trees
6 Concrete Lower Bounds
  6.1 Random 3-CNFs
  6.2 The Pigeon-Hole Principle
    6.2.1 Matching Restrictions
    6.2.2 Matching Decision Trees
    6.2.3 Putting it all together
7 Algebraic and Geometric proof systems
  7.1 Algebraic proof systems
    7.1.1 Nullstellensatz proof systems
    7.1.2 Polynomial Calculus
    7.1.3 Algebraic calculus and logic
  7.2 Geometric proof systems
    7.2.1 Cutting planes proof system
A Added in 2014
  A.1 Proofs of Resolution Lower Bounds based on Restrictions
  A.2 Feasible Interpolation for the Lovász-Schrijver Proof System
  A.3 Elements of Space Complexity

Lecture 1
Scribe: Raghav Kulkarni, University of Chicago.

    Date: January 6, 2009

    1 Global Picture

Question: Intuitively, what makes a proof easy or hard?
a) Length of the proof (e.g., 5 pages vs 10 pages)
b) Logical Complexity (e.g., number of quantifiers used)
c) Conceptual Complexity (e.g., complexity of the mathematical concepts used).



    Types of Proof:

1. Proofs that use quantifiers (e.g., first-order theories)
2. Quantifier-free proofs (e.g., propositional proofs).

In the first part of this course we will briefly study proofs of type 1, but the majority of the course is devoted to proofs of type 2. We will present one particular result in the course which bridges part 1 with part 2. (This result is roughly of the flavor that "circuits and Turing machines are exactly the same in some precise sense.")

Outreach:
a) Lovász-Schrijver rank: This notion of rank is associated with a positive semidefinite relaxation of integer linear programs. One consecutively shrinks the polytope by applying certain rules; the rank is roughly the number of such shrinking steps.
b) Applications in Learning.

    We begin with a “movie trailer” of the second part.

    2 Size-Width Tradeoff for Resolution

Let x_1, ..., x_n be propositional variables, i.e., x_i ∈ {0, 1}.
A literal is either x_i or ¬x_i (sometimes we write ¬x as x̄).
For ε ∈ {0, 1}, let x^ε = x if ε = 1 and ¬x otherwise.
A clause C is a disjunction of literals: x_{i_1}^{ε_1} ∨ ... ∨ x_{i_w}^{ε_w}.
The empty clause is denoted by 0.

A conjunctive normal form (CNF) formula τ is a conjunction of clauses: C_1 ∧ ... ∧ C_m. A CNF formula is unsatisfiable if no assignment to its variables makes it evaluate to 1 (1 = True). For example, the empty clause 0 is unsatisfiable. An unsatisfiable formula is also called a contradiction.

    2.1 Resolution Proof System

Question: How does one prove that a CNF formula τ is unsatisfiable?
We prove that τ is unsatisfiable by deducing the empty clause 0 from its clauses by a series of rules of the following two types:

Resolution Rule:

  C ∨ x     D ∨ x̄
  -----------------
       C ∨ D

Weakening:

      C
  ---------
    C ∨ D
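As an illustration (my own sketch, not part of the notes), the two rules can be implemented over clauses represented as sets of signed literals:

```python
# Clauses as frozensets of signed literals: the integer i stands for x_i
# and -i for ¬x_i.  The empty frozenset plays the role of the empty clause 0.

def resolve(c, d, x):
    """Resolution rule: from C ∨ x and D ∨ x̄ derive C ∨ D."""
    assert x in c and -x in d
    return (c - {x}) | (d - {-x})

def weaken(c, d):
    """Weakening rule: from C derive C ∨ D."""
    return c | d

# Deriving the empty clause from the contradiction (x1) ∧ (¬x1):
empty = resolve(frozenset({1}), frozenset({-1}), 1)
```
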



Note that one can arrange the intermediate formulae either in a directed acyclic graph (DAG) or unfold the DAG to form a tree-like proof system.

(Soundness Theorem) If one can deduce 0 from τ by applying the above rules, then τ is unsatisfiable.
(Completeness Theorem) If τ is unsatisfiable then τ ⊢ 0, i.e., one can deduce 0 from τ. (Homework 1.)
Let P be a resolution refutation of τ. We denote by |P| the total number of clauses in P, and let
S_R(τ) = min {|P| : P is a refutation of τ}.
Question: Are there short CNF formulas that require large refutations?
Historical background:
Tseitin [68] answered the question for regular resolution.
Haken [85] constructed τ such that all of its refutations have exponential size.
Chvátal-Szemerédi [88]: a random CNF is almost surely unsatisfiable and has only exponentially long refutations.
These results are extremely nice and important but somewhat ad hoc.

    2.2 Width: Another Complexity Measure

The width of a clause: w(x_{i_1}^{ε_1} ∨ ... ∨ x_{i_w}^{ε_w}) = w.
w(τ) = max {w(C) : C ∈ τ}.
w(P) = max {w(C) : C ∈ P}.
w_R(τ) = min {w(P) : P is a refutation of τ}.
Question: What is the relation between w(τ) and w_R(τ)?
If a refutation uses all clauses of τ then its width is at least w(τ), but the inequality w_R(τ) ≥ w(τ) is not true in general: a refutation need not use all clauses of τ, and one may be able to deduce 0 from a subset of its clauses.
Question: What is the relation between S_R(τ) and w_R(τ)?
It is easy to see that S_R(τ) ≤ (2n)^{w_R(τ)}.
Question: Does the above inequality hold in the other direction?
Unfortunately, w_R(τ) ≤ log_n S_R(τ) is not true exactly, but the following theorem is known.

Theorem 1 (Ben-Sasson, Wigderson 99). For any n-variable contradiction τ, w_R(τ) = O(√(n · log S_R(τ))).
In particular, if τ is a contradiction and w_R(τ) = Ω(n), then S_R(τ) = exp(Ω(n)).



Remark: Bonet, Galesi [99] showed that the bound in Theorem 1 is almost tight. In particular, for every n there is an n-variable contradiction τ_n such that S_R(τ_n) = n^{O(1)} but w_R(τ_n) = Ω(√n).

2.3 Restrictions

For a clause C, its restriction to x = ε is defined as follows: C|_{x=ε} = C if neither x nor ¬x occurs in C; C|_{x=ε} = 1 if C = D ∨ x^ε; C|_{x=ε} = D if C = D ∨ x^{1−ε}.
If τ = C_1 ∧ ... ∧ C_m, then τ|_{x=ε} = C_1|_{x=ε} ∧ ... ∧ C_m|_{x=ε} (clauses that became 1 are removed).
Exercise: If τ is a contradiction then so is τ|_{x=ε}. Moreover, the restriction of a refutation (of τ) is again a refutation (of τ|_{x=ε}).
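This definition can be sketched in code (my own rendering, with clauses as frozensets of signed literals, integer i for x_i and -i for ¬x_i):

```python
def restrict_clause(c, x, eps):
    """C|_{x=eps}: True stands for the constant 1 (clause satisfied)."""
    lit = x if eps == 1 else -x
    if lit in c:
        return True        # the literal x^eps occurs: the clause becomes 1
    return c - {-lit}      # drop the falsified literal x^(1-eps), if present

def restrict_cnf(clauses, x, eps):
    """tau|_{x=eps}: restrict every clause and discard the satisfied ones."""
    restricted = [restrict_clause(c, x, eps) for c in clauses]
    return [r for r in restricted if r is not True]
```
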

Lemma 2.1 (Restriction is simpler to refute). w_R(τ|_{x=ε}) ≤ w_R(τ).

It is interesting that one can prove the following lemma in the other direction.

Lemma 2.2. w_R(τ) ≤ max{w_R(τ|_{x=0}), w_R(τ|_{x=1})} + 1.

    Exercise: Prove Lemma 2.1 and Lemma 2.2.

Next time, we will show that a slightly stronger version of the above lemma holds. In particular, we will prove that if w_R(τ|_{x=ε}) ≤ w − 1 and w_R(τ|_{x=1−ε}) ≤ w, then w_R(τ) ≤ w. We will see in the next lecture that this crucial inequality leads to the proof of Theorem 1.

Lecture 2
Scribe: Raghav Kulkarni, University of Chicago.

    Date: January 8, 2009

Lemma 2.3. If w_R(τ|_{x=ε}) ≤ w − 1 and w_R(τ|_{x=1−ε}) ≤ w, then w_R(τ) ≤ w.

Proof: The idea is to do the refutations sequentially. First deduce x^{1−ε} from τ using clauses of width at most w. (Follow closely the deduction from



τ|_{x=ε} to 0, which uses clauses of width at most w − 1, and add x^{1−ε} to every clause in such a deduction.) Now, from x^{1−ε} and τ, deduce τ|_{x=1−ε} by using the resolution rule; this step does not introduce any clause of width more than w. Finally, deduce 0 from τ|_{x=1−ε} using clauses of width at most w.

Fat clause: C is called fat if w(C) > d (d is a parameter to be specified later).

Lemma 2.4. If P is a refutation of τ that contains fewer than a^b fat clauses, where a = (1 − d/(2n))^{−1}, then w_R(τ) ≤ d + b.

Proof: We do induction on b and n.
Base case: b = 0 ⇒ no fat clauses ⇒ w_R(τ) ≤ d.
Let C_1, C_2, ..., C_k be all the fat clauses in P. Counting literal occurrences, out of the 2n literals there exists one contained in at least (d/(2n))·k fat clauses; without loss of generality, let it be the literal x. Setting x = 1 kills all these fat clauses, so the restricted refutation contains fewer than (1 − d/(2n))·a^b = a^{b−1} fat clauses. Hence by induction on b we have w_R(τ|_{x=1}) ≤ d + b − 1, and by induction on n we have w_R(τ|_{x=0}) ≤ d + b. From Lemma 2.3, w_R(τ) ≤ d + b.

Now we begin the proof of Theorem 1.
Proof: Let a^b = S_R(τ). Then

b = log S_R(τ) / log a = O((n/d) · log S_R(τ)),

and therefore

w_R(τ) = O(d + (n/d) · log S_R(τ)).

Choose d = √(n · log S_R(τ)) to minimize the right-hand side.
Corollary: w_R(τ) = Ω(n) ⇒ S_R(τ) = exp(Ω(n)).
Applications of Theorem 1:
Usually it is easier to prove a lower bound on width than on size. By Theorem 1, one can deduce a lower bound on size from a lower bound on width. The theorem is used in proving lower bounds on the size of refutations of the following:
a) Pigeon Hole Principle
b) Random 3-CNF
c) Tseitin tautologies (introduced in 1968).

For an alternate way of converting width lower bounds into size lower bounds, see Appendix A.1.

    2.4 Edge Expansion Gives a Width Lower Bound

Let G be a fixed graph on n vertices, and let {x_e | e ∈ E(G)} be the set of variables. Let f : V(G) → {0, 1} be such that ⊕_{v∈V(G)} f(v) ≡ 1 (mod 2). Let


τ(G, f) ≡ (∀v ∈ V(G)) (⊕_{e∋v} x_e = f(v)). Suppose we want to refute τ(G, f). Let d = max {deg_G(v) | v ∈ V(G)}. It is easy to see that τ(G, f) is a d-CNF with at most dn/2 variables and at most n · 2^{d−1} clauses.
Edge Expansion: e(G) = min {e(V', V(G) \ V') : n/3 ≤ |V'| ≤ 2n/3}, where e(A, B) denotes the number of edges between A and B.

Theorem 2 (Ben-Sasson and Wigderson 99). For any connected G, w_R(τ(G, f)) ≥ e(G).
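To make the clause count concrete, here is a sketch (my own, using the standard CNF encoding of a parity constraint) of the 2^{d−1} clauses produced by a single vertex of degree d:

```python
from itertools import product

def parity_clauses(edge_vars, target):
    """CNF clauses forcing XOR of edge_vars == target (target in {0,1}).

    Each assignment of the wrong parity is excluded by exactly one clause,
    giving 2^(d-1) clauses for a vertex of degree d."""
    clauses = []
    for bits in product([0, 1], repeat=len(edge_vars)):
        if sum(bits) % 2 != target:
            # This clause is falsified exactly on the forbidden assignment.
            clauses.append(frozenset(v if b == 0 else -v
                                     for v, b in zip(edge_vars, bits)))
    return clauses

# A degree-3 vertex with f(v) = 1 yields 2^(3-1) = 4 clauses.
vertex_clauses = parity_clauses([1, 2, 3], 1)
```
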

Axioms:

⊕_{v∈V(G)} f(v) ≡ 1 (mod 2).
A_v : ⊕_{e∋v} x_e ≡ f(v) (mod 2).

A Measure on Clauses:
For a clause C, µ(C) = min {|V'| : {A_v | v ∈ V'} ⊨ C}.
Exercise: If C ∈ τ(G, f) then µ(C) ≤ 1.
Exercise: If G is connected then µ(0) = n.

Exercise: (µ(C ∨ x) ≤ a & µ(D ∨ x̄) ≤ b) ⇒ µ(C ∨ D) ≤ a + b.
Exercise (Intermediate Clause): (∃C ∈ P)(n/3 ≤ µ(C) ≤ 2n/3).

Choose any intermediate C. Let V' be a minimal set such that {A_v | v ∈ V'} ⊨ C. Since C is intermediate, we have n/3 ≤ |V'| ≤ 2n/3. Let e = (v', v) ∈ (V', V \ V').
Claim: x_e appears in C.
Proof of Claim: By the minimality of V', there exists an assignment {X_e | e ∈ E(G)} which satisfies all the axioms in V' except A_{v'} and falsifies C. Suppose for the sake of contradiction that x_e ∉ C. Note that flipping X_e satisfies all the axioms of V' and thus satisfies C. This is a contradiction, because the new assignment still agrees with the old one on all variables of C, whereas the old assignment does not satisfy C.
Thus w(C) ≥ e(V', V(G) \ V') ≥ e(G), which completes the proof of Theorem 2.
We note that there exist explicit connected graphs G with d = 3 and e(G) ≥ Ω(n). For any such graph, τ(G, f) is exponentially hard to refute in Resolution.



    3 Bounded Arithmetic: Theory of Feasible Proofs

In the next two lectures, we will prove the witnessing theorems connecting the following two:
a) Logical Complexity
b) Conceptual Complexity
For the purposes of this course, Conceptual Complexity ≡ Computational Complexity; intuitively, the concepts that are easy to compute are simple. Moreover, most often "easy to compute" means poly-time computable.

Example: (∀x)(1 < x < p ⇒ x ∤ p) ⇒ (∀y)(1 < y < p ⇒ y^{p−1} ≡ 1 (mod p)).
In this statement, all the concepts are poly-time computable. For example, y^{p−1} mod p can be computed in poly-time by iterated squaring. Thus the statement is feasible.
On the other hand, the standard proof, which already involves the concept ord_p(y), is strikingly not feasible. We will see later on that, unless FACTORING is easy, Fermat's Little Theorem does not possess any feasible proof in the strict sense we are defining.
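For instance, y^{p−1} mod p is feasible because iterated squaring uses only O(log p) multiplications; a minimal sketch:

```python
def pow_mod(y, e, p):
    """Compute y^e mod p by iterated squaring: O(log e) multiplications."""
    result = 1
    square = y % p
    while e > 0:
        if e & 1:                      # current bit of the exponent is 1
            result = (result * square) % p
        square = (square * square) % p
        e >>= 1
    return result

# Fermat's Little Theorem instance: 2^(13-1) ≡ 1 (mod 13).
```
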

First Order Theory: A first-order theory contains the following: ¬, ∧, ∨, ⇒, (∀x), (∃x) and arithmetic over integers.
Language: The language of bounded arithmetic consists of S, 0, +, ·, |x|, ⌊x/2⌋, #, ≤, where S is the successor function, i.e., S(x) = x + 1, |x| is the bit-size of x, and #(x, y) = 2^{|x|·|y|}.

Lecture 3
Scribe: Joshua A. Grochow, University of Chicago.

    Date: January 13, 2009

In these two lectures we continue with bounded arithmetic. We start by noting that without the # operation, if t is any term, then t(x_1, ..., x_n) < (x_1 + ··· + x_n)^{O(1)}, corresponding to linear time. So in order to capture polynomial time, we need some way to increase bit-sizes faster, which is exactly what # allows us to do. Note that |x # y| = |x||y|; with #, we get |t(x_1, ..., x_n)| ≤ (|x_1| + ··· + |x_n|)^{O(1)}, corresponding to polynomial time, as desired.

3.1 Axioms for Bounded Arithmetic

All bounded arithmetics include the axioms BASIC: these are 32 axioms that describe the properties of the symbols S, 0, +, ·, |x|, ⌊x/2⌋, #,



and ≤. Note that these axioms are all quantifier-free (which semantically implies they are implicitly universally quantified, but we don't want to count such implicit quantifiers when we talk about quantifier alternations). Rather than list out all the axioms, we ran a PCP protocol with the following outputs:

Axiom 17 is |x| = |y| ⇒ x # z = y # z.
Axiom 2 is x ≠ Sx.
Axiom 32 is x = ⌊y/2⌋ ⇒ (2·x = y ∨ S(2·x) = y).
Throughout, we'll make use of some standard abbreviations for quantifiers, namely, the bounded quantifiers:

(∀x ≤ t)A(x) = (∀x)(x ≤ t ⇒ A(x))
(∃x ≤ t)A(x) = (∃x)(x ≤ t ∧ A(x))

and the sharply bounded quantifiers:

(∀x ≤ |t|)A(x) = (∀x)(x ≤ |t| ⇒ A(x))
(∃x ≤ |t|)A(x) = (∃x)(x ≤ |t| ∧ A(x))

    Bounded arithmetics also have induction axioms, which are of the form

IND: [A(0) ∧ (∀x)(A(x) ⇒ A(Sx))] ⇒ (∀x)A(x)
In fact, this is an axiom scheme, as there is one axiom for each formula A. Which types of formulae we allow for A determines the strength of the induction axiom scheme we use.

The class Σ^b_i consists of those formulae that start with an existential quantifier and have i quantifier alternations, where we ignore sharply bounded quantifiers. Π^b_i is defined similarly, except that the formulae in Π^b_i begin with a universal quantifier. Σ^b_k-IND is the induction axiom scheme for all formulae A ∈ Σ^b_k.
The bounded arithmetic T^k_2 consists of the basic axioms plus Σ^b_k-IND.
This type of induction is not polynomial, however. In order to prove A(n), we have to prove A(0), A(1), ..., A(n − 1) first, which gives proofs of length O(n), and that is exponential in the bit-length of n. To remedy this, we introduce a polynomial induction scheme, which allows induction based on the bit representation, as follows:

PIND: [A(0) ∧ (∀x)(A(⌊x/2⌋) ⇒ A(x))] ⇒ (∀x)A(x)

The bounded arithmetic we will be interested in is S^k_2, which consists of the basic axioms plus Σ^b_k-PIND.



    3.2 The Main Theorem of Bounded Arithmetic

The main theorem of bounded arithmetic essentially says that S^1_2 captures polynomial-time computations:

    Theorem 3.

(a) If f(x_1, ..., x_k) can be computed in polynomial time, then there exists a Σ^b_1 formula A(x_1, ..., x_k, y) such that N ⊨ A(n_1, ..., n_k, f(n_1, ..., n_k)) for all n_1, ..., n_k ∈ N, and S^1_2 ⊢ (∀x_1, ..., x_k)(∃!y)A(x_1, ..., x_k, y). That is, S^1_2 proves that A is total and single-valued.

(b) If A(x_1, ..., x_k, y) is a Σ^b_1 formula and S^1_2 ⊢ (∀x_1, ..., x_k)(∃y)A(x_1, ..., x_k, y), then there is a polynomial-time computable function f such that N ⊨ A(n_1, ..., n_k, f(n_1, ..., n_k)), and furthermore S^1_2 ⊢ A(n_1, ..., n_k, f(n_1, ..., n_k)).

Theorems of this sort are called "witnessing theorems." Note that in (b), we only require that S^1_2 prove, for each x_1, ..., x_k, that there is some y satisfying A, not necessarily a unique one.

Recall Fermat's Little Theorem, whose contrapositive we can write as the Σ^b_1 formula x^{n−1} ≢ 1 (mod n) ⇒ (∃y ≤ n)(1 < y < n ∧ y | n) (for 1 < x < n). As a corollary to the above witnessing theorem, we have:
Corollary 4. If integer factoring is hard (not in polynomial time), then S^1_2 does not prove Fermat's Little Theorem.

Outline of proof of main theorem. To prove (a), use the definition of P based on limited iteration (due to Cobham, 1965). Consider the following class of functions f : N^r → N:

• The constant function 0
• The successor function x ↦ Sx
• Bit-shift x ↦ ⌊x/2⌋
• Multiplication by 2, x ↦ 2x
• Conditional: Choice(x, y, z) = y if x > 0 and z if x = 0
• The characteristic function of ≤, χ_≤(x, y)



Cobham showed that the closure of these functions under composition and limited iteration (or recursion) is exactly the class of polynomial-time functions. Before defining limited iteration, we define (unlimited) iteration: given g : N^r → N and h : N^{r+2} → N, we can define τ : N^{r+1} → N by

τ(x, 0) = g(x) and τ(x, n + 1) = h(x, n, τ(x, n)).

Using limited iteration, such a τ may be constructed iff there are polynomials p and q such that (∀n ≤ p(|x|))(|τ(x, n)| ≤ q(|x|)). This forbids constructions like τ(0) = 2 and τ(n + 1) = τ(n)^2, whose lengths blow up super-polynomially.
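The limiting condition can be sketched as follows (my own toy rendering; the names are hypothetical):

```python
def limited_iteration(g, h, x, n, q):
    """tau(x,0) = g(x), tau(x,m+1) = h(x, m, tau(x,m)), where every iterate
    must keep bit-length at most q(|x|), else the definition is rejected."""
    t = g(x)
    for m in range(n):
        t = h(x, m, t)
        if t.bit_length() > q(x.bit_length()):
            raise ValueError("iterate too long: not a limited iteration")
    return t

# Allowed: doubling adds only one bit per step.
doubled = limited_iteration(lambda x: x, lambda x, m, t: 2 * t, 5, 10,
                            lambda bits: bits + 20)

# Forbidden: tau(0) = 2, tau(m+1) = tau(m)^2 squares the bit-length each step.
```
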

Using Cobham's recursive definition, we can prove (a) first for the functions listed above, and then show that if (a) holds for some functions, then it also holds for their closure under composition and limited iteration.

(Historical aside: Cobham's definition gave rise to so-called "machine-independent complexity theory" and really got logicians interested in complexity.)

We will finish the proof of (a) and outline the proof of (b) next lecture.

Lecture 4
Scribe: Joshua A. Grochow, University of Chicago.

    Date: January 15, 2009

This lecture will be devoted to finishing the proof of the main witnessing theorem of bounded arithmetic (that S^1_2 exactly captures polynomial-time computation). Before continuing, we need a few technical constructions.

    3.2.1 Bootstrapping by Introducing New Symbols

Adding new symbols to our language by bootstrapping is essentially only a matter of convenience, but it makes the proofs much more intelligible. If a theory T proves (∀x)(∃!y)A(x, y), then we can add a new function symbol f_A(x) and the axiom (∀x)A(x, f_A(x)) without altering what can be proved in the theory. That is, if a formula B does not contain f_A and T + (∀x)A(x, f_A(x)) ⊢ B, then T ⊢ B.

When we add a new symbol, we have to be careful how we use induction. In particular, Σ^b_1-PIND only allows polynomial induction for formulae in the original language, and not formulae with the new symbols. The following lemma takes care of this technicality, as well as most other technicalities we might worry about when introducing new symbols in this way:



Lemma 3.1. Let A ∈ Σ^b_1 be such that S^1_2 ⊢ (∀x)(∃!y)A(x, y), and let T = S^1_2 + (∀x)A(x, f_A(x)). Then for every B ∈ Σ^b_1(f_A), there is a B* ∈ Σ^b_1 such that T ⊢ (B ≡ B*).

    Note: we can actually drop the uniqueness from the above statement,but the lemma as stated is all we’ll need.

Hint of proof. We'll see later that S^1_2 ⊢ (∀x)(∃!y)A(x, y) actually lets us put a bound on y, that is, S^1_2 ⊢ (∀x)(∃!y ≤ t)A(x, y).
The idea is to remove each occurrence of f in B one at a time. Suppose B looks like blah_1 f(t_1, ..., t_r) blah_2. Then an equivalent formula is

(∃z ≤ t(t_1, ..., t_r))[A(t_1, ..., t_r, z) ∧ (blah_1 z blah_2)]

We've reduced the number of occurrences of f by one, and we haven't increased the complexity, since we only added an existential quantifier. (Note that, since Σ^b_1 formulae start with an existential quantifier, adding another one does not change the complexity of the formula.) Wash, rinse, repeat.

For example, suppose we wish to add the "monus" symbol:

x ∸ y = x − y if x ≥ y, and 0 otherwise.

Then A(x, y, z) ≡ (y + z = x) ∨ (x ≤ y ∧ z = 0) is a Σ^b_0 ⊆ Σ^b_1 formula. Existence and uniqueness can be expressed as [A(x, y, z) ∧ A(x, y, z')] ⇒ z = z' and (∃z)A(x, y, z), respectively. Both of these follow from the basic axioms of bounded arithmetic. However, it is hard to show bounded existence in the form (∃z ≤ x)A(x, y, z); it is easy to prove in T^1_2, but proving it in S^1_2 seems nearly as hard as proving the whole witnessing theorem. This is actually a more general phenomenon: the more powerful the theory, the easier it is to bootstrap. When people want to make this harder (to see if it can still be done), they consider theories such as I∆_0, which is S_2 = ⋃_k S^k_2 but without #.

    3.2.2 Finite Sequences

Now we discuss the encoding of finite sequences in bounded arithmetic, e.g. x_1, ..., x_n where n ≤ |t| for some term t. For this we introduce (after a fair amount of bootstrapping in the style of the previous section) three functions to our language:



Finally, we come to the (in)famous cut rule:

  Γ −→ ∆, A     A, Π −→ Λ
  -------------------------
        Γ, Π −→ ∆, Λ

Why is this rule different from all other rules? What sets it apart is the fact that the formula A can be arbitrarily complex. In particular, the premise of the inference rule can be more complicated than the conclusion, which is unlike any of the other inference rules. This makes Gentzen's Theorem all the more surprising:

Theorem 5 (Gentzen). The cut rule can be eliminated from first-order logic.

The proof of this theorem uses the technique of cut elimination, which is (luckily) available in bounded arithmetic. However, if we add PIND naively to S^1_2, the cut elimination technique does not work. Instead, we carefully add the inference rule:

  Γ, A(⌊a/2⌋) −→ A(a), ∆
  ------------------------  PIND
  Γ, A(0) −→ A(t), ∆

(in which, as usual, the eigenvariable a must not appear in Γ, ∆). Then we get limited cut elimination, in which the cut formula A must be a sub-formula of one of A(0), A(t), or the basic axioms, which in particular implies A ∈ Σ^b_1 ∪ Π^b_1.
Finally, we can outline the proof of part (b) of the witnessing theorem:
Outline of proof of witnessing theorem part (b). Assume S^1_2 ⊢ (∀x)(∃y)A(x, y), which, as we've already noted, implies (∀x)(∃y ≤ t)A(x, y).
First, if A_i, B_j ∈ Σ^b_1, then the sequent A_1, ..., A_k −→ B_1, ..., B_ℓ is equivalent to

(∃y_1 ≤ t_1)C_1, ..., (∃y_k ≤ t_k)C_k −→ (∃z_1 ≤ t'_1)D_1, ..., (∃z_ℓ ≤ t'_ℓ)D_ℓ

for some terms t_i, t'_j and formulae C_i, D_j ∈ Σ^b_0. Using the limited form of cut elimination mentioned above, we can assume that all sequents in our proof are of this form.
We prove (by induction on the inference in the Gentzen Calculus) that, for every sequent as above appearing in our proof, there is a polynomial-time function f which, given witnesses y_1, ..., y_k for the formulas on the left, outputs a pair (i, b) such that taking z_i = b gives a witness for D_i. Note that the f constructed here has access to all the parameters t_i, t'_j, C_i, D_j.



Lectures 5 and 6
Scribe: Alex Rosenfeld, University of Chicago.
Date: January 20 and 22, 2009

    3.3 RSUV isomorphism

With many different ideas of how and in what language proofs should proceed (and how to determine their complexity), it would be nice to have some sort of gateway between them. This gives us the concept of "robustness": given different ideas of how our proofs should operate, we should arrive at the same notion of feasible provability. To that end, Takeuti and Razborov in 1992 (or is it 1892?) proved a connection between S^1_2 and a system based on treating words as sets.

The first concern in defining such a system is "what do the integers represent?". Instead of treating integers via their binary expansions (as we always did before), we will use them to identify positions in a binary string. Since we are not using binary expansions, we no longer need #, |x|, and ⌊x/2⌋. However, since we now encode integers by position, we are no longer able to use binary strings to define words; we instead use sets of integers to encode words.

Since we are denoting integers by position, it is natural that we get bounded quantifiers for free. However, now we have the problem of talking about words. So, we resort to the language of second-order logic and introduce comprehension axioms

∃α (∀x ≤ t)(A(x) ≡ α(x)),

where α is a second-order variable (interpreted as a word) and A is some predicate not containing either α or any quantifiers on second-order variables.

Since we get bounded quantifiers for free, we need a new way of organizing the complexity of formulas. Now we count quantifiers over words, using the notation Σ^{1,b}_k in place of Σ^b_k. And we define V^1_k as the system that, in addition to Σ^{1,b}_0-CA (the comprehension axiom above), also has Σ^{1,b}_k-IND and a dozen simple BASIC axioms describing properties of the remaining symbols S, 0, +, ·, ≤.
It turns out that this system is more or less equivalent to the system we used before, as shown by the following theorem (the translation symbols ♯, ♭ are only notation):

Theorem 6. There are translations ♯ : V^1_k → S^k_2 and ♭ : S^k_2 → V^1_k such that for any second-order formula A, Σ^{1,b}_k-CA ⊢ A ≡ (A♯)♭, and S^k_2 ⊢ A ≡ (A♭)♯ for any first-order A. We also get that if A is Σ^{1,b}_k then A♯ is Σ^b_k.



Let's go further. Forget about using integers at all; let's only consider binary strings. Ferreira in 1990 made a (first-order) version of bounded arithmetic with only strings, 0, 1, concatenation, and a prefix operator. It turns out that, again, we get an isomorphism with the Buss system.

So, it turns out that no matter which path we take, we always end up in Rome.

    3.4 The bounded arithmetic hierarchy

Given the previous section, we can work within Buss's bounded arithmetic and do fine. However, can we compare the strengths of these systems? The following theorem develops a hierarchy of these systems.

Theorem 7. S^1_2 ⊆ T^1_2 ⊆ S^2_2 ⊆ T^2_2 ⊆ ...
Proof: Well, S^k_2 ⊆ T^k_2 mostly follows from the definitions. T^k_2 ⊆ S^{k+1}_2, however, is much less trivial to prove.
So, let's take a predicate A(x) in Σ^b_k. We need to prove that S^{k+1}_2 ⊢ A(0) ∧ ∀x(A(x) ⇒ A(x + 1)) ⇒ A(t).
Let's look at a more general form of this statement: ∀u ≤ v ≤ t(((v − u) ≤ y ∧ A(u)) ⇒ A(v)). In other words, for a given distance y, for any u and v within that distance, A(u) implies A(v). And so, if we can prove this statement when y = t, we have our original statement.
So, how do we prove this statement while only using polynomial induction? Well, we take u and v with v − u ≤ y. We find a midpoint w of those numbers and apply our polynomial induction hypothesis on the two smaller chunks [u, w] and [w, v]. In this way we go from ⌊y/2⌋ to y, so it really takes only polynomially many steps to go from y = 1 to y = t. Thus, we get induction in S^{k+1}_2, and so T^k_2 ⊆ S^{k+1}_2.
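The midpoint argument can be illustrated computationally (my own sketch): reducing A(u) ⇒ A(v) to one-step implications through midpoints needs only logarithmic recursion depth, which is what makes polynomial induction suffice.

```python
def depth_to_reach(u, v):
    """Depth of the midpoint recursion reducing A(u) => A(v) to one-step
    implications A(x) => A(x+1): it is O(log(v - u)), mirroring PIND."""
    if v - u <= 1:
        return 0                      # a single use of the step axiom
    w = (u + v) // 2                  # the midpoint
    return 1 + max(depth_to_reach(u, w), depth_to_reach(w, v))

# Covering a distance of 1024 needs recursion depth 10 = log2(1024),
# not 1024 separate steps.
depth = depth_to_reach(0, 1024)
```
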

As a technical note, what we actually proved is induction for Π^b_{k+1} statements, but that's okay since S^{k+1}_2 ⊢ Π^b_{k+1}-PIND (see Homework #1).

3.5 Does this hierarchy collapse?

    The natural question given this hierarchy is whether or not it collapses.However, this problem turns out to be at least as difficult to solve as whetheror not the polynomial time hierarchy collapses. Thus, we seem to know little.But the following remarkable result due to Krajicek, Pudlak and Takeuti(1991) proves that it is at most that difficult:



Theorem 8. If PH does not collapse then S^1_2 ⊊ T^1_2 ⊊ S^2_2 ⊊ T^2_2 ⊊ ...

Well, computer scientists have encountered a little bit of difficulty in proving PH doesn't collapse. However, we do know that there is an oracle C such that PH^C does not collapse, and we can perform a similar proof in our systems of bounded arithmetic.

Remember our second-order formulas from earlier? Well, an oracle for a proof system is a new predicate symbol α we can use in proofs (note that, unlike in Section 3.3, we now do not allow quantifiers on α). In the same paper, Krajicek, Pudlak and Takeuti showed that without any assumptions we have:

Theorem 9. S^1_2(α) ⊊ T^1_2(α) ⊊ S^2_2(α) ⊊ T^2_2(α) ⊊ ...

An open question is whether we can separate the levels of this hierarchy by formulas of bounded complexity, i.e. show that S^k_2(α) ≠ T^k_2(α) via, say, a Σ^b_1 formula.

Let us give a simple combinatorial proof of the first separation in this theorem. First we need to extend Buss's theorem to polynomial-time machines with oracles:

Theorem 10. If S^1_2(α) ⊢ ∀x∃y A(α, x, y), then there exists a polynomial-time oracle machine M such that for any oracle L and any x, A(L, x, M^L(x)).

Using this we can prove (the difference between a new predicate symbol α and a new function symbol f is cosmetic and is in Homework #1):

Theorem 11. Let f be a new (undefined, i.e. with no added axioms) function symbol. Then S^1_2(f) ≠ T^1_2(f).

Proof: Let PHP(f) ≡ (∀x < 2a)(f(x) < a) ⇒ (∃x_1, x_2 < 2a)(x_1 ≠ x_2 ∧ f(x_1) = f(x_2)). This is an encoding of the classic pigeon-hole principle.
Paris, Wilkie, and Woods proved in 1988 that T^2_2(f) ⊢ PHP(f), and Maciel, Pitassi, and Woods in 1999 improved this to T^1_2(f) ⊢ PHP(f). However, S^1_2(f) ⊬ PHP(f), as we will show by a standard adversary argument.



Let M^f be a machine that queries f at some point x and then tries to find another point that collides with x. Since this machine works in time (log a)^{O(1)}, we may make sure it never gets to the point where a collision is forced, by answering its queries appropriately. Thus, S^1_2(f) ⊬ PHP(f).
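The adversary can be sketched as follows (a toy rendering of the argument, not the lecture's formalization):

```python
class Adversary:
    """Answers queries to an undefined f : [0, 2a) -> [0, a) with a fresh
    pigeonhole each time, so no collision appears before a distinct queries."""
    def __init__(self, a):
        self.a = a
        self.answers = {}                  # queries answered so far

    def query(self, x):
        if x not in self.answers:
            fresh = len(self.answers)      # next unused hole
            if fresh >= self.a:
                raise RuntimeError("out of holes: a collision is now forced")
            self.answers[x] = fresh
        return self.answers[x]

# A machine running in (log a)^{O(1)} steps asks far fewer than a queries,
# so the adversary never runs out of fresh holes and never shows a collision.
```
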

    3.6 More witnessing theorems

PLS problems (polynomial local search) were devised by Johnson, Papadimitriou, and Yannakakis in 1988. Given a finite binary string x and an optimization program P, we define three objects, F_P, c_P, and N_P, which simulate the process of searching for an optimum.
The first, F_P(x), is the set of feasible solutions to problem P on input x. We require three properties. First, if s is a solution, then |s| ≤ p(|x|) for some polynomial p, i.e. our solutions are not more complicated than our input. Second, ∅ must be in F_P(x), for technical reasons which will be made clear later in the proof. Third, "s ∈ F_P(x)" must be poly-time decidable. So, intuitively, we should be able to recognize feasible solutions in a short amount of time.
The second, c_P(x, s), is a cost function: it quantifies how good a solution s is for input x. Our "ideal" goal is to find the best solution for x, i.e. a global maximum of c_P.
The third, N_P(x, s), is in some sense a "next solution" function: it takes an input and a solution and tries to find a better solution. Whenever it fails, it just stays at the current s. Thus it obeys the rule ∀s ∈ F_P(x)(N_P(x, s) = s ∨ c_P(x, N_P(x, s)) > c_P(x, s)), and N_P is poly-time computable.
With this definition, Buss and Krajicek in 1991 proved the two following theorems:

    Theorem 12. Let P be a program with associated PLS problem (F_P, c_P, N_P). If T^1_2 ⊢ ∀s ∈ F_P(x)(N_P(x, s) = s ∨ c_P(x, N_P(x, s)) > c_P(x, s)), then T^1_2 ⊢ ∀x∃s ∈ F_P(x)(N_P(x, s) = s).
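To make the triple (F_P, c_P, N_P) in these statements concrete, here is a toy PLS instance in Python (entirely our own illustration, not from the lecture): solutions are bit strings, the cost counts agreements with the input, and the neighbourhood function flips one bit at a time.

```python
def F_P(x, s):
    """Feasibility: the empty solution is always allowed; otherwise s must
    be a bit string of the same length as the input x."""
    return s == "" or (len(s) == len(x) and set(s) <= {"0", "1"})

def c_P(x, s):
    """Cost of s: the number of positions where s agrees with x
    (the empty solution has cost 0)."""
    return sum(a == b for a, b in zip(x, s))

def N_P(x, s):
    """Neighbourhood function: return a strictly better feasible solution
    obtained by one local change, or s itself if none exists."""
    if s == "":
        t = "0" * len(x)
        return t if c_P(x, t) > 0 else x   # keep c_P strictly increasing
    for i in range(len(s)):
        t = s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]
        if c_P(x, t) > c_P(x, s):
            return t
    return s   # local optimum

# Iterating N_P from the trivial solution reaches a local optimum.
x, s = "10110", ""
while N_P(x, s) != s:
    s = N_P(x, s)
assert F_P(x, s) and N_P(x, s) == s   # feasible and locally optimal
```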

    Note that this only states the existence of a local optimum, not a global one. However, in T^1_2 we can actually find even a global optimum. Let A(a) be ∃s ∈ F_P(x)(c_P(x, s) ≥ a). Since we defined F_P to contain ∅, A(0) is true. Remember that each solution has bitlength polynomial in the bitlength of the input, so there are only finitely many solutions and the costs are bounded. Thus it is not true that ∀aA(a), and so, by induction, there is


    an a such that A(a) is true but A(a + 1) is not. This a is the cost of our global optimum.
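Algorithmically, the induction step above corresponds to a binary search over cost values. A hedged Python sketch (our own; A stands in for an assumed oracle deciding the Σ^b_1 formula): given A(0) and ¬A(M), it locates a with A(a) ∧ ¬A(a + 1).

```python
def optimum(A, M):
    """Given a monotone-downward predicate A with A(0) true and A(M) false,
    find a with A(a) true and A(a + 1) false, using O(log M) queries."""
    lo, hi = 0, M                  # invariant: A(lo) holds, A(hi) fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if A(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Toy example: the feasible solutions have costs {3, 7, 11}.
costs = {3, 7, 11}
A = lambda a: any(c >= a for c in costs)   # "some solution has cost >= a"
assert optimum(A, 100) == 11               # A(11) holds, A(12) fails
```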

    So, given a PLS problem, we can define a Σ^b_1 formula. However, we can also go in the reverse direction.

    For ease of notation, let us define the predicate opt_P(x, s) ≡ N_P(x, s) = s, which says that s is a locally optimal solution for P on input x. We can prove:

    Theorem 13. Let A(x, y) be a Σ^b_1 formula. If T^1_2 ⊢ ∀x∃yA(x, y), then there exists a PLS problem P such that T^1_2 ⊢ ∀s ∈ F_P(x)(N_P(x, s) = s ∨ c_P(x, N_P(x, s)) > c_P(x, s)) and T^1_2 ⊢ ∀x∀s ∈ F_P(x)(opt_P(x, s) ⇒ A(x, s)). (This trivially works for S^1_2 as well.)

    Proof: As in the proof of Buss's witnessing theorem, we first remove

    all essential cuts and then use induction on the inference. We consider only the most interesting case of the induction rule without side formulas:

    A(a) −→ A(a + 1)
    ────────────────
    A(0) −→ A(t).

    Now, A(x) is Σ^b_1, so there is a Σ^b_0 formula B(x, y) such that A(x) ≡ ∃y ≤ t B(x, y). For notation's sake, let s_a be a witness for a, i.e., B(a, s_a). By the inductive hypothesis, there exists a program P that takes ⟨a, s_a⟩ as input and witnesses the assumption (that is, every local optimum s_{a+1} satisfies B(a + 1, s_{a+1})).

    We can use this P to construct a program Q that witnesses the conclusion. Let F_Q = {⟨a, s_a, s_{a+1}⟩ : 0 ≤ a ≤ x ∧ B(a, s_a) ∧ s_{a+1} ∈ F_P(⟨a, s_a⟩)} and c_Q(⟨a, s_a, s_{a+1}⟩) = c_P(⟨a, s_a⟩, s_{a+1}) + f(a), where f is some "offset" function that makes the witnesses of a cost less than the witnesses of a + 1. For N_Q(⟨a, s_a, s_{a+1}⟩), we apply N_P if it can further improve the solution s_{a+1}, and output ⟨a + 1, s_{a+1}, ∅⟩ otherwise (in the latter case s_{a+1} is a witness for a + 1). This program satisfies the theorem. □

    There are many witnessing theorems out there; this one is particularly interesting because it comes from optimization instead of proof complexity.

    Lectures 7 and 8
    Date: January 27 and 29, 2009


    3.7 Transition to propositional logic

    We discussed two translations from first-order logic to propositional logic; they can be found in [19, Section 6] (all references are given w.r.t. the course page http://people.cs.uchicago.edu/~razborov/teaching/winter09.html). Along the way we recalled the most important concrete proof systems: Frege (F), Extended Frege (EF), and bounded-depth Frege (F_d).

    4 Interpolation and Automatizability

    We first gave the general definition of a propositional proof system in the sense of Cook and Reckhow (1974) and pointed out the connection to the NP vs. coNP question.

    4.1 Feasible Interpolation

    We discussed at length the concept of a disjoint NP-pair [20] and the closely related concept of Feasible Interpolation. We proved the Feasible Interpolation Theorem for Resolution (in Lecture 14 we will prove a result from [28, 29] that is stronger but nonetheless much more intuitive). We ended by proving that Feasible Interpolation is not possible for Extended Frege unless factoring is easy [21].

    Lectures 9, 10 and 11
    Scribe: Eric Purdy, University of Chicago.
    Date: February 5, 10 and 12, 2009

    4.2 Automatizability

    In AI, people try to determine whether a CNF is satisfiable using the Davis-Logemann-Loveland (DLL) procedure. (Certain modifications are also known as Davis-Putnam, or Davis-Putnam-Logemann-Loveland.)

    The performance of Algorithm 1 depends heavily on the choice of the pivot variable: some choices lead to more simplification than others. Thus researchers in this field have tried all sorts of heuristics for choosing pivot variables. As an impossibility result, we would like to give a lower bound on the runtime of Algorithm 1, regardless of the heuristic used. This is captured in:


    Algorithm 1 Davis-Logemann-Loveland procedure

    Input: CNF τ
    if τ contains an empty clause then
      return false
    end if
    if τ contains a singleton clause x_i^ε then
      Set x_i = ε
    end if
    Pick a pivot variable x
    Let τ_ε = τ|_{x=ε} for ε = 0, 1
    Recur on τ_0, τ_1
    if either τ_0 or τ_1 is satisfiable then
      return true
    else
      return false
    end if
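A runnable rendering of Algorithm 1 (our own Python sketch; the pivot heuristic is deliberately naive): a CNF is a list of clauses, a clause is a set of integer literals, and -v denotes the negation of variable v.

```python
def restrict(cnf, lit):
    """Set lit to true: drop satisfied clauses, remove the falsified literal."""
    return [c - {-lit} for c in cnf if lit not in c]

def dll(cnf):
    if any(len(c) == 0 for c in cnf):
        return False                    # empty clause: this branch is refuted
    if not cnf:
        return True                     # all clauses satisfied
    for c in cnf:                       # singleton-clause (unit) rule
        if len(c) == 1:
            (lit,) = c
            return dll(restrict(cnf, lit))
    pivot = abs(next(iter(cnf[0])))     # naive pivot choice
    return dll(restrict(cnf, pivot)) or dll(restrict(cnf, -pivot))

assert dll([{1}, {-1, 2}, {-2}]) is False   # an unsatisfiable CNF
assert dll([{1, 2}, {-1, 2}]) is True       # a satisfiable one
```

The recursion tree of dll is exactly the tree T_DLL used in the proof below: each recursive call corresponds to one restriction of τ.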

    Theorem 14 (DLL is no better than treelike resolution). The number of CNFs produced by Algorithm 1 is at least S_{tree-R}(τ), the minimal size of a treelike resolution refutation. Refuting an unsatisfiable CNF τ with Algorithm 1 thus takes at least S_{tree-R}(τ) steps.

    Recall the definition of treelike resolution: you can only use a clause C in an application of the resolution rule once. If you want to reuse it, you have to rederive it. The graph of a treelike resolution refutation (where each vertex is a clause and its children are the antecedents used to derive it) thus looks like a tree, where the root is the empty clause and the leaves are the axioms. The graph of a standard resolution refutation will be a DAG.

    Proof. We can think of the DLL procedure as generating a tree T_DLL, where each node is a CNF φ whose children are the two restrictions of φ that we consider next. We recursively map T_DLL to a treelike resolution refutation which has as many clauses as T_DLL has restrictions of τ. This produces a treelike resolution refutation; thus S_{tree-R}(τ) is at most the number of nodes in T_DLL, which is the number of clauses produced by Algorithm 1.

    If in T_DLL we have a restricted CNF τ′ = τ|_{x_1=ε_1,...,x_r=ε_r}, we map it to the clause C = x_1^{1-ε_1} ∨ ··· ∨ x_r^{1-ε_r}, which represents the assertion "this assignment cannot be extended to a satisfying assignment for τ". Inducting up from the leaves of T_DLL, when we want to produce the clause x_1^{1-ε_1} ∨ ··· ∨ x_r^{1-ε_r} via resolution, we use the fact that we have already produced the two


    children of τ′ corresponding to different settings of some variable x_j. This means that we have produced, via resolution, the two clauses

    x_j^0 ∨ x_1^{1-ε_1} ∨ ··· ∨ x_r^{1-ε_r}
    x_j^1 ∨ x_1^{1-ε_1} ∨ ··· ∨ x_r^{1-ε_r}.

    Applying the resolution rule to these two clauses (with x_j as the pivot variable), we derive the desired clause. (We are using the special case of resolution C ∨ x, C ∨ x̄ ⟹ C.)

    Our base case is the leaves of T_DLL, which correspond to restrictions of τ that violate some clause C of τ. Each such restriction is mapped to a clause C′ that contains all the variables of C (and possibly others); C′ can thus be derived in one step from C (which is part of our input) by the weakening rule. □

    It is easy to see that if in Algorithm 1 we do not insist on the particular choice of the pivot variable when τ contains a singleton clause (or, perhaps, even if we do), then the argument becomes reversible. That is, S_{tree-R}(τ) is precisely the minimal number of clauses generated by any implementation of Algorithm 1.

    From Lecture 2, we know that some CNFs τ have S_R(τ) exponential in the number of variables, so Algorithm 1 will not refute them efficiently under any heuristic. We would still like to find a heuristic that makes Algorithm 1 efficient when τ does have a small refutation. More generally, we would like to efficiently find a proof that is not much longer than the shortest proof. This motivates the following definition:

    Definition 4.1 (Bonet, Pitassi, Raz 2000). A proof system f is automatizable if there exists an algorithm that outputs an f-proof of any tautology τ and runs in time S_f(τ)^{O(1)}, i.e., polynomially in the size of the smallest f-proof. (We will let p(S_f(τ)) be this polynomial for future reference.)

    A proof system is quasi-automatizable if the same holds, except with a quasi-polynomial time bound.

    This property is somewhat weird in that it is not monotone: making a proof system more powerful can make it less automatizable, if we gain the ability to make proofs significantly shorter without the ability to find these short proofs.

    Definition 4.2. A proof system f is normal if

    S_f(τ|_ρ) ≤ S_f(τ)^{O(1)},

    i.e., applying a restriction to a CNF does not make it much harder to prove in f. A proof system is strictly normal if

    S_f(τ|_ρ) ≤ S_f(τ),

    i.e., applying a restriction to a CNF does not make it any harder to prove in f.

    Well-behaved systems will be strictly normal. Resolution is strictly normal, since we can apply ρ to every clause in the proof without invalidating any of the deductive steps (possibly making some of them unnecessary).

    Recall:

    Definition 4.3. A proof system f has feasible interpolation if, given an unsatisfiable CNF τ of the form A(y) ∧ B(z) and an f-refutation of τ of size s, we can decide in time s^{O(1)} which of A and B is unsatisfiable. (If neither is satisfiable, we can answer with either one of them.)

    Theorem 15. For any normal proof system f, automatizability implies feasible interpolation.

    This theorem is unfortunate, because we have just proved that, unless factoring is easy, we do not have feasible interpolation for Extended Frege, Frege, and other nice systems. Thus we do not have automatizability for them either.

    Proof. We will only go through the proof for strictly normal proof systems; it is completely straightforward to adapt it to normal ones. Suppose f is automatizable. We then have an algorithm Auto which, given an unsatisfiable CNF τ, outputs a refutation of τ in time p(S_f(τ)). If τ is satisfiable, Auto outputs ERROR (without any time restrictions).

    We can use Auto to perform feasible interpolation using Algorithm 2.

    Claim 4.4. The correctness of Auto ensures that we answer correctly in line 5.

    Claim 4.5. If Auto does not finish in p(s) steps (line 3) or outputs something weird (line 7), then B is unsatisfiable.

    Suppose B(z) were satisfiable, and let z = b be a satisfying assignment. Then, by strict normality, we have

    S_f(A(y) ∧ B(z)|_{z=b}) ≤ S_f(A(y) ∧ B(z)) = s
    S_f(A(y) ∧ true) ≤ s
    S_f(A(y)) ≤ s


    Algorithm 2 Feasible Interpolation Algorithm

    Input: A(y) ∧ B(z), s = S_f(A(y) ∧ B(z))
    1: Run Auto on A(y) for p(s) steps.
    2: if Auto did not finish in p(s) steps then
    3:   print "B is unsatisfiable"
    4: else if Auto returns a valid refutation then
    5:   print "A is unsatisfiable"
    6: else
    7:   print "B is unsatisfiable"
    8: end if

    so A(y) had a refutation of size at most s, and Auto should have been able to find some refutation in time at most p(s). □

    This theorem establishes that many proof systems of interest are not automatizable, but what about resolution? Both resolution and treelike resolution admit feasible interpolation, so maybe they are automatizable. We could try to automatize them with the following Algorithm 3 based on refutation width (this corresponds to a widely-used algorithm in AI):

    Algorithm 3 Bounded Search

    Input: τ, a 3-CNF
    for w = 0, ..., n do
      Infer all possible clauses C of width ≤ w.
      if we have inferred ∅ then
        Halt
      end if
    end for

    The runtime of this algorithm is n^{O(w_R(τ))}. Recall that for standard resolution, w_R(τ) is closely related to S_R(τ) (see Theorem 1 on page 4). We now complement that result with an even tighter (and easier to prove, too!) relation between width and treelike refutation size.

    Theorem 16 (Ben-Sasson, Wigderson). w_R(τ) ≤ log₂ S_{tree-R}(τ).

    Proof.

    Claim 4.6. For any b, S_{tree-R}(τ) ≤ 2^b ⟹ w_R(τ) ≤ b.


    Suppose that τ has a treelike refutation of size at most 2^b. The last deductive step in this refutation is x, x̄ ⟹ false, and via the restriction-refutation equivalence for treelike resolution, this implies that we have resolution refutations of τ_0 = τ|_{x=0} and τ_1 = τ|_{x=1}. Because our refutation is treelike, the size of the refutation of τ is the sum of the sizes of the refutations of τ_0 and τ_1. Their sum is at most 2^b, so one of them (say τ_0) has size at most 2^{b-1}. By induction, we can assume that w_R(τ_0) ≤ b − 1 (induction on b) and w_R(τ_1) ≤ b (an inner induction on the size of τ). By Lemma 2.3 on page 5, this yields w_R(τ) ≤ b, which is what we wanted. □

    Putting this bound together with Algorithm 3, we find that we can find a resolution refutation for τ in time n^{O(w_R(τ))} = n^{O(log S_{tree-R}(τ))} = S_{tree-R}(τ)^{O(log n)}. This algorithm is quasi-polynomial, so we are tempted to say that treelike resolution is quasi-automatizable. This is not true, because the refutation produced by Algorithm 3 will not be treelike.

    Question 4.7. Prove that neither resolution nor treelike resolution is quasi-automatizable, under some mildly reasonable complexity assumption.

    A practical message emerging from all this is that Algorithm 3 is as good as any DLL procedure, up to a factor of n^{O(w_R(τ))}. If you wanted to improve this to a polynomial factor, you most likely can't, because:

    4.3 Resolution is Probably not Automatizable

    Theorem 17 (Alekhnovich, Razborov 2001). Resolution is not automatizable unless W[P] is tractable.

    This shows that automatizability is (most likely) a stronger property than feasible interpolation.

    This isn't a class on parameterized complexity, so we do not even attempt to define W[P]. Instead, we indicate a concrete complete problem in this complexity class.

    Problem 4.8 (Minimum Weight Monotone Assignment (MWMA)).
    Instance: C, a monotone circuit on n variables (monotone: using only ∧ and ∨ gates, no negations).
    Solution: k(C), the minimum Hamming weight of a satisfying assignment a.

    Problem 4.9 (h-Gap MWMA).
    An algorithm A (receiving as its input a monotone circuit C and an integer k) solves the h-Gap MWMA problem if


    • k(C) ≤ k ⟹ A(C, k) = YES
    • k(C) ≥ hk ⟹ A(C, k) = NO

    Lemma 4.10. For any constant h > 1 and any function f : N → N, if we can solve the h-Gap MWMA problem in time f(k) · |C|^{O(1)}, then W[P] is tractable.

    Assumption 4.11. For any constant h > 1 and any function f : N → N, no algorithm A that solves the h-Gap MWMA problem can work in time f(k) · |C|^{O(1)}.

    We introduce a new parameter m, to which we will assign a value later.

    Lemma 4.12 (Main Lemma). There is a reduction τ(·,·) that maps a monotone circuit C and a parameter m to a contradiction (= an unsatisfiable CNF) τ(C, m) satisfying

    m^{Ω(min{k(C), log m})} ≤ s_R(τ(C, m)) ≤ |C| · m^{O(min{k(C), log m})}.

    Moreover, τ(C, m) can be computed efficiently, in time polynomial in |C| and m.

    Assuming the Main Lemma for now, we can prove Theorem 17.

    Proof of Theorem 17. If we have an automatizing algorithm Auto which runs in time s_R(τ)^{O(1)} and outputs a refutation of size S_Auto(τ), we can use it to solve h-Gap MWMA efficiently, using Algorithm 4. We must prove that we output YES on YES instances, that we output NO on NO instances, and that Algorithm 4 obeys our time bounds.

    We have to specify values for several parameters.

    ε, h₀: the implied constants in the Main Lemma:

    m^{ε·min{k(C), log m}} ≤ s_R(τ(C, m)) ≤ |C| · m^{h₀·k(C)}.

    h₁: a constant such that Auto runs in time at most s_R(τ(C, m))^{h₁}.

    h: we pick h so that εh² > h₁(h₀h + 1).

    m: we pick m = 2^{h·max{k, (1/k)·log|C|}} = max{2^{hk}, |C|^{h/k}}. Note that m^{log m} ≥ m^{hk}.


    Algorithm 4 Using Auto to solve h-Gap MWMA

    Input: C, k
    1: m ← m(k, |C|, h₁, ε) as specified in the text.
    2: S_max ← (|C| · m^{h₀k})^{h₁}
    3: τ ← τ(C, m) as described in the Main Lemma.
    4: Run Auto on τ(C, m) for S_max steps.
    5: if Auto did not finish in S_max steps then
    6:   return NO
    7: else if S_Auto(τ(C, m)) ≤ S_max then
    8:   return YES
    9: else
    10:   return NO
    11: end if

    Claim 4.13. When k(C) ≤ k, Algorithm 4 outputs YES.

    S_Auto(τ(C, m)) ≤ runtime of Auto ≤ s_R(τ)^{h₁} ≤ (|C| · m^{h₀k(C)})^{h₁} ≤ (|C| · m^{h₀k})^{h₁} = S_max.

    So Auto finishes in line 4, and S_Auto(τ(C, m)) ≤ S_max on line 7.

    Claim 4.14. When k(C) ≥ hk, Algorithm 4 outputs NO.

    S_Auto(τ(C, m)) ≥ s_R(τ(C, m))
                   ≥ m^{ε·min{k(C), log m}}
                   ≥ m^{ε·hk}
                   = m^{(εh²)·(k/h)}
                   > m^{h₁(h₀h+1)·(k/h)}
                   = m^{h₁h₀k} · m^{h₁k/h}
                   ≥ m^{h₁h₀k} · |C|^{h₁}
                   = (|C| · m^{h₀k})^{h₁} = S_max.

    (In the third line we used min{k(C), log m} ≥ hk, since k(C) ≥ hk and log m ≥ hk; in the next-to-last line we used m^{k/h} ≥ |C|, i.e., m ≥ |C|^{h/k}.) Therefore, even if we get to line 7, we output NO.

    Claim 4.15. Algorithm 4 obeys the time bound in Assumption 4.11.


    The two operations that take superconstant time are creating τ in line 3 and running Auto in line 4. Line 3 takes at most (|C| · m)^{O(1)} steps, according to the Main Lemma 4.12. Line 4 takes at most S_max steps, or we stop running Auto. Since S_max = |C|^{O(1)} · m^{O(k)} and m ≤ 2^{hk} · |C|^{h/k}, this results in a time bound of the form |C|^{O(1)} · f(k). □

    4.3.1 Constructing the contradiction

    The Combinatorial Principle

    Consider the following situation: we have our circuit C on n variables x₁,...,x_n, and a set A of "admissible" m-vectors, where |A| = m. We will say that an m×n matrix M is admissible if every column of M is admissible. Note that we allow the same column to be repeated multiple times in M. Each row of M represents a possible assignment to the variables x₁,...,x_n.

    Definition 4.16 (P_{C,A}). In every admissible matrix M, there is some i such that row i satisfies C.

    This principle only holds for some combinations of A and C. We will pick A later, but as we go through the proof, we will note what properties of A are required.

    Definition 4.17. For a set of d vectors (not necessarily distinct), a one-position is a coordinate i such that all the vectors have a one in it. d₁(A) is the maximum d such that every set of d admissible vectors has some one-position i.

    Lemma 4.18 (Our tautology is a tautology). If k(C) ≤ d₁(A), then P_{C,A} is true (every admissible matrix has a row which satisfies C).

    Proof. There is some satisfying assignment x₁,...,x_n for C which has exactly k = k(C) ones. Let j₁,...,j_{k(C)} ∈ [n] be the positions where x_j is one. Since C is monotone, having ones in these positions is sufficient to ensure that C outputs one.

    Let us look at these "sufficient" columns col_{j₁},...,col_{j_{k(C)}}; since k ≤ d₁(A), they must share some one-position i. Then row_i has a one in all of the sufficient columns, and is thus a satisfying assignment. □

    This is the first desired property of A:

    d₁(A) ≥ k(C).


    We choose this principle because we want P_{C,A} to have complexity roughly |A|^{k(C)} = m^{k(C)}. We have a simple approach: searching over all possible placements of admissible vectors in the columns corresponding to the smallest satisfying assignment for C. We need to show that this is basically the only approach, and that no significantly shorter proof exists.

    The Circuit

    For our purposes, a circuit is a sequence of statements of the form v ← v′ ∗ v″, where v is a new variable that has not been used previously, ∗ is either ∨ or ∧, and v′, v″ are either input variables x_j or have already received assignments. We will have a special variable out (which receives its assignment in the same way) that represents the output of the circuit.

    The straightforward encoding

    We consider evaluating a copy of C on every row of our matrix M. We have the following variables:

    a_{ij} is the (i, j)-th entry of M.

    a_j is the admissible vector in the j-th column of M.

    z_{i,v} is the value at node v in the i-th copy of C.

    This encoding is too straightforward: Resolution might be able to do something more clever than the brute-force approach (at least, we don't know how to prove the contrary). Therefore, we use a scrambled encoding. We will still refer to the straightforward encoding to motivate our proofs.

    The scrambled encoding

    We use redundant encodings of the straightforward variables. We introduce two functions F : {0,1}^s → A and f : {0,1}^s → {0,1} (to be specified later), and variables x¹_j,...,x^s_j and y¹_{i,v},...,y^s_{i,v}, so that

    F(x¹_j,...,x^s_j) = a_j,
    f(y¹_{i,v},...,y^s_{i,v}) = z_{i,v}.

    We want to specify constraints on these variables, so we introduce the following abbreviations:

    [True_{v,i}]: f(y¹_{i,v},...,y^s_{i,v}) = 1, written as a CNF in y¹_{i,v},...,y^s_{i,v}.


    [Column_j = a]: F(x¹_j,...,x^s_j) = a, written as a CNF in x¹_j,...,x^s_j.

    Remark 4.19. At the end of the day, s will be O(log m), and therefore these CNFs (as well as those resulting from the axioms listed below) will have size m^{O(1)}. Also, in the proof of Lemma 4.40 we will actually need a slightly more general version in which the functions used to encode [Column_j = a] can be different for different j, and the same is true for [True_{v,i}]. None of our proofs, however, is affected by this generalization.

    Our CNF τ has the following axioms:

    [Column_j = a] ⟹ [True_{x_j,i}] (for j = 1,...,n, for every a ∈ A, and for every i such that a_i = 1). We will call these the input axioms, since they ensure that each copy of C receives as its input a value which is at least the correct row of M (since C is monotone, we do not care about the opposite direction; see Remark 4.21 below). Note that each of these axioms belongs to a specific row and column of the matrix.

    ([True_{v′,i}] ∗ [True_{v″,i}]) ⟹ [True_{v,i}] (for i = 1,...,m, and whenever v ← v′ ∗ v″ is a gate in C, for ∗ = ∧, ∨). We will call these the gate axioms, since they force the node variables z_{i,v} to correctly emulate C at every gate. Note that each of these axioms belongs to a specific row of the matrix.

    ¬[True_{out,i}] (for i = 1,...,m). We will call these the output axioms, since they force each copy of C to output zero. Note that each of these axioms belongs to a specific row of the matrix.

    We then define

    Definition 4.20 (τ(C, A, F, f)).

    τ(C, A, F, f) = ∧_{j=1}^{n} ∧_{a∈A} ∧_{i : a_i=1} ([Column_j = a] ⟹ [True_{x_j,i}])
                  ∧ ∧_{i=1}^{m} ∧_{v←v′∗v″} (([True_{v′,i}] ∗ [True_{v″,i}]) ⟹ [True_{v,i}])
                  ∧ ∧_{i=1}^{m} ¬[True_{out,i}].


    Remark 4.21. As we already indicated above, we are allowing assignments to cheat in one direction: in terms of the straightforward encoding, we are allowing (z_{i,x_j} = 1 ∧ a_{ij} = 0) and (z_{i,v′} ∗ z_{i,v″} = 0 ∧ z_{i,v} = 1). This does not cause problems: we claim that any satisfying assignment for τ(C, A, F, f) would produce a genuine counterexample to P_{C,A}. We can produce this counterexample by setting z_{i,x_j} = 0 whenever a_{ij} = 0, and then repeatedly changing z_{i,v} to zero whenever its children would make it zero in C. Since C is acyclic and monotone, this will eventually correct all errors of the two forms we allow. Since we never change any variable from zero to one, this cannot cause any copy of C to become satisfied.

    Thus, (the negation of) τ(C, A, F, f) is equivalent to P_{C,A}. Allowing this cheating gives us some needed flexibility later in the proof.

    Lemma 4.22. We can efficiently construct a CNF encoding of τ(C, A, F, f) as long as we can efficiently compute F, f, and as long as we can efficiently enumerate A. (Here efficiently means in time polynomial in |C| and m.)

    Exercise: Prove this.

    4.3.2 The Upper Bound on s_R

    Lemma 4.23. If k(C) ≤ d₁(A), then s_R(τ(C, A, F, f)) ≤ O(|C| · 2^{s(k(C)+1)}).

    Proof. There is some satisfying assignment x₁,...,x_n for C which has exactly k(C) ones. Let J₁ = {j₁,...,j_{k(C)}} be the set of columns where x_j is one. We then use the Canonical Refutation in Algorithm 5 on these columns.

    We can find an appropriate i in line 3 because |J₁| ≤ d₁(A). We can successfully deduce [True_{out,i}] in line 7 because (∀j ∈ J₁, x_j = 1) ⟹ C(x₁,...,x_n) = 1. □

    Exercise: Show that the Canonical Refutation can be encoded in a Resolution refutation using O(|C| · 2^{s(k(C)+1)}) clauses.

    Remark 4.24. Theorem 17 is also true for treelike resolution, but the proof is more complicated. The reason is that, since C is a circuit rather than a formula, the Canonical Refutation need not be treelike. This problem is fixed by introducing an even more elaborate encoding (see the paper for the details).


    Algorithm 5 The Canonical Refutation

    Input: J₁ = {j₁,...,j_k}
    1: for all (a_{j₁},...,a_{j_k}) ∈ A^k do
    2:   Assume: ∧_{j∈J₁} [Column_j = a_j]
    3:   Pick i so that (a_j)_i = 1 for all j ∈ J₁.
    4:   for j ∈ J₁ do
    5:     Derive [True_{x_j,i}] using the input axioms.
    6:   end for
    7:   Use the gate axioms to derive a set of [True_{v,i}] including [True_{out,i}].
    8:   Recall that ¬[True_{out,i}] is an (output) axiom.
    9:   Conclude: ∧_{j∈J₁} [Column_j = a_j] is false.
    10: end for
    11: Conclude: ¬τ(C, A, F, f)

    4.3.3 The Lower Bound on s_R

    Definition 4.25. A function F : {0,1}^s → D is called r-surjective if, for any restriction ρ that assigns at most r variables, F|_ρ is surjective.

    Definition 4.26. Let d₀(A) be the maximal d such that, for every d positions i₁,...,i_d, there exists an a ∈ A with a_{i₁} = ··· = a_{i_d} = 0.

    Lemma 4.27 (Lower bound on resolution width). Let r be the largest number such that F and f are both r-surjective. Then

    w_R(τ(C, A, F, f)) ≥ (r/2) · min{k(C), d₀(A)}.

    We follow the same strategy as in the proof of Theorem 2 on page 7. When we defined the axioms of τ(C, A, F, f), we noted that each axiom refers to a single row i. Let Ax_i be the set of axioms that refer to row i, and let Ax_I = ∪_{i∈I} Ax_i for a set of rows I.

    Definition 4.28. For a clause D, let µ(D) = min{|I| : Ax_I ⊨ D}.

    Claim 4.29. µ(D) = 1 when D is an axiom, since D comes from some row i.

    Claim 4.30. µ(∅) > d₀(A).

    A set of axioms entails the empty clause if and only if it is unsatisfiable. Therefore, we need to show that when |I| ≤ d₀(A), we can find an assignment


    that satisfies Ax_I. Since |I| ≤ d₀(A), some a ∈ A has (a)_i = 0 for all i ∈ I. We can then satisfy Ax_I by setting

    M = [a ··· a],
    z_{i,v} = 0 for i ∈ I.

    In the rows of I, all variables are zero, which is consistent with the gate and output axioms. The input axioms are satisfied because a is also zero in these rows.

    Claim 4.31 ("Continuity" of µ). In the deduction

    D₁  D₂
    ──────
      D,

    µ(D) ≤ µ(D₁) + µ(D₂). This is because Ax_{I₁} ⊨ D₁ and Ax_{I₂} ⊨ D₂ imply Ax_{I₁∪I₂} ⊨ D.

    Definition 4.32. An intermediate clause is a clause D satisfying

    (1/2)·d₀(A) ≤ µ(D) ≤ d₀(A).

    Claim 4.33. R can be rewritten so that it contains an intermediate clause D.

    Consider the set of clauses in R which satisfy condition (c): µ(D) < (1/2)·d₀(A). The conclusion (∅) does not, so there must be a first clause D which violates (c). D cannot be an axiom, so it must have been derived.

    If D was derived using the resolution rule on clauses D₁ and D₂, then D₁ and D₂ appear before D and thus satisfy (c). By the continuity property, µ(D) ≤ (1/2)·d₀(A) + (1/2)·d₀(A) = d₀(A).

    If D was derived through weakening, then it was derived from some D₁ that satisfies (c). We can then break the weakening into two steps, D₁ ⟹ D′ ⟹ D, where D′ is an intermediate clause.

    Claim 4.34. If D is an intermediate clause, it cannot satisfy both

    w(D) < (r/2)·k(C)      (3)
    w(D) < (r/2)·d₀(A).    (4)

    Let I be a set realizing the minimum in {|I| : Ax_I ⊨ D}. If we remove any element i₀ ∈ I, we get a set I′ for which ¬(Ax_{I′} ⊨ D). This means that there is some assignment x^t_j = α^t_j, y^t_{i,v} = β^t_{i,v} that satisfies Ax_{I′} but violates D.


    Claim 4.35. We can modify this assignment to satisfy Ax_I, without modifying any variable that appears in D (thus continuing to violate D).

    How would we build such an assignment? We want to modify α, β to satisfy Ax_{i₀} without messing up any of the other rows. Our goal is to achieve the following assignment to the straightforward variables:

    • Set a_j to be zero in all rows of I in any column j where this is possible without modifying D's variables. We will say that a column is free if we can put any a there without disturbing D; the other columns are fixed.

    • Leave the z_{i,v} alone for i ∈ I′.

    • Set the z_{i₀,v} as they would be while correctly emulating C on the input x⁰ which is one in the fixed columns and zero in the free columns. We will say that a row i is free if we can set the z_{i,v} arbitrarily without disturbing D.

    All that remains is to choose a free row i₀, and to prove that most columns are free. Let w_i(D) be the number of y^t_{i,v} that appear in D, and let w^j(D) be the number of x^t_j that appear in D.

    We choose i₀ ∈ I to minimize w_i(D), so that we have maximum freedom to change this row's variables without disturbing D. By the r-surjectivity of f, a row i is free if at most r of the y¹_{i,v},...,y^s_{i,v} appear in D. We have ∑_i w_i(D) ≤ w(D) < (r/2)·d₀(A), and ∑_i w_i(D) ≥ ∑_{i∈I} w_i(D) ≥ |I|·w_{i₀}(D). Since |I| ≥ (1/2)·d₀(A), we have w_{i₀}(D) ≤ r, and thus i₀ is a free row.

    By the r-surjectivity of F, a column j is free if D contains at most r of the x¹_j,...,x^s_j. We have ∑_j w^j(D) ≤ w(D) < (r/2)·k(C). Let J₀ = {j | w^j(D) > r}. Then r·|J₀| ≤ ∑_j w^j(D), so |J₀| < (1/2)·k(C). Thus any column outside of J₀ is free.

    Claim 4.36. For any free column j ∉ J₀, we can choose ᾱ to set (a_j)_i = 0 simultaneously for all i ∈ I, without disturbing D.

    Since |I| ≤ d₀(A), there is an admissible vector a satisfying this condition. We can then put a in any free column by modifying α.

    Claim 4.37. We can choose β̄ to set row i₀ to x⁰ without disturbing D.

    Claim 4.38. The input axioms from Ax_I are satisfied by ᾱ, β̄.

    Claim 4.39. The gate and output axioms from Ax_{i₀} are satisfied by ᾱ, β̄.


    Proof. We have modified the y^t_{i₀,v} to make the z_{i₀,v} correctly emulate C rejecting the input x⁰. □

    This completes the proof of Claim 4.34 and hence of the width lower bound. But we are more interested in a lower bound on the size of the proof:

    Lemma 4.40 (Lower bound on resolution size) . sR (τ (C, A, F, f )) ≥ eΩ(r 2s ·min {k(C ),d0 (A)}) .

    The naive way to try to deduce it from Lemma 4.27 would be to applythe general trade-off between width and size (Theorem 1 on page 4). Un-fortunately, the parameters do not work out nicely (or at least we do notknow how to make them work). Instead, we are using the method of random restrictions (later on we will see its much more elaborated version).

All our variables are split into blocks {X_j, Y_{i,v}} according to their subscripts, with s variables in each block. Let us choose independently at random r/2 variables in each block and assign them to 0, 1 (also independently at random). What will happen to our refutation? The first nice thing is that every individual fat clause will be eliminated with high probability.

Lemma 4.41. For a clause C with w(C) ≥ (r/4)·min{k(C), d0(A)}, the probability that it is not eliminated by a random restriction is at most e^{−Ω((r/2s)·min{k(C), d0(A)})}.

Exercise: Prove this.

And then, since we have assigned just r/2 variables in each block, F|ρ and f|ρ will still be (r/2)-surjective (although they can become different in each block – cf. Remark 4.19!). So we will get a refutation of a contradiction that still has the form τ(C, A, F|ρ, f|ρ), and we can apply to it Lemma 4.27 to conclude that in fact not all fat clauses have been eliminated. Comparing this with Lemma 4.41 and applying the union bound completes our proof.
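The random restriction just described can be sketched in code. This is an illustrative sketch only; the clause representation (literals as (variable, sign) pairs) is my own choice, not notation from the lecture:

```python
import random

def random_restriction(blocks, r, rng=None):
    """Pick r//2 variables in each block and assign each an
    independent random bit; all other variables stay free."""
    rng = rng or random.Random(0)
    rho = {}
    for block in blocks:                 # each block holds s variables
        for v in rng.sample(block, r // 2):
            rho[v] = rng.randint(0, 1)
    return rho

def restrict_clause(clause, rho):
    """clause: set of (variable, sign) literals.  Returns None if some
    literal is satisfied (the clause is eliminated), otherwise the
    clause with its falsified literals removed."""
    out = set()
    for var, sign in clause:
        if var in rho:
            if rho[var] == sign:         # a satisfied literal kills the clause
                return None
        else:
            out.add((var, sign))
    return out
```

A fat clause touches many blocks, so with r/2 random assignments per block it is very likely that some literal gets satisfied, eliminating the clause.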

    4.3.4 Combinatorial Voodoo

For a matrix A, we define d1(A) (respectively, d0(A)) to be the maximum number of columns (respectively, rows) such that there is some row (respectively, column) which contains only ones (respectively, zeros) in the intersection of that row and those columns. For our proof, we need a 0-1 matrix A such that d1(A), d0(A) ≥ Ω(log m). Explicit examples of such matrices are well known, e.g.:

Definition 4.42 (The Paley Matrix). Let m be a prime and, for i, j ∈ F_m, let a_{ij} = 1 iff j ≠ i and (j − i) is a quadratic residue mod m.
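A construction following Definition 4.42 can be sketched as below. This only builds the matrix; the claimed lower bounds on d0 and d1 are number-theoretic facts not checked here:

```python
def paley_matrix(m):
    """0-1 matrix with a[i][j] = 1 iff (j - i) is a nonzero quadratic
    residue mod the prime m (so in particular a[i][i] = 0)."""
    residues = {(x * x) % m for x in range(1, m)}   # nonzero residues
    return [[1 if (j - i) % m in residues else 0
             for j in range(m)] for i in range(m)]
```

Each row contains exactly |residues| = (m − 1)/2 ones, e.g. 3 ones per row for m = 7.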


F and f need to be r-surjective functions on s bits, where s and r are Θ(log m). For this, we use linear codes. Let h be a sufficiently large constant and let L ⊆ {0,1}^{h·log₂ m} be an F₂-linear subspace of dimension log m that does not contain a non-zero vector of weight < log m. There are many explicit examples of such L. Now, fix in L an arbitrary basis x¹, ..., x^{log m}, and let F : {0,1}^{h·log m} → {0,1}^{log m} be the linear mapping defined by F(y) = (⟨x¹, y⟩, ..., ⟨x^{log m}, y⟩). Then F is (log m)-surjective.

4.3.5 The Upper Bound when k(C) is too large

Choosing s = Θ(log m), we have proved that

    m^{Ω(min{k(C), log m})} ≤ s_R(τ(C, A, F, f)) ≤ O(|C| · m^{O(k(C))}).

We still need to prove that s_R(τ) ≤ |C| · m^{O(log m)}. Our construction thus far cannot possibly supply this, since we have been assuming all along that k(C) ≤ d1(A). When k(C) > d1(A), it is even possible that τ(C, A, F, f) is satisfiable.

We need explicit examples of unsatisfiable CNFs with s_R(τ_m) = m^{Θ(log m)}. We note that the Tseitin tautologies provide τ_t with s_R(τ_t) = 2^{Θ(t)}, so if we take t = (log m)², these will suffice.
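For concreteness, here is a sketch of how Tseitin contradictions can be generated (the clause encoding, with literals as (edge index, required value) pairs, is my own choice for illustration): one parity constraint per vertex, unsatisfiable whenever the total charge is odd.

```python
from itertools import product

def tseitin_clauses(n_vertices, edges, charge):
    """CNF of the Tseitin contradiction for a graph.  One variable per
    edge; for every vertex, forbid each assignment to its incident
    edges whose parity differs from charge[v]."""
    clauses = []
    for v in range(n_vertices):
        inc = [e for e, (a, b) in enumerate(edges) if v in (a, b)]
        for bits in product((0, 1), repeat=len(inc)):
            if sum(bits) % 2 != charge[v]:
                # forbid this local pattern: some incident edge must differ
                clauses.append([(e, 1 - b) for e, b in zip(inc, bits)])
    return clauses
```

Since every edge is counted at two vertices, the sum of all parity constraints forces the total charge to be even; an odd total charge therefore makes the CNF unsatisfiable.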

We define τ(C, m) = τ(C, A, F, f) ∧ τ_m, where τ_m operates on completely new variables. Now resolution can choose between disproving τ(C, A, F, f) and disproving τ_m, so we have s_R(τ(C, m)) ≤ s_R(τ_m) ≤ m^{O(log m)}.

Our lower bound is still valid because of feasible interpolation. If we have a refutation of τ(C, A, F, f) ∧ τ_m of size s, then we must have a refutation of size s^{O(1)} of either τ(C, A, F, f) or τ_m. This will only weaken the constant in s_R(τ(C, m)) ≥ m^{Ω(min{k(C), log m})}.

This completes the proof of the Main Lemma, and thus the theorem.

Lectures 12 and 13
Scribe: Gabriel Bender, University of Chicago.
Date: February 17 and 19, 2009

5 Applications to Learning

Definition: k(C) is the minimum number of ones in a satisfying assignment of a circuit C.


PAC (“Probably Approximately Correct”) learning is a model in learning theory in which we want to learn from examples generated randomly by nature according to an unknown distribution D.

• Input: A list of pairs (x1, f(x1)), ..., (xn, f(xn)) for an unknown function f, where the xi are generated according to D.
• Output: A circuit C that will perform “reasonably well” in trying to predict the output of f on future inputs drawn from the same distribution.

“Exact Learnability” is a simpler model which (under certain conditions that are always fulfilled in our situation) implies PAC learnability.

Definition: Given {0,1}^n, a concept class is a class C of Boolean expressions which represent Boolean functions f : {0,1}^n → {0,1}. For instance, C could be the set of all DNFs in n variables.

    5.1 Decision Trees

One concept class of particular interest is the set of all decision trees, defined as follows:

Definition: A decision tree in n variables is a binary tree in which every internal node is labeled by a variable and has two outgoing edges labeled 0 and 1, and every leaf is labeled “accept” or “reject.”

Definition: A decision tree is balanced if every leaf is at the same depth.

Definition: The height of a tree, h(T), is the length of the maximal path from the root to any leaf.
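As a minimal illustration of these definitions (the nested-tuple representation is my own, not from the lecture), a decision tree, its evaluation, and its height can be coded as:

```python
# A tree is either a leaf ("accept"/"reject") or a node
# (queried_variable, subtree_for_0, subtree_for_1).

def evaluate(tree, assignment):
    """Follow edges according to the assignment until a leaf."""
    while isinstance(tree, tuple):
        var, if0, if1 = tree
        tree = if1 if assignment[var] else if0
    return tree

def height(tree):
    """h(T): the length of the longest root-to-leaf path."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(height(tree[1]), height(tree[2]))
```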

Problem: Given a system of equations F(a1) = ε1, ..., F(am) = εm, we would like to solve it in the concept class C consisting of all decision trees. That is, we want to find F̂ ∈ C that satisfies F̂(ai) = εi for 1 ≤ i ≤ m. We would like to do this as efficiently as possible, and we want our solution to be “not much larger” than the size s of the minimal solution F ∈ C.

Definition: By proper learning, we mean (as above) that the output hypothesis F̂ must be in the same concept class as our original function F.


Theorem 18 (Alekhnovich et al.). DNF formulas of size s can be exactly learned in time m^{O(1)} · n^{O(√(n log s))}. Notice that this is subexponential, but not polynomial-time.

Theorem 19. Decision trees are not PAC-learnable in polynomial time unless SAT is computable in randomized subexponential time.

Proof of Theorem 18: This proof has close parallels to the proof that “short proofs are narrow” (Theorem 1 on page 4). We replace w_R(P), the refutation width of a proof P, with the “fanning” width of a DNF expression, i.e. the maximum number of variables that appear in any of its terms. In the original analysis of proof trees, a clause C was fat if w(C) > d for some predetermined d. Here, a term is fat if it involves at least d variables.

Induction Base: Assume our system S has a solution without fat terms. Then we claim that there exists an algorithm BoundedSearch(S) that solves the problem in the allocated time. Since there are no fat terms, each term has at most d variables. Hence, there are at most O(n)^d possible terms. We simply take, in the allotted time, all those terms that are consistent with all negative examples F(a) = 0, and check whether they “cover” all positive examples F(a) = 1.
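A brute-force sketch of this base case (the function name and encodings are hypothetical; a term is a tuple of (variable index, required value) pairs): enumerate all terms on at most d variables, keep those consistent with the negative examples, and check that the positives are covered.

```python
from itertools import combinations, product

def bounded_search(examples, n, d):
    """examples: list of (point in {0,1}^n, label).  Returns a list of
    terms forming a consistent DNF with term width <= d, or None."""
    negatives = [a for a, lab in examples if lab == 0]
    positives = [a for a, lab in examples if lab == 1]

    def satisfies(a, term):
        return all(a[i] == v for i, v in term)

    # all O(n)^d terms that are consistent with every negative example
    candidates = [term
                  for k in range(d + 1)
                  for idxs in combinations(range(n), k)
                  for vals in product((0, 1), repeat=k)
                  for term in [tuple(zip(idxs, vals))]
                  if not any(satisfies(a, term) for a in negatives)]
    if all(any(satisfies(a, t) for t in candidates) for a in positives):
        return candidates
    return None
```

Taking all consistent terms is greedy but safe: any term of the unknown solution is among the candidates, so the positives are covered whenever a width-d solution exists.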

Inductive step: Suppose we have f fat terms. In the proof of Theorem 1, we took 2n restrictions. By the Pigeonhole Principle, one of those restrictions was guaranteed to produce at most αf fat terms, where α = (1 − d/2n). However, when we try to apply that approach here, we run into trouble because we don't know which restriction will sufficiently simplify our system. The solution is to run our algorithm recursively on all 2n restrictions in parallel:

Algorithm: Search(S): Run the following restrictions in parallel:

    Search(S|_{x1=0}), Search(S|_{x1=1}), ..., Search(S|_{xn=1})

We wait for at least one search to terminate. Assume W.L.O.G. that it is the restriction x_i = ε. Let T(n, f) denote the amount of time needed to find a solution with n variables and f fat terms. Then T(n, f) satisfies T(n, 0) ≤ n^{O(√(n log s))} · m^{O(1)} by the base case. Furthermore, T(n, f) ≤ 2n · T(n, αf) + T(n − 1, f). Here T(n, αf) represents the time needed to run the search algorithm on our “good” restriction x_i = ε; the time is multiplied by 2n because we're checking all 2n possible restrictions in parallel. Once one of our searches has


terminated, we need to run the algorithm on the complementary restriction x_i = 1 − ε; this accounts for the additional term T(n − 1, f). The fact that, despite the extra 2n factor, this recursion still resolves to T(n, s) ≤ m^{O(1)} · n^{O(√(n log s))} is in Homework #2.

    6 Concrete Lower Bounds

Let F_d denote the depth-d Frege system (in which every line has depth at most d). Then we know that

    F_1 ≤ Res(k) ≤ F_2

(recall that Res(k) is the natural proof system operating with k-DNFs).

    6.1 Random 3-CNFs

Suppose we have a CNF τ with n variables and Cn clauses chosen at random. If C is big then τ is almost surely unsatisfiable. We may therefore ask: what is the minimum refutation size in various proof systems?

Extended Frege: We believe that S_EF(τ) ≥ exp(n^{Ω(1)}). Intuitively, this means that you can't do much better for random 3-CNFs than a brute-force search through solutions. But the strongest system for which we can really prove this is Res(k):

Theorem 20 (Alekhnovich 05). S_{Res(√(log n / log log n))}(τ) ≥ exp(n^{Ω(1)}).

    6.2 The Pigeon-Hole Principle

We introduce the notation [m] = {1, 2, ..., m} for an integer m.

Observation: Let our domain be D = [m] and our range be R = [n]. If m > n then there is no injective map from [m] to [n].

We create the Pigeonhole contradiction as follows:

• For each i ∈ [m], j ∈ [n], we introduce a new variable x_{ij}. Intuitively, this corresponds to the statement “pigeon i is in hole j.”
• For each i ∈ [m], we set Q_i = ⋁_{j=1}^n x_{i,j}. Intuitively, Q_i represents the statement “pigeon i is in some hole.”


• For j ∈ [n], we let Q^j represent the statement “hole j contains some pigeon”: Q^j = ⋁_{i=1}^m x_{i,j}.
• For each i1 ≠ i2 ∈ [m] and j ∈ [n], Q_{i1,i2,j} = ¬x_{i1,j} ∨ ¬x_{i2,j}. This corresponds to the statement “no two pigeons share the same hole.”
• For each i ∈ [m] and j1 ≠ j2 ∈ [n], we set Q_{i,j1,j2} = ¬x_{i,j1} ∨ ¬x_{i,j2}. This represents the statement “no pigeon is placed in two (or more) holes.”

Definitions: Let i, i1, i2 ∈ [m] and j, j1, j2 ∈ [n]. We define the following CNFs by listing their sets of clauses:

• PHP^m_n = {Q_i} ∪ {Q_{i1,i2,j}}
• FPHP^m_n = PHP^m_n ∪ {Q_{i,j1,j2}}
• onto-PHP^m_n = PHP^m_n ∪ {Q^j}
• onto-FPHP^m_n = FPHP^m_n ∪ {Q^j}
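These four CNFs can be generated mechanically. In this sketch a literal is encoded as (sign, i, j), with sign +1 for x_{ij} and −1 for its negation (an encoding chosen here purely for illustration):

```python
from itertools import combinations

def php_clauses(m, n, functional=False, onto=False):
    """Clauses of PHP^m_n and its variants, as lists of literals."""
    cls = [[(+1, i, j) for j in range(n)] for i in range(m)]       # Q_i
    cls += [[(-1, i1, j), (-1, i2, j)]                             # Q_{i1,i2,j}
            for i1, i2 in combinations(range(m), 2) for j in range(n)]
    if functional:                                                 # Q_{i,j1,j2}
        cls += [[(-1, i, j1), (-1, i, j2)]
                for i in range(m) for j1, j2 in combinations(range(n), 2)]
    if onto:                                                       # Q^j
        cls += [[(+1, i, j) for i in range(m)] for j in range(n)]
    return cls
```

For example, PHP^m_n has m pigeon clauses plus C(m,2)·n hole-collision clauses.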

Theorem 21 (Maciel, Pitassi, Woods 99). S_{Res((log n)^{O(1)})}(PHP^{2n}_n) ≤ n^{O(log n)}. It is an open question whether we can say the same thing about Res(2).

Theorem 22 (Krajíček, Pudlák, Woods; Pitassi, Beame, Impagliazzo 93). S_{F_d}(onto-FPHP^{n+1}_n) ≥ exp(n^{ε_d}) for a constant ε_d > 0.

Theorem 23 (Raz, Razborov 01). S_R(onto-FPHP^∞_n) ≥ exp(Ω(n^{1/3})).

There are a number of interesting problems about the complexity of PHP that are still open (see the course Web page).

    6.2.1 Matching Restrictions

Suppose ρ is a restriction. Let m = n + 1 and let ℓ ≤ n. Then M^ℓ_{D×R} is the set of partial matchings assigning pigeons to all but ℓ holes. Formally, we start with a collection of pairs (i_ν, j_ν), and we say that

    ρ(x_{ij}) = 1, if i = i_ν and j = j_ν for some ν;
    ρ(x_{ij}) = 0, if i = i_ν for some ν and j ≠ j_ν;
    ρ(x_{ij}) = 0, if j = j_ν for some ν and i ≠ i_ν;
    ρ(x_{ij}) is left unassigned otherwise.
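In code, the restriction determined by a partial matching can be sketched as (the function names are hypothetical):

```python
def matching_restriction(pairs):
    """Given a partial matching [(i1, j1), ...], return a function
    rho(i, j) giving the value assigned to x_ij: 1 on matched pairs,
    0 elsewhere in a matched row or column, None (unset) otherwise."""
    row = dict(pairs)                    # pigeon -> its hole
    col = {j: i for i, j in pairs}       # hole   -> its pigeon
    def rho(i, j):
        if row.get(i) == j:
            return 1
        if i in row or j in col:
            return 0
        return None
    return rho
```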


    6.2.2 Matching Decision Trees

A matching decision tree is a tree in which, at each node, we either query a pigeon i or query a hole j. If we query a pigeon i, then exactly one of the variables x_{i,1}, ..., x_{i,n} will be 1. Hence, a “pigeon” node will have n successors, with the corresponding edges labeled j = 1, ..., j = n respectively.

Similarly, if we query a hole j, then we determine which of the m pigeons sits in it. Hence, a “hole” node will have m successors, with edges labeled i = 1, ..., i = m, corresponding to the cases x_{1,j} = 1, ..., x_{m,j} = 1 respectively.

All leaves in the decision tree are labeled by 0 or 1. Let Br(T) denote the set of all leaves, Br_0(T) the set of leaves labeled by 0, and Br_1(T) the set of leaves labeled by 1. Every leaf naturally corresponds to a matching restriction.

Definition: A matching decision tree T represents F if

• (∀ρ ∈ Br_0(T)) (F|_ρ ≡ 0)
• (∀ρ ∈ Br_1(T)) (F|_ρ ≡ 1)

    6.2.3 Putting it all together

Define h(T) to be the maximum length over all paths from the root of T to any node.

As a first attempt, we might say that all formulas can be represented by low-height matching decision trees with only accepting branches. However, this fails for the depth-2 formula that is the conjunction of the axioms: it is easily deducible but never accepts.

More specifically, let (⋀_i Q_i) ∧ (⋀_{i1,i2,j} Q_{i1,i2,j}) be the conjunction of all of the PHP^m_n axioms. It is a depth-2 formula, and it is identically zero in the real world, since there are more pigeons than holes in our setup.

Let Γ be the set of all subformulas that appear in our proof P.

Definition: Consider the set of formulas over the language consisting of the constants {0, 1} and the operators ¬ and ∨. A k-evaluation is a map


F ↦ T_F that takes formulas to matching decision trees of height at most k. It is defined as follows:

• T_0 is a tree with a single node labeled 0, i.e. always reject.
• T_1 is a tree with a single node labeled 1, i.e. always accept.
• T_{¬F} ≡ (T_F)^c, i.e. take the tree T_F and reverse all the leaf labels.
• F = ⋁_{i∈I} F_i: we wish to express T_F in terms of the T_{F_i}. Ideally, we would like T_F to represent ⋁_{i∈I} Disj(T_{F_i}), where Disj(T) = ⋁_{ρ∈Br_1(T)} t_ρ is a matching disjunction and t_ρ ≡ ⋀_{i∈dom(ρ)} x_{i,ρ(i)}.

We will discuss how to construct a k-evaluation later. For the time being,

let us notice that it suffices for our purposes.

Lemma 6.1. Suppose k < n/3. Assume Γ is an arbitrary set of formulas that is closed under subformulas and that has a k-evaluation. Then there is no refutation of the Pigeonhole Principle in which all lines belong to Γ.

Lectures 14 and 15
Scribe: Paolo Codenotti, University of Chicago.
Date: February 24 and 26, 2009

Let us finish the discussion of how to construct k-evaluations. Let Γ be a set of formulas closed under subformulas, and let Γ_i ⊆ Γ be the set of formulas in Γ of depth at most i. So we have

    Γ_0 ⊆ Γ_1 ⊆ Γ_2 ⊆ ··· ⊆ Γ_d = Γ.

By induction, we will construct k_i-evaluations for Γ_i|_{ρ_i}, where the ρ_i are restrictions that we will construct as we go along, with ρ_{i+1} an extension of ρ_i.

It is easy to see that restrictions are a natural and robust concept. In particular, this means that Γ_i|_{ρ_{i+1}} will still have a k_i-evaluation, independently of our choice of ρ_{i+1} (since ρ_{i+1} extends ρ_i). The only case we need to take care of is when F = ⋁_{i∈I} F_i, where the F_i are already evaluated formulas. In particular, we have F = ⋁_{i∈I} t_i, where the t_i are matching terms with at most r variables. We can assume without loss of generality that ρ_i is trivial (since we can assume it is already fixed). In order to finish our construction, and show that we can get a restriction ρ_{i+1} for which we can get a k_{i+1}-evaluation for some small k_{i+1}, we use the following lemma.


Lemma 24 (Switching Lemma). Let F be a matching r-disjunction, and let 10 ≤ ℓ ≤ εn/r, where ε is a constant (something like 1/10). Let ρ be a random restriction that assigns all but ℓ + 1 pigeons. Then F|_ρ can be represented by a height-s matching decision tree with probability at least 1 − exp(−Ω(ℓ²r/n))^s.

In a slightly different context, this lemma was first stated and proved by Håstad in 1986. A simple combinatorial proof was given 10 years later by Razborov.

Once we have Lemma 24, the proof is completed by a simple application of the union bound: for every individual F ∈ Γ_d \ Γ_{d−1}, the probability that F|_{ρ_{k+1}} does not get simplified as necessary is less than |Γ_d|^{−1}.

    7 Algebraic and Geometric proof systems

Assignments to formulas are points on the Boolean cube. There are two ways to translate formulas so that variables become numbers (i.e. elements of some field K). The two schools of thought differ in how clauses are represented.

    1. In the algebraic approach clauses are represented by equations.

    2. In the geometric approach clauses are represented by inequalities.

Let us show how this is done by example. The clause x1 ∨ ¬x2 ∨ x3 can be written as (1 − x1)·x2·(1 − x3) = 0 (algebraic), or as x1 + (1 − x2) + x3 ≥ 1, which simplifies to x1 + x3 ≥ x2 (geometric).

7.1 Algebraic proof systems

We will have the axioms x_i² − x_i = 0 (which force x_i ∈ {0, 1}). This means that we consider

    Λ_n = K[x1, ..., xn]/(x_i² − x_i),

which is the ring of multilinear polynomials over our field K.

7.1.1 Nullstellensatz proof systems

A CNF reduces to a system of polynomial equations f1 = 0, ..., fm = 0. It is easy to see that there is no solution to the above system of equations if and only if there exist polynomials Q_i such that

    1 = Q1·f1 + ··· + Qm·fm


(note that due to the presence of the axioms x_i² − x_i = 0, we do not have to require that the field K be algebraically closed). The polynomials Q_i therefore constitute a refutation of the CNF. This is called the Nullstellensatz proof system. The degree of a refutation is defined as

    d(1 = Q1·f1 + ··· + Qm·fm) = max_i deg(Q_i).

We define d^K_NS(τ) to be the minimum degree of any refutation of the polynomial translation of τ.
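A toy verifier for such Nullstellensatz certificates can be sketched as follows. Multilinear polynomials over the rationals are stored as {frozenset of variables: coefficient}; multiplying monomials unions their variable sets, which silently applies the axioms x_i² = x_i. All names here are illustrative, not from the lecture:

```python
from collections import defaultdict

def padd(p, q):
    """Sum of two multilinear polynomials."""
    r = defaultdict(int)
    for mono, c in list(p.items()) + list(q.items()):
        r[mono] += c
    return {m: c for m, c in r.items() if c}

def pmul(p, q):
    """Product in Lambda_n: monomials multiply by set union."""
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[m1 | m2] += c1 * c2
    return {m: c for m, c in r.items() if c}

ONE = {frozenset(): 1}

def var(i):
    return {frozenset([i]): 1}

def verify_ns(fs, qs):
    """Check the identity 1 = Q_1 f_1 + ... + Q_m f_m."""
    total = {}
    for f, q in zip(fs, qs):
        total = padd(total, pmul(q, f))
    return total == ONE

# Toy contradiction {x = 1, x = 0}: f1 = 1 - x, f2 = x, Q1 = Q2 = 1,
# since (1 - x) + x = 1 -- a degree-0 refutation.
f1 = padd(ONE, {frozenset([0]): -1})
f2 = var(0)
assert verify_ns([f1, f2], [ONE, ONE])
```

The degree of a certificate in this representation is just the largest monomial size appearing in any Q_i.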

    7.1.2 Polynomial Calculus

Polynomial Calculus (PC) is a proof system that is a “dynamical version” of Nullstellensatz. The inference rules in Polynomial Calculus are the following:

    f = 0    g = 0
    --------------      (for any α, β ∈ K)
     αf + βg = 0

       f = 0
    -----------         (for any i ∈ [n])
    x_i · f = 0

We define the degree of the first rule to be max{deg(f), deg(g)}, and the degree of the second rule to be deg(f) + 1. A PC refutation is a sequence

of inferences proving 1 = 0. The degree d^K_PC(τ) of a contradiction τ is the minimum degree of a PC refutation of τ. We can make the following two observations about d^K_PC(τ):

• d^K_PC(τ) ≤ w_R(τ) (we can simulate the resolution rules with polynomial calculus).
• d^K_PC(τ) ≤ d^K_NS(τ) + O(1).

There are many separation results for Polynomial Calculus and Nullstellensatz. For example, Buss and Pitassi showed the following.

Theorem 25 (Buss, Pitassi 1998). Let τ be the least induction principle: x1 = 1, x_i ⇒ x_{i+1}, x_n = 0. τ is obviously unsatisfiable and has a PC refutation of degree 2 (or 3). However, d^K_NS(τ) is linear in n over any field.

Proposition 7.1. PC refutations of degree O(1) are automatizable (see Lecture 9 for the definition).
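To illustrate Theorem 25, here is a sketch of a low-degree PC refutation, under the standard translation x1 = 1 ↦ 1 − x1 = 0, (x_i ⇒ x_{i+1}) ↦ x_i(1 − x_{i+1}) = 0, and x_n = 0 ↦ x_n = 0 (the translation is an assumption here, not taken from the lecture):

```latex
\begin{align*}
&\text{Invariant: } 1 - x_i = 0 \quad (\text{the case } i = 1 \text{ is an axiom}).\\
&\text{Multiplication rule applied to } 1 - x_i = 0:\quad
  x_{i+1} - x_i x_{i+1} = 0.\\
&\text{Linear combination: } (1 - x_i) + (x_i - x_i x_{i+1})
  - (x_{i+1} - x_i x_{i+1}) = 1 - x_{i+1} = 0.\\
&\text{After } n - 1 \text{ steps we have } 1 - x_n = 0;
  \text{ adding the axiom } x_n = 0 \text{ yields } 1 = 0.
\end{align*}
```

Every line above has degree at most 2, consistent with the degree 2 (or 3) claimed in the theorem.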

Proof. The proof is essentially the Gröbner basis algorithm. The idea is as follows. Let X_d be the linear