
Satisfiable k -CNF formulas above the threshold Danny Vilenchik


Page 1: Satisfiable k -CNF formulas above the threshold Danny Vilenchik


Page 2: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

SAT – Basic Notions

3CNF form:

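$F = (x_1 \vee x_2 \vee \neg x_5) \wedge (x_3 \vee \neg x_4 \vee \neg x_1) \wedge (\neg x_1 \vee x_2 \vee x_6) \wedge \dots$

Assignment $\psi$: $x_1 x_2 x_3 x_4 x_5 x_6$ = F F T F F T

$F(\psi) = (F \vee F \vee T) \wedge (T \vee T \vee T) \wedge (T \vee F \vee T) \wedge \dots$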
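The evaluation of a 3CNF under an assignment can be sketched in a few lines of Python. The signed-integer clause encoding (a positive integer for a variable, a negative one for its negation) and the name `evaluate` are assumptions made for illustration. The negations in `F` are chosen so that the three clauses evaluate to the truth values shown on the slide.

```python
def evaluate(clauses, assignment):
    """Return True if every clause contains at least one true literal.

    clauses: iterable of tuples of non-zero ints (v means x_v, -v means NOT x_v)
    assignment: dict mapping variable index -> bool
    """
    return all(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )

# The first three clauses of the example formula (encoding assumed, see above).
F = [(1, 2, -5), (3, -4, -1), (-1, 2, 6)]

# The assignment x1..x6 = F F T F F T.
psi = {1: False, 2: False, 3: True, 4: False, 5: False, 6: True}

print(evaluate(F, psi))  # True
```

Flipping $x_5$ to T falsifies the first clause, so the same call then returns False.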

Goal: an algorithm that produces an optimal result, is efficient, and works for all inputs

Page 3: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

SAT - More Background …

Finding a satisfying assignment is NP-hard [Cook’71]

MAX-3SAT cannot be approximated within a factor better than 7/8 unless P=NP [Hastad’01]

How to proceed? Hardness results only show that there exist hard instances

Many researchers take the heuristic approach

Typical instance? One possibility: random models, average case analysis

A heuristic is a polynomial-time algorithm that produces optimal results on typical instances

Page 4: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Random SAT Distributions

Most popular k-SAT distribution is the uniform distribution:

Fix c, k and choose u.a.r. m = cn clauses out of the $2^k \binom{n}{k}$ possible clauses

[Fri99] Phase transition: there exists a number d = d(k,n) such that

m/n > d: most k-CNFs are not satisfiable (k = 3: d < 4.506 [DBM00])

m/n < d: most k-CNFs are satisfiable (k = 3: d > 3.42 [KKL02])

Simple upper bound on d: $2^k \ln 2$

Proof idea: pick $(2^k \ln 2) n$ random clauses; around this density the expected number of satisfying assignments goes from $\omega(1)$ to $o(1)$

Too far off for small k: $2^3 \ln 2 \approx 5.545$ (in particular, no tight concentration)

Major open question: what is the correct value of d? Conjectured: $2^k \ln 2 - c$, c some universal constant

Proven so far: at least $2^k \ln 2 - k/2$ [AP05]
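The first-moment argument behind the simple upper bound can be checked numerically. The sketch below assumes the m clauses are drawn independently (with replacement), so a fixed assignment satisfies all of them with probability $(1 - 2^{-k})^m$; the function names are illustrative only.

```python
import math

def log_expected_solutions(n, m, k):
    """ln E[# satisfying assignments] = n ln 2 + m ln(1 - 2^-k).

    Assumes m independent uniformly random k-clauses over n variables.
    """
    return n * math.log(2) + m * math.log1p(-2.0 ** -k)

def first_moment_threshold(k):
    """Density m/n at which the expectation above switches from growing to vanishing."""
    return math.log(2) / -math.log1p(-2.0 ** -k)

# For k = 3 the exact first-moment density is about 5.19, slightly below
# the simpler bound 2^3 ln 2 (about 5.545), because -ln(1 - t) >= t.
print(first_moment_threshold(3), 2 ** 3 * math.log(2))
```

Since $-\ln(1 - 2^{-k}) \ge 2^{-k}$, the exact first-moment density never exceeds $2^k \ln 2$, and the two agree up to lower-order terms as k grows.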

Page 5: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Random SAT cont.

The below-threshold regime (satisfiability) was extensively studied:

Rigorous analysis of heuristics: Pure Literal [BFU93], RWalkSAT [AB03]

Experimental results: Survey Propagation [BMZ05]

The typical structure of the solution space (clustering) [AR06, MMZ05]

Focus on near-threshold formulas: trying to figure out the threshold

The above-threshold regime is interesting mathematically and algorithmically

The above-threshold regime is not necessarily “easy”

Why not consider a satisfiable 3CNF instance with 7n clauses?

The uniform distribution with m/n>d is not suitable for average case analysis of satisfiability heuristics (it is for refutation)

Page 6: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Above Threshold SAT Distributions

Average case analysis of above threshold k-CNF formulas is scarce (no more than 5 papers until 2003)

Why is that? How to define a “meaningful” distribution over a negligible fraction of k-CNFs?

Maybe such a distribution will not be approachable using current techniques

Maybe such a distribution will not even be efficiently-sampleable

Our main contribution:

a rigorous average-case study (algorithms and structural properties) of the above-threshold satisfiable k-SAT regime

Page 7: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

What was known so far? The only distribution that was studied is the planted distribution

Fix some assignment $\psi$ to the n variables

Fix c and include m = cn clauses u.a.r. out of the $(2^k - 1)\binom{n}{k}$ clauses which are satisfied by $\psi$

Planted models are also “fashionable” for graph coloring, max clique, max independent set, min bisection …

What was known so far? (for k=3)

m/n = $\Theta(n)$: Greedy Algorithm [KP92]

m/n = $\Theta(\log n)$: Majority Vote (simple exercise)

m/n a sufficiently large constant: Spectral Algorithm [Fla03]

m/n = O(1): RWalkSAT fails [AB03]; experimental: a variant of RWalkSAT
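A minimal sketch of the planted distribution and of the majority-vote exercise for k = 3. The names `sample_planted` and `majority_vote`, the signed-integer clause encoding, and drawing clauses with replacement (so repeats are possible) are assumptions made for illustration, not details taken from the slides. Majority vote works at high density because, among the 7 sign patterns that the planted assignment satisfies, the literal of a given variable is true in 4 and false in 3, so each occurrence is biased towards the planted value.

```python
import random

def sample_planted(n, m, k=3, seed=0):
    """Draw a planted assignment and m random k-clauses that it satisfies."""
    rng = random.Random(seed)
    planted = [rng.random() < 0.5 for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        variables = rng.sample(range(1, n + 1), k)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        # Rejection step: keep the clause only if the planted assignment satisfies it.
        if any((lit > 0) == planted[abs(lit) - 1] for lit in clause):
            clauses.append(clause)
    return planted, clauses

def majority_vote(n, clauses):
    """Set each variable to the polarity in which it appears more often."""
    score = [0] * n
    for clause in clauses:
        for lit in clause:
            score[abs(lit) - 1] += 1 if lit > 0 else -1
    return [s >= 0 for s in score]

planted, clauses = sample_planted(n=20, m=20000, seed=1)
guess = majority_vote(20, clauses)
agreement = sum(g == p for g, p in zip(guess, planted))
print(agreement, "of 20 variables recovered")
```

The density in the usage example is deliberately very high; at constant m/n some variables appear too few times for the vote to be reliable, which is where the spectral approach of [Fla03] comes in.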

Page 8: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Our Results

[Krivelevich, Coja-Oghlan, V. 07] We characterize the structure of the solution space of a typical planted formula with m/n > c, c some sufficiently large constant:

1. All sat. assignments are within Hamming distance $e^{-\Omega(m/n)} n$ of each other

2. All sat. assignments agree on all but $e^{-\Omega(m/n)} n$ variables

Below the threshold: “complicated” clustering of random k-SAT. Part of this was proven in [AR06]

When m/n = $\Omega(\log n)$, item 2 implies that there is only one satisfying assignment
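For very small n the two structural statements can be inspected by brute force. The sketch below enumerates all satisfying assignments of a small planted 3CNF and reports the largest pairwise Hamming distance and the number of variables on which all solutions agree. The helper names, the clause encoding, and the with-replacement sampler are assumptions for illustration; an instance with n = 10 says nothing about asymptotics and only shows what is being measured.

```python
import itertools
import random

def sample_planted(n, m, k=3, seed=0):
    """Planted assignment plus m random k-clauses satisfied by it (with replacement)."""
    rng = random.Random(seed)
    planted = [rng.random() < 0.5 for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        variables = rng.sample(range(1, n + 1), k)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        if any((lit > 0) == planted[abs(lit) - 1] for lit in clause):
            clauses.append(clause)
    return planted, clauses

def solution_space_stats(n, clauses):
    """Enumerate all 2^n assignments; return (solutions, max distance, # agreed variables)."""
    solutions = [
        a for a in itertools.product([False, True], repeat=n)
        if all(any((lit > 0) == a[abs(lit) - 1] for lit in c) for c in clauses)
    ]
    max_dist = max(
        (sum(x != y for x, y in zip(a, b))
         for a, b in itertools.combinations(solutions, 2)),
        default=0,
    )
    agreed = sum(len({a[i] for a in solutions}) == 1 for i in range(n))
    return solutions, max_dist, agreed

planted, clauses = sample_planted(n=10, m=200, seed=3)
solutions, max_dist, agreed = solution_space_stats(10, clauses)
print(len(solutions), max_dist, agreed)
```

Two solutions can differ only on variables where not all solutions agree, so `max_dist <= n - agreed` always holds; this is the sense in which item 2 implies item 1.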

Page 9: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

The Uniform Distribution

Pick m clauses u.a.r. out of all $2^k \binom{n}{k}$ possibilities, conditioned on satisfiability

What makes this distribution harder to analyze?

Clauses are no longer independent

Not clear how (if at all possible) to sample it efficiently

Long-standing open question: is this distribution tractable for m/n = O(1)?

Was shown to be tractable for m/n = $\Omega(\log n)$ [BBG03]

[Krivelevich, Coja-Oghlan, V. 07]

We describe a deterministic polynomial-time algorithm that finds a satisfying assignment for almost all satisfiable k-CNF formulas with m/n > C, C = C(k) a sufficiently large constant

Page 10: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

The Uniform Distribution

Improving upon the exponential time algorithm for uniform satisfiable 3CNFs in this regime (only one known so far, [Chen03])

We show that the planted and uniform distributions share many structural properties (“close”)

In particular, same single-cluster structure of the solution space

Flaxman’s algorithm [Fla03] works for the uniform distribution as well

Justifying the somewhat unnatural usage of planted-solution models

Page 11: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

How to approach the uniform distribution?

A – a “bad” structural property

$\mu$ – expected number of satisfying assignments of uniform k-CNF

Lemma [KCOV’07]: $\Pr_{\text{uniform}}[A] < \mu \cdot \Pr_{\text{planted}}[A]$

In the sparse regime, $\mu$ is exponential in n

This approach works only for extremely rare bad properties

How about bad properties that occur w.p. 1/poly(n)?

Exclude a fixed graph on 10 vertices with 40 edges

(tedious) counting argument …

Page 12: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Case study: proving the existence of a single cluster

Consider the planted 3SAT distribution (m clauses included u.a.r.)

m/n sufficiently large constant

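Every variable x is expected to support 3m/(7n) clauses w.r.t. the planted assignment

x supports a clause C if the literal of x is the only true literal in C, e.g. $(x \vee y \vee z) = (T \vee F \vee F)$

$\Pr[x \text{ supports } C] = \Pr[x \text{ supports } C \mid x \text{ appears in } C] \cdot \Pr[x \text{ appears in } C] = \frac{1}{7} \cdot \frac{3}{n} = \frac{3}{7n}$

$\Rightarrow E[\text{support of } x] = 3m/(7n)$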
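The expected support can be compared against a sampled instance. This sketch assumes that a variable supports a clause when its literal is the only true literal under the planted assignment. The sampler (clauses drawn with replacement), the encoding, and the function names are likewise illustrative assumptions.

```python
import random

def sample_planted(n, m, k=3, seed=0):
    """Planted assignment plus m random k-clauses satisfied by it (with replacement)."""
    rng = random.Random(seed)
    planted = [rng.random() < 0.5 for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        variables = rng.sample(range(1, n + 1), k)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        if any((lit > 0) == planted[abs(lit) - 1] for lit in clause):
            clauses.append(clause)
    return planted, clauses

def support_counts(n, clauses, assignment):
    """support[i] = number of clauses in which variable i+1 is the only true literal."""
    support = [0] * n
    for clause in clauses:
        true_lits = [lit for lit in clause if (lit > 0) == assignment[abs(lit) - 1]]
        if len(true_lits) == 1:
            support[abs(true_lits[0]) - 1] += 1
    return support

n, m = 50, 7000
planted, clauses = sample_planted(n, m, seed=2)
support = support_counts(n, clauses, planted)
print(sum(support) / n, 3 * m / (7 * n))  # empirical average vs. 3m/(7n)
```

The average is tightly concentrated, but individual variables fluctuate around it; at constant m/n some variables end up with very small or zero support.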

Page 13: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Typical planted 3CNF instances

Fact 1: whp there is no set H of h variables s.t. h < n/1000 and there are at least hm/(10n) clauses containing two variables from H

[Figure: the variable set V with a small subset H, and example clauses such as $(x_4 \vee x_7 \vee x_{16})$; clauses containing two variables from H are drawn in orange]

$\Pr[\text{orange clause}] \le 3(h/n)^2$

$E[\#\text{ orange clauses}] \le 3m \cdot (h/n)^2 = (hm/n) \cdot (3h/n) \le hm/(300n)$

Page 14: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Typical planted 3CNF instances

Fact 2: whp there are no two satisfying assignments at distance > n/1000

[Figure: the planted assignment $\varphi$ = T T T … T and an assignment $\psi$ = F F … F T … T that differs from it on more than n/1000 variables; example clause $(x_4 \vee x_7 \vee x_{16})$]

1. A clause that is unsatisfied under $\psi$ but satisfied by $\varphi$ can potentially be included, but is not

2. There are $\Omega(n^3)$ such clauses, so the probability that none of them is included is very small

Page 15: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Clustering cont.

Claim: suppose that F is typical and every variable has the expected support; then F is uniquely satisfiable

Proof: suppose not

Let $\varphi$ be the planted assignment and $\psi$ some other satisfying assignment

Take x s.t. $\psi(x) \ne \varphi(x)$; x supports 3m/(7n) clauses w.r.t. $\varphi$

Consider such a clause $(T \vee F \vee F)$: since $\psi$ flips x, it must also flip another variable of the clause to keep it satisfied

Define $H = \{ x : \psi(x) \ne \varphi(x) \}$, h = |H| < n/1000 (Fact 2)

There exist 3hm/(7n) clauses containing two variables from H

This contradicts Fact 1

Page 16: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Clustering cont.

This picture is whp the case when m/n > C log n

When m/n = O(1), whp this is not the case (some variables have 0 support)

Definition: Given a 3CNF F and a satisfying assignment $\psi$, a set $C \subseteq V$ is called a core of F if $\forall x \in C$, x supports at least m/(4n) clauses in F[C]

Claim: For F in the planted distribution with m/n a sufficiently large constant, there exists a core C w.r.t. the planted assignment s.t. $|V(C)| > (1 - e^{-m/n}) n$ and C is frozen in F

Corollary: one-cluster structure

[Figure: variables x, y, z, w and the clauses $(x \vee y \vee z)$ and $(x \vee y \vee w)$]

Page 17: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Moving to the Uniform Case

A – a “bad” structural property (in our case: “no big core”)

$\mu$ – expected number of satisfying assignments of uniform 3CNF

Lemma: $\Pr_{\text{uniform}}[A] < \mu \cdot \Pr_{\text{planted}}[A]$

Claim: $\Pr_{\text{uniform}}[\text{no big core}] < \mu \cdot \Pr_{\text{planted}}[\text{no big core}] < \mu \cdot e^{-\alpha n}$

Claim: $\mu < e^{\beta n}$, $\beta < \alpha$

Corollary: $\Pr_{\text{uniform}}[\text{no big core}] = o(1)$
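The corollary follows by chaining the lemma with the two claims. Written out, with $\mu$ denoting the expected number of satisfying assignments and $\alpha > \beta > 0$ the two exponents (the symbol names are assumed here, since the slide text does not preserve them reliably):

```latex
\Pr_{\text{uniform}}[\text{no big core}]
  \;<\; \mu \cdot \Pr_{\text{planted}}[\text{no big core}]
  \;<\; e^{\beta n} \cdot e^{-\alpha n}
  \;=\; e^{-(\alpha - \beta) n}
  \;=\; o(1).
```

The argument needs the planted failure probability to be exponentially small with a rate that beats the growth rate of $\mu$, which is why it only applies to extremely rare bad properties.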

Page 18: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

The Average-Case Complexity of k-SAT

[Figure: the m/n axis for k-SAT, marking $2^k/k$ (Unit Clause [CF86]), the threshold $2^k \ln 2$, and $100 \cdot 2^k \ln 2$ (the conditioned uniform distribution, [KCOV’07])]

Page 19: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Planted k-SAT – closer to the threshold

Joint work with U. Feige and A. Flaxman

What happens when m/n = O(1), not necessarily a large constant?

Let F be a random planted k-CNF with m clauses

Set $f_x$ = # sat. assignments at distance xn from the planted

Set $p_x$ = the probability that an assignment $\psi$ at distance xn satisfies F

$E[f_x] = \binom{n}{xn} p_x$

$p_x$ depends only on x

$p_x$ decreases with x; the binomial coefficient is maximized at xn = n/2
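The exponential growth rate of $E[f_x]$ can be sketched numerically. The formula used below is a derivation under stated assumptions, not one quoted from the slides. Clauses are drawn independently from the $(2^k - 1)\binom{n}{k}$ clauses satisfied by the planted assignment. An assignment at distance xn falsifies such a clause only if the clause's variable set meets the set of differing variables, which happens for a fraction of roughly $1 - (1-x)^k$ of the k-sets. The binomial coefficient is replaced by its entropy approximation. The function names are illustrative.

```python
import math

def entropy(x):
    """Binary entropy in nats; (1/n) ln C(n, xn) tends to this as n grows."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def log_expected_fx_rate(x, k, c):
    """Approximate ln E[f_x] / n for a planted k-CNF with m = c*n clauses (k >= 2).

    q is the probability that one random planted clause is falsified by a
    fixed assignment at distance xn, so p_x = (1 - q)^m.
    """
    q = (1.0 - (1.0 - x) ** k) / (2 ** k - 1)
    return entropy(x) + c * math.log1p(-q)

k = 35
threshold = 2 ** k * math.log(2)
for x in (0.01, 0.1, 0.5, 0.9):
    print(x, log_expected_fx_rate(x, k, 1.05 * threshold))
```

A negative rate at distance x means that, in expectation, exponentially few satisfying assignments lie at that distance from the planted one.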

Page 20: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

k = 35, c = $1.05 \cdot 2^k \ln 2$

[Plots: $\ln f_x / n$ and $\ln E[f_x] / n$ as functions of x]

Single cluster regime

This is actually true for $c = 2^k \ln 2 + k$

Page 21: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

k = 35, c = (1+0.0000…) $2^k \ln 2$

[Plot: $\ln E[f_x] / n$ as a function of x]

This may imply that there is more than one cluster; to verify, one can use the second moment (similar analyses were done in [AR06, MMZ05])

We show a regime with the same plot but with only one cluster (counting minimal satisfying assignments)

Page 22: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

The Uniform Distribution

Define $D_x$ = # of pairs of sat. assignments at distance xn

A similar phenomenon occurs with $D_x$ (single cluster) near the threshold

Need to estimate probabilities of the form $\Pr[A_i \text{ and } A_j \text{ satisfy } F]$

$A_i$ and $A_j$ are assignments at distance xn

It is not even clear how to estimate $\Pr[A_i \text{ satisfies } F]$

This is easy in the non-conditioned uniform distribution: $(1 - 2^{-k})^m$

Page 23: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

Future Challenges

The algorithmic understanding of sparse instances is lacking

m/n = O(1), not necessarily a large enough constant

Experimental results for algorithms that work for planted 3SAT

The geometry of the solution space is simple – adjust current algorithms

Thanks !

Page 24: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

The Uniform Distribution – “Online” Version

Joint work with M. Krivelevich, B. Sudakov

Randomly permute all $2^k \binom{n}{k}$ possible clauses

Start with $F = \emptyset$; set $F = F \cup C_i$ if F remains satisfiable

Similar models were studied for graph problems

[ESW92] for random triangle-free graphs

Easy fact: at the end of the process F has only one satisfying assignment and contains $(2^k - 1)\binom{n}{k}$ clauses

What happens when the process is stopped after m iterations?

We describe a deterministic polynomial-time algorithm that finds whp a satisfying assignment for such k-CNF formulas with m/n > c, c = c(k) a sufficiently large constant
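The online process and its “easy fact” can be simulated for tiny n. The sketch below tracks the set of satisfying assignments by brute force, which is only feasible for very small n and is not the algorithm the slide refers to. The function name and the clause encoding are assumptions for illustration.

```python
import itertools
import random

def online_process(n, k=3, seed=0):
    """Run the online process to completion; return (accepted clauses, remaining solutions)."""
    rng = random.Random(seed)
    # All 2^k * C(n, k) clauses: every k-subset of variables with every sign pattern.
    all_clauses = [
        tuple(v if positive else -v for v, positive in zip(variables, signs))
        for variables in itertools.combinations(range(1, n + 1), k)
        for signs in itertools.product([True, False], repeat=k)
    ]
    rng.shuffle(all_clauses)

    solutions = list(itertools.product([False, True], repeat=n))
    formula = []
    for clause in all_clauses:
        surviving = [
            a for a in solutions
            if any((lit > 0) == a[abs(lit) - 1] for lit in clause)
        ]
        if surviving:  # F plus this clause is still satisfiable, so accept it
            formula.append(clause)
            solutions = surviving
    return formula, solutions

formula, solutions = online_process(n=5, k=3, seed=0)
print(len(formula), len(solutions))  # 70 1, i.e. (2^3 - 1) * C(5, 3) clauses, one solution
```

The final counts do not depend on the permutation: every clause satisfied by a surviving assignment is accepted, and two distinct surviving assignments would be separated by some such clause.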

Page 25: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

SAT and Message Passing

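Graphical Models for SAT and Warning Propagation:

[Figure: bipartite factor graph with variable nodes $x_1, \dots, x_i, \dots, x_n$ on one side and clause nodes $C_1, \dots, C_j, \dots, C_m$ on the other; $x_i$ is connected to $C_j$ if $x_i$ appears in $C_j$]

$x_i$ tells $C_j$ its current preferred assignment (-1, 0, 1)

$C_j$ sends $x_i$ a warning if all other literals indicate that they falsify $C_j$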

Message passing algorithms are widely used in many areas of CS (also practically): AI, Coding Theory, CSP

Warning Propagation is the “primal ancestor” of the Belief Propagation algorithm [Pearl88] and of Survey Propagation [MMZ05]

Survey Propagation seems powerful in solving “hard” 3SAT instances (where other methods fail) [MMZ05]
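A minimal sketch of the two message types described above. Several details are assumptions rather than statements from the slides: warnings are initialized at random, messages are updated sequentially in a random order each round, a variable's preference towards a clause is the sign of (warnings pulling it to True) minus (warnings pulling it to False) over its other clauses, and each clause contains distinct variables. The function name and clause encoding are illustrative.

```python
import random

def warning_propagation(n, clauses, max_iters=100, seed=0):
    """Return a list over variables with entries in {-1, 0, 1}: False, unassigned, True."""
    rng = random.Random(seed)
    occ = {v: [] for v in range(1, n + 1)}  # variable -> [(clause index, sign in clause)]
    warn = {}                               # (clause index, variable) -> 0 or 1
    for j, clause in enumerate(clauses):
        for lit in clause:
            occ[abs(lit)].append((j, 1 if lit > 0 else -1))
            warn[(j, abs(lit))] = rng.randint(0, 1)

    def preference(v, skip=None):
        # Sum the warnings v receives, ignoring the clause it is reporting to.
        total = sum(sign * warn[(j, v)] for j, sign in occ[v] if j != skip)
        return (total > 0) - (total < 0)

    edges = list(warn)
    for _ in range(max_iters):
        changed = False
        rng.shuffle(edges)
        for j, v in edges:
            # Clause j warns v iff every other literal currently prefers to falsify j.
            new = int(all(
                preference(abs(lit), skip=j) == (-1 if lit > 0 else 1)
                for lit in clauses[j] if abs(lit) != v
            ))
            if new != warn[(j, v)]:
                warn[(j, v)] = new
                changed = True
        if not changed:  # fixed point reached
            break
    return [preference(v) for v in range(1, n + 1)]

print(warning_propagation(3, [(1, 2, 3)]))  # [0, 0, 0]: a lone clause warns nobody
```

Variables that end with preference 0 are left unassigned by the messages, so a full solver would need a separate step for them.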

Page 26: Satisfiable k -CNF formulas above the threshold Danny Vilenchik

SAT and Message Passing

Joint work with U. Feige and E. Mossel

Warning Propagation solves whp (planted/uniform) 3CNF formulas with m/n > c, c some sufficiently large constant, in O(log n) iterations

Reinforces the following “folklore” view:

When clustering is complicated $\Rightarrow$ formulas are hard $\Rightarrow$ sophisticated algorithms needed: Survey Propagation

When clustering is simple $\Rightarrow$ formulas are easy $\Rightarrow$ naïve algorithms work: Warning Propagation