Lukas Kroc, Ashish Sabharwal, Bart Selman Cornell University, USA SAT 2010 Conference Edinburgh, July 2010 An Empirical Study of Optimal Noise and Runtime

Lukas Kroc, Ashish Sabharwal, Bart Selman, Cornell University, USA

SAT 2010 Conference

Edinburgh, July 2010

An Empirical Study of Optimal Noise and Runtime Distributions in Local Search

Presented by:

Holger H. Hoos

Local Search Methods for SAT

• A lot is known about Stochastic Local Search (SLS) methods [e.g. Hoos-Stutzle ’04], especially their behavior on random 3-SAT

• Along with systematic search, SLS is the main SAT solution paradigm

• Walksat is one of the first widely successful local search solvers: a biased random walk that combines greedy moves (downhill) with stochastic moves (possibly uphill), controlled by a “noise” parameter [0% .. 100%]

• Yet new, surprising findings are still being discovered

• Part of this work is motivated by the following observation: empirical evidence that Walksat's running time on large, random 3-SAT instances is quite predictable and scales linearly with the number of variables for a specific setting of the noise parameter [Seitz-Alava-Orponen 2005]
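The biased random walk described above can be sketched as follows. This is a simplified illustration, not the Walksat implementation used in the study: the function name `walksat`, its parameters, and the greedy rule (flip the variable that leaves the fewest unsatisfied clauses) are assumptions made for the example, and every clause is rechecked on each flip, so it only suits small formulas.

```python
import random

def walksat(clauses, num_vars, noise, max_flips, rng=None):
    """Simplified Walksat-style search; returns a model (dict var -> bool) or None.

    clauses: iterable of tuples of non-zero ints (DIMACS-style literals).
    noise:   probability in [0, 1] of taking a stochastic move.
    """
    rng = rng or random.Random()
    assign = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}

    def is_sat(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def unsat_after_flip(var):
        # Tentatively flip var, count unsatisfied clauses, then undo the flip.
        assign[var] = not assign[var]
        count = sum(1 for c in clauses if not is_sat(c))
        assign[var] = not assign[var]
        return count

    for _ in range(max_flips):
        unsat = [c for c in clauses if not is_sat(c)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        if rng.random() < noise:
            # Stochastic move (possibly uphill): random variable of the clause.
            var = abs(rng.choice(clause))
        else:
            # Greedy move (downhill): variable leaving the fewest unsat clauses.
            var = min((abs(lit) for lit in clause), key=unsat_after_flip)
        assign[var] = not assign[var]
    return assign if all(is_sat(c) for c in clauses) else None
```

With `noise=0.0` the sketch is purely greedy; with `noise=1.0` it is a pure random walk over the variables of unsatisfied clauses.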


Our Motivation

Our work looks at Walksat again, on large, random 3-SAT formulas, and seeks answers to two questions:

A. Can we further characterize the “optimal noise” and the linear scaling behavior of Walksat?
• Key parameter: the clause-to-variable ratio, α

B. How do runtime distributions of Walksat behave at sub-optimal noise?
• Are they concentrated around the mean, or do they have “heavy tails” similar to complete search methods?
• Heavy tails: very long runs are more likely than we might expect
• Heavy tails have not been reported in local search so far

Note: Walksat is still faster than current adaptive, dynamic-noise solvers on these formulas, so studying its behavior at optimal static noise is of much interest


Summary of Results

Walksat on large, random, 3-SAT formulas:

A. Further characterization of the “optimal noise” and linear scaling:
• A detailed analysis, showing a piece-wise linear fit for optimal noise as a function of α, with transitions at interesting points (extending the previous observation that ~57% is optimal for α=4.2)
• Simple inverse polynomial dependence of runtime on α

B. How runtime distributions of Walksat behave at sub-optimal noise:
• Exponential decay in the high noise regime
• Heavy tails in the low noise regime
• First quantitative observation of heavy tails in local search [earlier insights: Hoos-Stutzle 2000]
• Preliminary Markov Chain model


A. Further Study of Optimal Noise and Linear Scaling

Optimal Noise Setting vs. α

Question:

How does the optimal noise setting vary with α and N?

Experiment: For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]

For each, find the noise setting where Walksat is the fastest (binary search)

Average these optimal noise settings and plot against α
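A minimal sketch of the formula-generation step, assuming the usual uniform random 3-SAT model (each clause has three distinct variables, each negated with probability 1/2). The slides do not spell out the generator, so the function name `random_3sat` and its parameters are illustrative.

```python
import random

def random_3sat(num_vars, alpha, rng=None):
    """Return round(alpha * num_vars) random 3-literal clauses over 1..num_vars."""
    rng = rng or random.Random()
    num_clauses = round(alpha * num_vars)
    clauses = []
    for _ in range(num_clauses):
        # Three distinct variables, each negated with probability 1/2.
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses
```

For example, `random_3sat(100_000, 4.2)` gives a formula with 100K variables and 420K clauses, the size used in the later runtime-distribution experiments.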



Optimal Noise Setting vs. α

[Figure: optimal noise setting plotted against α; data with 1 standard deviation bars. Plot annotations mark the point up to which the Generalized Unit Clause heuristic works and the point up to which greedy Walksat (GSAT) works.]

• Optimal noise depends significantly on α (e.g., ~46% at α=3.9; ~57% at α=4.2)

• Very good piece-wise linear fit

• Transitions at interesting places:
  • α≈3: up to which the generalized unit clause (GUC) rule works almost surely [Frieze-Suen 1996]
  • α≈3.9: up to which greedy Walksat (GSAT) works (also where the “clustering structure” of the solution space is believed to change drastically: from one giant cluster to exponentially many small ones [Mezard-Mora-Zecchina 2005])


Linear Scaling at Optimal Noise

Experiment:
• For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
• Measure Walksat's runtime with optimal noise (#flips until a solution is found)
• Plot #flips/N against α (one point per run, no averaging)

Results:
• Inverse polynomial fit of #flips/N as a function of α
• Suggests linear scaling for α < 4.235
• Points with varying N fall on each other after rescaling by N, showing linearity w.r.t. N

[fig explained in paper]

B. Runtime Distribution of Local Search Methods


Standard vs. Heavy Tailed Distributions

Standard distributions (finite mean & variance): exponential or faster decay, e.g., Normal distribution

Heavy-tailed distributions: power-law decay, e.g., Pareto-Levy distribution

[Figure: tail of a standard distribution (exponential decay) compared with a heavy-tailed distribution (power-law decay).]


Heavy Tailed Distributions

Heavy-tailed distributions: power-law decay, e.g., Pareto-Levy distribution

• Signature: tail of the distribution is a line in a log-log plot

• Observed in systematic search solvers

• Mechanism well understood in terms of “bad” variable assignments that are hard to recover from [Gomes, Kautz and Selman ’99, ’00]

• Motivated key techniques such as search restarts and algorithm portfolios

• Not previously observed in studies on local search methods

[Figure: power-law decay.]
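The log-log signature follows directly from the form of the tail. As a sketch, writing X for the runtime and using the symbols C, a and λ (which are notation introduced here, not taken from the slides):

```latex
% Heavy tail: power-law decay, with index 0 < a < 2 for Pareto-Levy type tails
\Pr[X > x] \sim C\,x^{-a}
\quad\Longrightarrow\quad
\log \Pr[X > x] \approx \log C - a \log x
\qquad \text{(a straight line of slope $-a$ in a log-log plot)}

% Standard tail: exponential decay
\Pr[X > x] \sim C\,e^{-\lambda x}
\quad\Longrightarrow\quad
\log \Pr[X > x] \approx \log C - \lambda x
\qquad \text{(a straight line of slope $-\lambda$ in a log-linear plot)}
```

An exponential tail therefore bends downward in a log-log plot, while a power-law tail stays straight.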


Runtime Distributions of Walksat

Experiment:

• Generate a random 3-SAT formula with N=100K at α=4.2
  • Large formulas, free of small-size effects
  • Very hard to solve
  • Still less constrained than formulas at the phase transition (α≈4.26)

• Run 100K (!) runs of Walksat with noise settings around the optimal

• Plot the runtime distribution: probability of failure to find a solution as a function of #flips
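The plotted quantity can be computed from the recorded run lengths roughly as follows. This is a sketch under the assumption that every run ends in a solution and its #flips is recorded; the function name `failure_probability` is illustrative.

```python
def failure_probability(run_lengths):
    """Empirical runtime distribution.

    Returns (xs, ps): xs are the run lengths in ascending order, and ps[i] is
    the fraction of runs that have not yet found a solution after xs[i] flips.
    """
    xs = sorted(run_lengths)
    n = len(xs)
    # After xs[i] flips, at least i + 1 runs have finished, so at most
    # n - (i + 1) are still unsolved. For tied run lengths, the last entry
    # of the tie carries the exact fraction.
    ps = [(n - (i + 1)) / n for i in range(n)]
    return xs, ps
```

Plotting `ps` against `xs` with both axes logarithmic gives the log-log view used to look for heavy tails.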


Runtime Distributions of Walksat

[Setting: Large, random, 3-SAT formula with α=4.2]

Summary of Results:

• There is a qualitative difference between noise higher than optimal (>56.7%) and lower than optimal (<56.7%)
  • High noise regime: tail of P[failure] has an exponential distribution
  • Low noise regime: tail of P[failure] has a power-law distribution

• Intuition captured by a (preliminary) Markov Chain model
  • High noise means “guessing the solution”
  • Low noise (too greedy) leads the search into “local traps”
  • Optimal noise is where the two effects balance


Heavy-Tails in Low Noise Regimes

[Figure: runtime distributions on a LOG-LOG scale, where a straight line = power-law decay.]

• Last 5% of tail (5K points): linear slope = 0.38

• 100K data points plotted per curve; actual data points, no fitting

• Not all data points are marked with o, x, +, etc., for clarity

Heavy-Tails in Low Noise Regimes


Same data as previous plot, but with all 100K data points (per curve) marked with o, x, +, etc., and full y-axis. As before, actual data points, no fitting.

Qualitative Contrast: High vs. Low Noise Regimes


High Noise: not straight lines, so not heavy tailed. In fact, a log-linear plot reveals a clear exponential tail.

Low Noise: straight line, so heavy tailed. Extremely long runs are much more likely than one might expect!

[Figure: LOG-LOG scale, where a straight line = power-law decay.]
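The straight-line test on the last part of the tail can be sketched as a least-squares fit. The function name `tail_slope`, the default `tail_fraction=0.05` (mirroring the "last 5% of tail" annotation), and the use of R² as a straightness measure are assumptions for illustration, not the procedure used for the plots.

```python
import math

def tail_slope(xs, ps, tail_fraction=0.05, log_x=True):
    """Least-squares line through the tail of a failure-probability curve.

    xs: run lengths in ascending order; ps: matching failure probabilities.
    log_x=True fits log p against log x (power-law check, log-log plot);
    log_x=False fits log p against x (exponential check, log-linear plot).
    Returns (slope, r_squared); r_squared near 1 means the tail is straight.
    """
    start = int(len(xs) * (1 - tail_fraction))
    # Points with p == 0 cannot be shown on a log axis, so they are skipped.
    pts = [(math.log(x) if log_x else x, math.log(p))
           for x, p in zip(xs[start:], ps[start:]) if p > 0]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    syy = sum((v - mean_v) ** 2 for _, v in pts)
    return sxy / sxx, sxy * sxy / (sxx * syy)
```

A decaying tail gives a negative slope; comparing the fit with `log_x=True` against `log_x=False` is one rough way to tell which of the two regimes a curve resembles.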

Understanding Variation with Noise Level and Power-Law Decay: Preliminary Insights

Different “Search” at High, Low, Opt Noise


Experiment:
• Run Walksat at different noise levels on a formula with 100K vars, 420K clauses
• Plot how the number of unsatisfied clauses evolves as the search progresses (0 on y-axis = solution)

• High noise: search “stuck” at a relatively high value

• Optimal noise: a gradual descent until solution found

• Low noise: #unsat clauses decreases fast but gets “stuck” at a relatively low value

Markov Chain Model Capturing Power-Law Decay (preliminary)


[details omitted; refer to paper. Similar to work of Hoos ’02]

Key features:
• States represent (roughly) the number of unsatisfied clauses; left-most state = all solutions

• Ladder structures capture falling into a “trap”; the farther it keeps falling, the harder it gets to recover (recovery time = hitting time of a biased 1-dimensional Markov Chain)
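To make the recovery-time remark concrete, here is a small simulation of a generic biased 1-dimensional walk. It is not the chain from the paper, whose details are omitted here: the names `hitting_time` and `mean_hitting_time`, the parameter `p_recover`, and the unbounded-depth assumption are all hypothetical.

```python
import random

def hitting_time(start_depth, p_recover, rng):
    """Steps for a biased walk started at start_depth to first reach depth 0.

    Each step moves one level toward 0 with probability p_recover and one
    level deeper otherwise. Requires p_recover > 0.5 so the walk returns.
    """
    depth, steps = start_depth, 0
    while depth > 0:
        depth += -1 if rng.random() < p_recover else 1
        steps += 1
    return steps

def mean_hitting_time(start_depth, p_recover, runs, seed=0):
    """Average hitting time over independent simulated walks."""
    rng = random.Random(seed)
    total = sum(hitting_time(start_depth, p_recover, rng) for _ in range(runs))
    return total / runs
```

For this simple walk the expected hitting time from depth d is d / (2·p_recover − 1), so recovery takes longer the deeper the walk has fallen and the weaker the bias toward recovery.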

Markov Chain Model Capturing Power-Law Decay (preliminary)


[details omitted; refer to paper. Similar to work of Hoos ’02]

In the horizontal part of the chain:

High noise: avoids traps but attraction towards the top-middle node; exponential time to convergence, very concentrated around the mean

Low noise: leftward drift but good chance of falling into a trap; exponential time to convergence but power-law decay

Summary


A. Further study of optimal noise for Walksat:
• depends on the clause-to-variable ratio, α, in piece-wise linear fashion, with transitions at interesting points
• allows for a simple inverse polynomial fit for the linearity constant

B. Runtime distributions in local search:
• drastic change in behavior below and above optimal noise
• exponential decay for higher-than-optimal noise
• power-law decay (heavy tails) for lower-than-optimal noise

Future directions:
• A better understanding of when heavy tails appear and when they don’t
• Improved model capturing heavy tails in local search
• Ways of utilizing these insights to improve local search solvers (similar to restarts and algorithm portfolios for complete search)
